In December 2023, Google released Gemini, its first natively multimodal model family, in three sizes: Ultra, Pro, and Nano. Since each Gemini model is designed for a specific set of use cases, the family is adaptable and runs well on a variety of platforms, from data centers to mobile devices. Because they were designed from the ground up for multimodality, Gemini models combine and comprehend text, code, images, audio, and video with ease. They can generate code from many input types, produce both text and images, and understand and carry out multilingual tasks.

Let us take a deep dive into the Gemini models:

Gemini Ultra

Gemini Ultra is the largest model, designed exclusively for extremely difficult tasks. With support for several languages, it is optimized for high-quality output on complex tasks like reasoning and coding. The model natively comprehends and makes sense of text, image, and audio sequences. Combined with AlphaCode 2, it achieves state-of-the-art coding performance, performs well on competition-grade problem sets, and possesses advanced analytical capabilities. It is the first model to beat human experts on MMLU (Massive Multitask Language Understanding), a benchmark that tests world knowledge and problem-solving ability across 57 subjects, including arithmetic, physics, history, law, medicine, and ethics.

Gemini Pro

Gemini Pro is the best model for overall performance across a wide variety of tasks. It is natively multimodal, and the 1.5 Pro model offers the longest context window of any large-scale foundation model to date: up to two million tokens. Reaching near-perfect recall on long-context retrieval tasks in several modalities, it opens up new possibilities for processing vast volumes of documents, thousands of lines of code, hours of audio and video, and more. Gemini 1.5 Pro can use text, images, audio, and video to carry out extremely complex reasoning tasks.

Gemini Flash

Gemini Flash is a lightweight variant optimized for speed and efficiency. It offers multimodal reasoning, a long context window of up to one million tokens, and low cost. The primary characteristic of Gemini Flash is its speed: for the great majority of enterprise and developer use cases, it has an average first-token latency of under one second. Compared with the larger models, 1.5 Flash delivers comparable quality at a far lower cost. With Flash's default one-million-token context window, you can process hundreds of thousands of words, codebases with over 30,000 lines of code, one hour of video, and eleven hours of audio.

Gemini Nano

Gemini Nano is one of the most efficient models for on-device tasks. It is optimized for quick responses on devices with or without a data network. It provides richer, clearer descriptions of images and what is in them. With its speech transcription feature, you can speak instead of typing because it understands what you are saying. It also provides text summarization, turning emails, documents, and messages into clear, concise summaries.

Get started with the Gemini API: Python

1. Run the code in Google Colab.
2. Open Google Colab and create a new notebook.

3. Install the dependency. The Python SDK for the Gemini API is contained in the google-generativeai package. Install it using pip, as shown below.
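A minimal sketch of the install and import in a Colab cell (the `!` prefix runs a shell command inside the notebook):

```python
# Install the Gemini API Python SDK (quiet, upgrade if already installed)
!pip install -q -U google-generativeai

# Import the SDK under its conventional alias
import google.generativeai as genai
```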
4. Set up your API key: Before you can use the Gemini API, you must first obtain an API key. In Colab, add the key to the secrets manager under the key icon in the left panel and give it the name GOOGLE_API_KEY. Once you have the API key, pass it to the SDK. You can do this in two ways: a) put the key in the GOOGLE_API_KEY environment variable (the SDK will automatically pick it up from there), or b) pass the key to genai.configure(api_key=...).
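A sketch of option (b) using Colab's secrets manager; `userdata.get` reads the secret you created under the name GOOGLE_API_KEY in the step above:

```python
from google.colab import userdata

# Read the key from Colab's secrets manager and pass it to the SDK
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
```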
5. List models: Now we are ready to call the Gemini models. Use list_models to see the available Gemini models:
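A short sketch that prints every model supporting the generate_content call used in the rest of this tutorial:

```python
# List the models that support content generation
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)
```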
a) Generate text from text inputs

1. For text-only prompts, use the gemini-pro model:
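For instance, a minimal sketch:

```python
# Instantiate the text-only Gemini model
model = genai.GenerativeModel('gemini-pro')
```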
2. The generate_content method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. The available models only support text and images as input, and text as output. You can pass a prompt string to the GenerativeModel.generate_content method:
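A hedged example (the prompt string is illustrative):

```python
# Generate text from a text-only prompt
response = model.generate_content("What is the meaning of life?")
print(response.text)
```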
Output: the text generated by the model (shown as a screenshot in the original article).

b) Generate text from image and text inputs

1. The GenerativeModel.generate_content API is designed to handle multimodal prompts and returns a text output. Upload any image to Colab.

2. Use the gemini-1.5-flash model and pass the image to the model with generate_content. To provide both text and images in a prompt, pass a list containing the strings and images:
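A minimal sketch, assuming the uploaded file is named image.jpeg (the filename and prompt text are illustrative):

```python
import PIL.Image

# Open the image uploaded to the Colab runtime
img = PIL.Image.open('image.jpeg')

# Instantiate a multimodal model and pass a list of text and image parts
model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(["Describe what is in this image.", img])
```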
image.jpeg (the image used in the example; shown as a screenshot in the original article).

3. Print the output using response.text:
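Continuing the sketch above:

```python
# Print the generated description of the image
print(response.text)
```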
Referred: https://www.geeksforgeeks.org