Gemini 2.0

Gemini 2.0 Flash was released on 11 December 2024.

Gemini 2.0 is natively multimodal: the models can directly produce image and audio outputs in addition to text. It also improves native tool use and model memory. By incorporating core Google products such as Google Search for real-time information access, Gemini 2.0 enables several products (specialised types of agentic workflows): Project Astra, Project Mariner, and Jules.
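
As a concrete illustration of the Google Search integration, here is a minimal sketch using the google-genai Python SDK. The model name and prompt are illustrative, and an API key is assumed to be available in the environment.

```python
# Sketch: grounding a Gemini 2.0 response in Google Search results
# (google-genai Python SDK; assumes GEMINI_API_KEY is set in the environment).
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What were the main announcements at the latest Google I/O?",
    config=types.GenerateContentConfig(
        # Enable the built-in Google Search tool so the model can pull in
        # real-time information rather than relying only on training data.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```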

With the tagline ‘Enabling the agentic era,’ Gemini 2.0 naturally focuses on the capabilities critical for agentic use cases, such as multimodal reasoning, compositional function-calling, and planning and following complex instructions.
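
To make compositional function-calling concrete, the sketch below exposes two hypothetical Python functions (find_flights and book_flight are invented stand-ins, not real APIs) as tools. With the SDK's automatic function calling, the model can chain them, feeding the result of one call into the next, to satisfy a single request.

```python
# Sketch of compositional function-calling: the model calls one tool,
# feeds its result into another, then composes the final answer.
# find_flights and book_flight are hypothetical stubs for illustration.
from google import genai
from google.genai import types

def find_flights(origin: str, destination: str) -> list[dict]:
    """Return candidate flights between two cities (stubbed data)."""
    return [{"flight_id": "GA123", "origin": origin, "destination": destination}]

def book_flight(flight_id: str) -> dict:
    """Book a flight by its id (stubbed confirmation)."""
    return {"flight_id": flight_id, "status": "booked"}

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Find a flight from London to Zurich and book the first option.",
    # Passing plain Python callables enables the SDK's automatic function
    # calling: the model decides which tools to invoke and in what order.
    config=types.GenerateContentConfig(tools=[find_flights, book_flight]),
)
print(response.text)
```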

Gemini 1.5

Gemini 1.5, based on a Mixture of Experts (MoE) architecture, has a context window of between 128,000 and 1 million tokens, far larger than those of currently popular models such as OpenAI's GPT-4 (128,000) and Gemini 1.0 Pro (32,000). This feature opens up many use cases that would otherwise require workarounds or complicated architecture setups: writing a review of, and predicting the reception of, a new hour-long YouTube video; analysing a PDF of over 1,000 pages without a backend retrieval-augmented generation (RAG) infrastructure; or following hundreds of pages of manuals and style guides for complex language tasks. Because Gemini is a multimodal model, the large context window lets it understand multimedia content in the same way as text and produce outputs in mixed formats, as sketched below.
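
As a sketch of the RAG-free pattern described above, the following uses the File API in the google-genai Python SDK to place an entire (hypothetical) 1,000-page PDF directly into the context window. The file name, prompt, and exact upload parameter are illustrative and may differ across SDK versions.

```python
# Sketch: analysing a very long PDF with no retrieval layer by putting the
# whole document into Gemini 1.5's long context window.
# "manual.pdf" is a hypothetical 1,000-page document.
from google import genai

client = genai.Client()

document = client.files.upload(file="manual.pdf")  # File API upload

# Check how much of the context window the document consumes.
usage = client.models.count_tokens(model="gemini-1.5-pro", contents=[document])
print(f"Document size: {usage.total_tokens} tokens")

response = client.models.generate_content(
    model="gemini-1.5-pro",
    contents=[document, "Summarise the style rules in chapters 10-12."],
)
print(response.text)
```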

As of this writing, Gemini 1.5 should not be seen as a like-for-like competitor to GPT-4. Benchmarking against GPT-4 [@dasGeminiProVs2024] shows that Gemini 1.5 is better suited to handling large datasets, multimodal use cases, and large context windows, while GPT-4 outperforms it on smaller, more nuanced tasks that require complex reasoning.

References: