Gemini 1.5
Gemini 1.5, based on a Mixture of Experts (MoE) architecture, has a context window of 128,000 to 1 million tokens, far larger than those of current popular models such as OpenAI GPT-4 (128,000) and Gemini 1.0 Pro (32,000). This opens up use cases that would otherwise require workarounds or complicated architectural setups: for example, reviewing an hour-long YouTube video and predicting its reception, analysing a PDF of over 1,000 pages without a backend RAG (retrieval-augmented generation) infrastructure, or following hundreds of pages of manuals and style guides for complex language tasks. Because Gemini is a multimodal model, the large context window also lets it process multimedia content in the same way as text and produce outputs in mixed formats.
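To make the "long document without RAG" point concrete, here is a minimal sketch using the google-generativeai Python SDK: the whole PDF is uploaded once and passed directly into the model's context, with no chunking, embedding, or vector store. The file name is a placeholder and the exact model id may differ by release.

```python
# Sketch: analyse a very long PDF by relying on the large context window
# instead of a RAG pipeline. File name and model id are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: key supplied via env/config

# Upload the large PDF once via the File API, then reference it in the prompt.
manual = genai.upload_file("hypothetical_1000_page_manual.pdf")

model = genai.GenerativeModel("gemini-1.5-pro")  # model id may vary by release
response = model.generate_content(
    [manual, "Summarise the key requirements described in this document."]
)
print(response.text)
```

The trade-off is that every query re-reads the full document from context rather than retrieving only relevant chunks, which simplifies the architecture at the cost of per-request token usage.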
At the time of writing, Gemini 1.5 should not be seen as a like-for-like competitor to GPT-4. Benchmarking against GPT-4 [@dasGeminiProVs2024] shows that Gemini 1.5 is better suited to handling large datasets, multimodal use cases, and large context windows, while GPT-4 outperforms it on smaller, more nuanced tasks that require complex reasoning.