Gemini Embedding 2: Google's First Natively Multimodal Embedding Model Released

Google DeepMind has released Gemini Embedding 2 in public preview, its first fully multimodal embedding model built on the Gemini architecture. Unlike previous text-only embedding models, it maps text, images, video, audio, and documents into a single, unified embedding space and captures semantic intent in more than 100 languages.
Key Technical Details
The model is available through the Gemini API and Vertex AI, and supports these specific capabilities:
- Text: Supports a context of up to 8,192 input tokens
- Images: Processes up to 6 images per request (PNG and JPEG formats)
- Videos: Supports up to 120 seconds of video input (MP4 and MOV formats)
- Audio: Natively ingests and embeds audio without needing text transcriptions
- Documents: Directly embeds PDFs up to 6 pages long
Beyond processing single modalities, the model natively understands interleaved input, allowing you to pass multiple modalities (e.g., image + text) in a single request to capture nuanced relationships between different media types.
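As a rough illustration, the limits listed above could be checked on the client before a request is sent. This is a minimal sketch, not part of any Google SDK: `EmbedRequest` and `check_limits` are hypothetical names, and it assumes the quoted limits apply per request, which the announcement states explicitly only for images.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits quoted in the announcement; the constant names are hypothetical.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6
IMAGE_FORMATS = {"png", "jpeg"}
VIDEO_FORMATS = {"mp4", "mov"}


@dataclass
class EmbedRequest:
    """Hypothetical description of one interleaved request (not an SDK type)."""
    text_tokens: int = 0
    image_formats: List[str] = field(default_factory=list)
    video_seconds: float = 0.0
    video_format: Optional[str] = None
    pdf_pages: int = 0


def check_limits(req: EmbedRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means no limit is exceeded."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text has {req.text_tokens} tokens (max {MAX_TEXT_TOKENS})")
    if len(req.image_formats) > MAX_IMAGES:
        problems.append(f"{len(req.image_formats)} images (max {MAX_IMAGES})")
    for fmt in req.image_formats:
        if fmt.lower() not in IMAGE_FORMATS:
            problems.append(f"unsupported image format: {fmt}")
    if req.video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video is {req.video_seconds}s long (max {MAX_VIDEO_SECONDS}s)")
    if req.video_format is not None and req.video_format.lower() not in VIDEO_FORMATS:
        problems.append(f"unsupported video format: {req.video_format}")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"PDF has {req.pdf_pages} pages (max {MAX_PDF_PAGES})")
    return problems


# An image + text request that stays within the quoted limits.
request = EmbedRequest(text_tokens=120, image_formats=["png", "jpeg"])
print(check_limits(request))
```

A check like this only catches oversized inputs early; the service itself remains the authority on what it accepts.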
Flexible Output Dimensions
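Gemini Embedding 2 incorporates Matryoshka Representation Learning (MRL), which concentrates the most important information in the leading dimensions of each vector so that an embedding can be shortened without retraining. Output dimensions can therefore scale down from the default of 3072, letting developers trade retrieval quality against storage cost. Google recommends 3072, 1536, or 768 dimensions for the highest quality.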
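To make the trade-off concrete, here is a minimal sketch of how an MRL-style vector can be shortened on the client and what that does to storage. It assumes the common MRL practice of keeping the leading dimensions and re-normalizing to unit length; the announcement does not say whether the service normalizes reduced vectors itself, and the API may also let you request a smaller size directly. The function names and the stand-in vector are hypothetical, and 4 bytes per value assumes float32 storage.

```python
import math

FULL_DIM = 3072  # default output size quoted in the announcement


def truncate_embedding(vec, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def storage_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 values by default."""
    return num_vectors * dim * bytes_per_value


# Stand-in for a real embedding: deterministic toy values, not model output.
full = [math.sin(i + 1) for i in range(FULL_DIM)]
small = truncate_embedding(full, 768)

print(len(small))
for dim in (3072, 1536, 768):
    print(dim, storage_bytes(1_000_000, dim))
```

For one million vectors this works out to about 12.3 GB at 3072 dimensions and about 3.1 GB at 768, before any index overhead.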
Integration and Use Cases
The model is designed for multimodal downstream tasks including Retrieval-Augmented Generation (RAG), semantic search, sentiment analysis, and data clustering. It can be used through the following platforms, frameworks, and vector stores:
- Gemini API
- Vertex AI
- LangChain, LlamaIndex, Haystack
- Vector databases: Weaviate, Qdrant, ChromaDB, and Vector Search
Google provides interactive Colab notebooks for getting started with the Gemini API and Vertex AI implementations.
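Because every modality lands in the same embedding space, semantic search reduces to comparing vectors regardless of where they came from. The sketch below ranks a few items by cosine similarity to a query. It is illustrative only: the three-dimensional vectors and item names are made up, and a real system would use embeddings returned by the model and one of the vector databases listed above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (item_id, score) pairs most similar to the query."""
    scored = [(cosine(query, vec), item_id) for item_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


# Hypothetical toy index: one entry per modality, all in the same space.
index = {
    "image:cat_photo": [0.9, 0.1, 0.0],
    "audio:meeting_recording": [0.0, 0.2, 0.9],
    "pdf:cat_care_guide": [0.8, 0.3, 0.1],
}

# Hypothetical embedding of a text query about cats.
query = [1.0, 0.2, 0.0]

for item_id, score in top_k(query, index):
    print(item_id, round(score, 3))
```

In this toy data the image and the PDF rank above the audio clip, which is the behavior a shared space is meant to give: a text query can retrieve related items of any media type.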
📖 Read the full source: HN AI Agents