Sarvam AI releases 30B and 105B open-source LLMs with Indian training infrastructure

Model specifications and architecture
Sarvam 30B and Sarvam 105B are reasoning models trained from scratch on large-scale, high-quality datasets curated in-house across pre-training, supervised fine-tuning, and reinforcement learning stages. Training was conducted entirely in India on compute provided under the IndiaAI mission.
Both models use a Mixture-of-Experts (MoE) Transformer backbone with sparse expert routing, which scales total parameter count without a proportional increase in compute per token because only a subset of experts is active for each token. The architecture supports long-context inputs through rotary positional embeddings, RMSNorm-based stabilization, and attention designs optimized for efficient KV-cache usage during inference.
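To make two of the building blocks named above concrete, here is a minimal sketch of RMSNorm and rotary positional embeddings in plain Python. It follows the standard textbook formulations, not Sarvam's code: the function names, the base of 10000, and the epsilon value are assumptions chosen for illustration.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    # Scale x by its root-mean-square; unlike LayerNorm, no mean is subtracted.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def rope(x, position, base=10000.0):
    # Rotate each consecutive pair (x[i], x[i+1]) by an angle that grows
    # with the token position and shrinks with the pair index.
    d = len(x)
    assert d % 2 == 0, "rotary embeddings need an even dimension"
    out = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))
```

The property that matters for long contexts is that the attention score between a rotated query and a rotated key depends only on the distance between their positions: `dot(rope(q, 5), rope(k, 2))` matches `dot(rope(q, 105), rope(k, 102))` up to floating-point error.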
Sarvam 30B uses Grouped Query Attention (GQA) to reduce KV-cache memory while maintaining performance. Sarvam 105B extends the architecture with greater depth and Multi-head Latent Attention (MLA), a compressed attention formulation that reduces memory requirements for long-context inference. Both models use sparse expert feedforward layers with 128 experts but differ in expert capacity and routing configuration.
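The memory argument behind GQA and MLA can be checked with back-of-the-envelope arithmetic. The sketch below estimates KV-cache bytes per token; every dimension in it (32 layers, 32 heads, head size 128, a 512-wide latent with a 64-wide rotary key) is a hypothetical value picked for illustration, not a published Sarvam configuration. The MLA formula assumes the cache holds one compressed latent plus one shared rotary key per layer, as in the DeepSeek-V2 formulation.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    # Standard and grouped-query attention cache one key and one value
    # vector per KV head, per layer (hence the factor of 2).
    return 2 * layers * kv_heads * head_dim * bytes_per_value

def mla_bytes_per_token(layers, latent_dim, rope_dim, bytes_per_value=2):
    # Assumed MLA layout: one compressed latent plus one rotary key per layer.
    return layers * (latent_dim + rope_dim) * bytes_per_value

# Hypothetical configuration, 16-bit cache values.
mha = kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128)   # 524288
gqa = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128)    # 131072
mla = mla_bytes_per_token(layers=32, latent_dim=512, rope_dim=64)  # 36864
```

Under these assumed numbers, sharing each KV head across four query heads cuts the cache by 4x, and the latent formulation cuts it by roughly 14x relative to full multi-head attention.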
Training and data details
The 30B model was trained on 16T tokens, while the 105B model was trained on 12T tokens. Pre-training data spans code, general web data, specialized knowledge corpora, mathematics, and multilingual content with substantial allocation to the 10 most-spoken Indian languages.
Training used sigmoid-based routing scores rather than traditional softmax gating, which improves expert load balancing and reduces routing collapse. An expert-bias term stabilizes routing dynamics and encourages more uniform expert utilization across training steps.
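A small sketch may help show how sigmoid scores and an expert-bias term fit together. The source does not state Sarvam's exact rule, so the details here are assumptions modeled on auxiliary-loss-free load balancing schemes: the bias affects only which experts are selected, the gate weights come from the unbiased sigmoid scores, and the bias is nudged by a fixed hypothetical step against each expert's recent load.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def route(logits, bias, top_k):
    # Each expert gets an independent score in (0, 1); unlike softmax,
    # raising one expert's score does not push the others down.
    scores = [sigmoid(z) for z in logits]
    # The bias influences selection only, not the output weighting.
    ranked = sorted(range(len(scores)),
                    key=lambda e: scores[e] + bias[e], reverse=True)
    chosen = ranked[:top_k]
    total = sum(scores[e] for e in chosen)
    return {e: scores[e] / total for e in chosen}

def update_bias(bias, load, step=0.01):
    # Assumed rule: lower the bias of overloaded experts, raise it for
    # underloaded ones, and leave experts at the mean load unchanged.
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - step)
        elif n < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias
```

For example, `route([2.0, 1.0, -1.0, 0.5], [0.0] * 4, top_k=2)` selects experts 0 and 1, while a strongly negative bias on expert 0 shifts the selection to experts 1 and 3 without changing how the chosen experts' outputs are weighted relative to their raw scores.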
Pre-training was conducted in three phases: long-horizon pre-training, mid-training, and a long-context extension phase. The 105B model overtook the 30B model on benchmarks early in training, which suggests efficient scaling behavior.
Performance and deployment
Sarvam 105B is reported to perform well on reasoning, programming, and agentic benchmarks. Sarvam 30B is optimized for real-time deployment, with strong performance on real-world conversational use cases. Both models are reported to achieve state-of-the-art results on Indian language benchmarks, outperforming significantly larger models.
Sarvam 30B powers Samvaad, Sarvam's conversational agent platform. Sarvam 105B powers Indus, Sarvam's AI assistant built for complex reasoning and agentic workflows.
Access and implementation
Weights for both the 30B and 105B models can be downloaded from AI Kosh and Hugging Face. For local inference with Transformers, vLLM, or SGLang, refer to the Hugging Face model pages for sample implementations. Both models are also accessible through Sarvam's API via its API dashboard.
📖 Read the full source: HN LLM Tools