Sarvam 30B & 105B Open-Source LLMs Released by Sarvam AI

Model specifications and architecture

Sarvam 30B and Sarvam 105B are reasoning models trained from scratch on large-scale, high-quality datasets curated in-house across pre-training, supervised fine-tuning, and reinforcement learning stages. Training was conducted entirely in India on compute provided under the IndiaAI mission.

Both models use a Mixture-of-Experts (MoE) Transformer backbone with sparse expert routing, which scales total parameter count without a proportional increase in compute per token, since only a subset of experts is active for each token. The architecture supports long-context inputs through rotary positional embeddings, RMSNorm-based stabilization, and attention designs optimized for efficient KV-cache usage during inference.
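To make the sparse-routing trade-off concrete, the following is a minimal arithmetic sketch of total versus per-token active parameters in a single MoE feedforward layer. Only the figure of 128 experts comes from the article; the hidden size, expert width, number of experts selected per token (top_k), and the three-matrix gated MLP layout are assumptions chosen purely for illustration and are not Sarvam's published configuration.

```python
def moe_ffn_params(d_model: int, d_expert: int, num_experts: int, top_k: int) -> tuple[int, int]:
    """Return (total, active-per-token) parameter counts for one MoE feedforward layer.

    Assumes each expert is a gated MLP with three weight matrices
    (gate, up, down), each of shape d_model x d_expert. The router
    weights and any bias terms are ignored.
    """
    per_expert = 3 * d_model * d_expert
    total = num_experts * per_expert   # parameters stored in the layer
    active = top_k * per_expert        # parameters actually used for one token
    return total, active


# Hypothetical sizes; only num_experts=128 is taken from the article.
total, active = moe_ffn_params(d_model=4096, d_expert=1024, num_experts=128, top_k=8)
print(f"total={total:,} active={active:,} ratio={total // active}x")
```

Under these assumed sizes the layer stores 16 times more expert parameters than it uses for any single token, which is the sense in which MoE decouples capacity from per-token compute.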

Sarvam 30B uses Grouped Query Attention (GQA) to reduce KV-cache memory while maintaining performance. Sarvam 105B extends the architecture with greater depth and Multi-head Latent Attention (MLA), a compressed attention formulation that reduces memory requirements for long-context inference. Both models use sparse expert feedforward layers with 128 experts but differ in expert capacity and routing configuration.
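The KV-cache saving from GQA can be estimated with simple arithmetic. The sketch below compares standard multi-head attention with a grouped variant that shares key/value heads; the layer count, head counts, head dimension, and sequence length are hypothetical values for illustration, not the actual Sarvam 30B configuration. MLA is not modeled here because its cache size depends on the latent dimension, which the article does not state.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size for one sequence.

    The leading factor of 2 accounts for storing both a key and a value
    tensor per layer. bytes_per_value=2 corresponds to 16-bit storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration: 32 layers, 32 query heads, 128-dim heads, 32K context.
mha = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=32768)
gqa = kv_cache_bytes(num_layers=32, num_kv_heads=4, head_dim=128, seq_len=32768)

gib = 2 ** 30
print(f"MHA: {mha / gib:.1f} GiB, GQA: {gqa / gib:.1f} GiB, saving: {mha // gqa}x")
```

The cache shrinks in direct proportion to the number of key/value heads, so sharing each key/value head among eight query heads cuts the cache by a factor of eight under these assumptions.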

Training and data details

The 30B model was trained on 16T tokens, while the 105B model was trained on 12T tokens. Pre-training data spans code, general web data, specialized knowledge corpora, mathematics, and multilingual content, with a substantial share allocated to the 10 most-spoken Indian languages.

The router uses sigmoid-based routing scores rather than traditional softmax gating, which improves expert load balancing and reduces routing collapse. An expert-bias term stabilizes routing dynamics and encourages more uniform expert utilization across training steps.
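The routing scheme described above can be sketched in a few lines. This is an illustrative approximation, not Sarvam's implementation: it assumes the bias-adjusted top-k selection used in some other MoE models, where a per-expert bias affects which experts are chosen but not the mixing weights, and where the bias is nudged after each step according to expert load. The function names, the step size, and the update rule are all assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def route(logits: list[float], bias: list[float], top_k: int) -> dict[int, float]:
    """Pick top_k experts for one token and return {expert_index: weight}.

    Each expert gets an independent sigmoid score, so scores do not compete
    the way softmax probabilities do. The bias shifts selection only; the
    returned weights are the unbiased scores, normalized over chosen experts.
    """
    scores = [sigmoid(z) for z in logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(scores[i] for i in chosen)
    return {i: scores[i] / norm for i in chosen}


def update_bias(bias: list[float], token_counts: list[int], step: float = 0.01) -> list[float]:
    """Raise the bias of under-loaded experts and lower it for over-loaded ones."""
    mean = sum(token_counts) / len(token_counts)
    return [
        b + step if c < mean else b - step if c > mean else b
        for b, c in zip(bias, token_counts)
    ]


# Four hypothetical experts; expert 3 has the lowest score but a large bias.
weights = route(logits=[2.0, 1.0, 0.0, -1.0], bias=[0.0, 0.0, 0.0, 5.0], top_k=2)
print(sorted(weights))
```

Keeping the bias out of the mixing weights means load balancing changes which experts see a token without distorting how much each selected expert contributes to the output.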

Pre-training was conducted in three phases: long-horizon pre-training, mid-training, and a long-context extension phase. The 105B model outperformed the 30B model on benchmarks early in training, suggesting efficient scaling behavior.

Performance and deployment

Sarvam 105B performs well on reasoning, programming, and agentic tasks across benchmarks. Sarvam 30B is optimized for real-time deployment with strong performance on real-world conversational use cases. Both models achieve state-of-the-art results on Indian language benchmarks, outperforming significantly larger models.

Sarvam 30B powers Samvaad, Sarvam's conversational agent platform. Sarvam 105B powers Indus, the company's AI assistant built for complex reasoning and agentic workflows.

Access and implementation

Weights for both the 30B and 105B models can be downloaded from AI Kosh and Hugging Face. For local inference with Transformers, vLLM, or SGLang, refer to the Hugging Face model pages for sample implementations. Both models are also accessible through Sarvam's API via the API dashboard.
