Leanstral: Open-Source Code Agent for Lean 4 and Formal Proof Engineering

What Leanstral Is
Leanstral is an open-source code agent designed specifically for Lean 4, a proof assistant capable of expressing complex mathematical objects and software specifications. Unlike existing proving systems that act as wrappers around large generalist models, Leanstral is trained to operate in realistic formal repositories and runs with 6B active parameters.
Key Technical Details
The model uses a highly sparse architecture optimized for proof engineering tasks. It leverages parallel inference with Lean as a verifier, making it both performant and cost-efficient. Leanstral supports arbitrary MCPs through Mistral Vibe and was specifically trained to achieve maximal performance with the frequently used lean-lsp-mcp.
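The idea of parallel inference with a verifier can be sketched as follows. This is only an illustrative outline, not Leanstral's actual pipeline: `best_of_k`, `fake_generate`, and `fake_verify` are hypothetical names, and the verifier is a stub standing in for a real Lean check.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def best_of_k(
    generate: Callable[[int], str],
    verify: Callable[[str], bool],
    k: int,
) -> Optional[str]:
    """Sample k candidates in parallel; return the first one the verifier accepts."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        # pool.map preserves the order of the sample indices 0..k-1
        candidates = list(pool.map(generate, range(k)))
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None  # no candidate passed verification


# Hypothetical stand-ins: a real setup would call the model and run Lean.
def fake_generate(seed: int) -> str:
    return "exact rfl" if seed == 1 else "sorry"


def fake_verify(proof: str) -> bool:
    # Stub rule: reject any candidate that still contains `sorry`.
    return "sorry" not in proof


if __name__ == "__main__":
    print(best_of_k(fake_generate, fake_verify, k=2))
```

Because the verifier gives a yes/no answer for each candidate, extra samples can only help: a pass@k run succeeds as soon as any one of the k attempts checks.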
Performance Benchmarks
Leanstral was evaluated using FLTEval, a new evaluation suite focused on realistic proof engineering scenarios rather than isolated mathematical problems. It measures how well a model completes formal proofs and correctly defines new mathematical concepts in pull requests to the FLT project.
Against Open-Source Models
- Leanstral-120B-A6B achieves a score of 26.3 with pass@2 (2 inference passes)
- GLM5-744B-A40B caps at approximately 16.6
- Kimi-K2.5-1T-32B caps at approximately 20.1
- Qwen3.5-397B-A17B requires 4 passes to reach 25.4
- Leanstral keeps improving with more passes, though with diminishing returns: 29.3 at pass@4 and 31.9 at pass@16
Against Claude Family
- Leanstral pass@2 (score 26.3) beats Sonnet (23.7) by 2.6 points
- Cost: Leanstral $36 vs. Sonnet $549
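- Leanstral pass@16 reaches 31.9, beating Sonnet by 8.2 points
- Claude Opus 4.6 leads with 39.6 but costs $1,650, about 46× the $36 pass@2 cost quoted above (the 92× figure cited in the source presumably refers to a cheaper single-pass run)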
- Haiku scores 23.0 at $184
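The score and cost trade-offs in the list above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this section and takes Leanstral pass@2 as the baseline; the dictionary layout and the `compare` helper are illustrative choices, not part of FLTEval.

```python
# (score, cost in USD) as quoted in the text above
results = {
    "Leanstral pass@2": (26.3, 36),
    "Haiku": (23.0, 184),
    "Sonnet": (23.7, 549),
    "Claude Opus 4.6": (39.6, 1650),
}

base_score, base_cost = results["Leanstral pass@2"]


def compare(name):
    """Return (score difference, cost ratio) relative to Leanstral pass@2."""
    score, cost = results[name]
    return score - base_score, cost / base_cost


if __name__ == "__main__":
    for name in results:
        delta, ratio = compare(name)
        print(f"{name}: {delta:+.1f} points at {ratio:.1f}x the cost")
```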
Case Study Example
When presented with a real-world question from Proof Assistants Stack Exchange about a script that stopped compiling in Lean 4.29.0-rc6, Leanstral successfully built test code to recreate the failing environment. It diagnosed that a def T2 := List Bool was blocking the rw tactic from matching patterns due to definitional equality issues. The fix proposed was swapping def for abbrev since abbrev creates a transparent alias.
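The def versus abbrev distinction can be illustrated with a small Lean 4 sketch. It does not reproduce the original Stack Exchange script, which is not shown here; it uses type class resolution rather than rw to show the same transparency difference, and T2' is a hypothetical second name introduced only for comparison.

```lean
-- Regular definition: not unfolded by machinery that only unfolds reducible constants.
def T2 := List Bool

-- `abbrev` marks the definition as reducible, so it acts as a transparent alias.
-- (`T2'` is a hypothetical name used only in this sketch.)
abbrev T2' := List Bool

-- Succeeds: instance search unfolds `T2'` and finds `Inhabited (List Bool)`.
example : Inhabited T2' := inferInstance

-- Expected to fail if uncommented, because `T2` is not unfolded during instance search:
-- example : Inhabited T2 := inferInstance
```

The same mechanism plausibly explains the rw failure described above: when a pattern mentions List Bool but the goal mentions a non-reducible T2, the two are definitionally equal yet not matched unless the definition is unfolded first.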
Availability
Leanstral's weights are released under the Apache 2.0 license. The model is also available in agent mode within Mistral Vibe and through a free API endpoint. A tech report detailing the training approach will also be released.
📖 Read the full source: HN AI Agents