Steelman R5: Fine-tuned 14B Model Outperforms Claude Opus on Ada Code Generation

Model and Training Details
The Steelman R5 model is a fine-tuned version of Qwen2.5-Coder-14B-Instruct optimized for Ada code generation. It was trained with 4-bit QLoRA via Unsloth and TRL's SFTTrainer on a dataset of 3,430 Ada/SPARK instruction pairs, every one of which compiles with gnatmake -gnat2022 -gnatwa.
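The compile-only filter described above can be sketched as follows. This is a minimal illustration, not the author's pipeline. The record layout (`unit` and `response` keys), the one-file-per-main-procedure layout, and treating a zero exit status as a pass are all assumptions. Note that `-gnatwa` only enables warnings. Unless warnings are promoted to errors (for example with `-gnatwe`), a build that emits warnings still exits 0, and the write-up doesn't say whether warnings counted as failures.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable

# Flags quoted in the write-up; the rest of this harness is an assumption.
GNAT_FLAGS = ["-gnat2022", "-gnatwa"]

Runner = Callable[..., subprocess.CompletedProcess]


def compiles(source: str, unit: str, runner: Runner = subprocess.run) -> bool:
    """Return True if `source`, a main subprogram named `unit`, builds with gnatmake."""
    with tempfile.TemporaryDirectory() as tmp:
        # GNAT's default file naming: unit Foo_Bar must live in foo_bar.adb.
        path = Path(tmp) / f"{unit.lower()}.adb"
        path.write_text(source)
        result = runner(
            ["gnatmake", *GNAT_FLAGS, path.name],
            cwd=tmp,
            capture_output=True,
            text=True,
        )
        # Zero exit status means success; -gnatwa warnings alone do not fail the build.
        return result.returncode == 0


def filter_pairs(pairs: Iterable[dict], runner: Runner = subprocess.run) -> list[dict]:
    """Keep only pairs whose Ada `response` compiles (hypothetical record layout)."""
    return [p for p in pairs if compiles(p["response"], p["unit"], runner)]
```

Calling `filter_pairs(records)` needs a GNAT toolchain on `PATH`. The `runner` parameter exists only so the filtering logic can be exercised without one.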
Training configuration: LoRA rank 32, alpha 64, targeting the q/k/v/o/gate/up/down projections; 1 epoch at a constant learning rate of 2e-5, taking about 49 minutes per round on a rented H100. Each round retrained the model from the base checkpoint on the accumulated dataset, because continuing training of the previous round's adapter caused catastrophic forgetting at R2. There were five rounds in total (R1–R5), and R2 was discarded.
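To make the adapter size concrete, here is a back-of-the-envelope count of the trainable LoRA parameters for the configuration above. The layer dimensions (48 layers, hidden size 5,120, MLP size 13,824, 8 KV heads of dimension 128) are assumed from the published Qwen2.5-14B architecture and should be checked against the model's `config.json`. The `LoraSettings` dataclass is illustrative, not the author's code. Each adapted linear layer of shape `d_in × d_out` adds `r × (d_in + d_out)` parameters, and the low-rank update is scaled by `alpha / r`.

```python
from dataclasses import dataclass

# Assumed Qwen2.5-14B dimensions; verify against the model's config.json.
HIDDEN = 5120
INTERMEDIATE = 13824
LAYERS = 48
KV_DIM = 8 * 128  # num_key_value_heads * head_dim (grouped-query attention)

# (in_features, out_features) for each projection targeted in the write-up.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


@dataclass(frozen=True)
class LoraSettings:
    rank: int = 32
    alpha: int = 64

    @property
    def scaling(self) -> float:
        # LoRA multiplies the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank

    def trainable_params(self) -> int:
        # Each adapted layer gets A (rank x d_in) and B (d_out x rank).
        per_layer = sum(self.rank * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
        return per_layer * LAYERS


cfg = LoraSettings()
print(f"scaling = {cfg.scaling}, trainable params = {cfg.trainable_params():,}")
```

Under these assumptions the adapter has roughly 138M trainable parameters, under 1% of the base model's ~14.7B, with a scaling factor of 2.0.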
Benchmark Results
Custom Ada Compilation Benchmark (1,000 prompts, first-attempt clean compile):
- Steelman R5 (14B): 68.6% compile rate
- Claude Opus 4.6: 42.1% compile rate
- Claude Sonnet 4.6: 37.2% compile rate
- Qwen2.5-Coder-14B (base, untuned): ~35% compile rate
- Claude Sonnet 4: 27.5% compile rate
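With 1,000 prompts, sampling noise on each compile rate is a few percentage points. One quick way to check that the gap between Steelman R5 and Claude Opus 4.6 is not noise is a 95% Wilson score interval per model. This is a rough sketch that assumes each prompt is an independent pass/fail trial. It ignores prompt difficulty and correlated failures, so treat it as a sanity check rather than a formal significance test.

```python
from math import sqrt


def wilson_interval(rate: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    denom = 1 + z * z / n
    center = (rate + z * z / (2 * n)) / denom
    half = z * sqrt(rate * (1 - rate) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Reported first-attempt compile rates on the 1,000-prompt custom benchmark.
RATES = {
    "Steelman R5 (14B)": 0.686,
    "Claude Opus 4.6": 0.421,
    "Claude Sonnet 4.6": 0.372,
    "Qwen2.5-Coder-14B (base)": 0.35,  # reported only as ~35%
    "Claude Sonnet 4": 0.275,
}

for name, rate in RATES.items():
    lo, hi = wilson_interval(rate, 1000)
    print(f"{name:26s} {rate:6.1%}  [{lo:.1%}, {hi:.1%}]")
```

The Steelman R5 interval (about 66 to 71%) sits well above Opus 4.6's (about 39 to 45%). By contrast, the intervals for the ~35% base model and Sonnet 4.6's 37.2% overlap, so those two aren't clearly separable at this sample size.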
MultiPL-E HumanEval-Ada (157 problems, pass@1):
- Steelman R5: 47.1% pass@1, 74.5% compile rate
- Qwen2.5-Coder-14B (base): 34.4% pass@1, 51.0% compile rate
According to the author, these are the first published HumanEval-Ada pass@1 results for any open model.
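pass@1 is the probability that a single generated solution passes a problem's unit tests, averaged over problems. The write-up doesn't say how many samples were drawn per problem, so here is the unbiased pass@k estimator commonly used with HumanEval-style benchmarks (introduced by Chen et al., 2021). This is a reference sketch, not the evaluation code behind the numbers above. For k = 1 it reduces to c / n, and with a single sample per problem it is simply the fraction of problems solved.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them passed."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k over (n, c) pairs, one pair per problem."""
    return mean(pass_at_k(n, c, k) for n, c in results)


# Hypothetical example: three problems, 10 samples each.
print(benchmark_pass_at_k([(10, 10), (10, 3), (10, 0)]))
```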
Usage and Availability
Run the model with: ollama run hf.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF
The Q4_K_M-quantized GGUF fits in 12 GB of VRAM.
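Besides the interactive `ollama run` session, a running Ollama server exposes a local HTTP API. The sketch below sends one prompt to the `/api/generate` endpoint using only the Python standard library. It assumes Ollama's default address (`localhost:11434`) and that the model is registered under the same `hf.co/...` name used in the command above; check `ollama list` if the tag differs.

```python
import json
import urllib.request

# Assumptions: Ollama's default local address, and the model name used by `ollama run`.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "hf.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate("Write an Ada function that returns the factorial of a Natural.")` returns the completion as a string. Since the model is tuned for compile success rather than correctness, it is still worth compiling and testing whatever comes back.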
Limitations
- Compilation ≠ correctness: on HumanEval-Ada, 74.5% of solutions compile but only 47.1% pass the tests (the 68.6% figure is the compile rate on the separate custom benchmark)
- Error-fix capability is weak (5.1%), so don't expect it to debug Ada code
- SPARK contracts compile but aren't verified with gnatprove
- Training data is synthetic: no human Ada developers wrote these examples
- At 14B parameters, it may miss issues a larger model would catch
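The gap between compiling and passing can be read as a simple funnel. Using only the HumanEval-Ada percentages reported above, this sketch computes how many points of each model's output compile but still fail the tests, and the pass rate conditional on compiling. It is plain arithmetic on rounded published figures, and the `EvalResult` class is purely illustrative, so small differences at n = 157 should not be over-read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    compile_rate: float  # % of generated solutions that compile
    pass_at_1: float     # pass@1, in %

    @property
    def compiled_but_wrong(self) -> float:
        # Passing implies compiling, so the difference is "builds but fails tests".
        return self.compile_rate - self.pass_at_1

    @property
    def pass_given_compile(self) -> float:
        return self.pass_at_1 / self.compile_rate


# Reported MultiPL-E HumanEval-Ada figures (157 problems).
RESULTS = {
    "Steelman R5": EvalResult(compile_rate=74.5, pass_at_1=47.1),
    "Qwen2.5-Coder-14B (base)": EvalResult(compile_rate=51.0, pass_at_1=34.4),
}

for name, r in RESULTS.items():
    print(f"{name}: {r.compiled_but_wrong:.1f} pts compile but fail; "
          f"P(pass | compiles) = {r.pass_given_compile:.2f}")
```

On these rounded figures, tuning raised both compile rate and pass@1, while the share of compiling programs that also pass stayed in a similar range (about 0.63 versus 0.67), which is consistent with the compilation ≠ correctness caveat.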
Resources
- Model: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1
- GGUF: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF
- Dataset: https://huggingface.co/datasets/the-clanker-lover/steelman-sft-ada
Read the full source: r/LocalLLaMA