Steelman R5: Fine-tuned 14B Model Outperforms Claude Opus on Ada Code Generation

✍️ OpenClawRadar | 📅 Published: March 13, 2026

Model and Training Details

The Steelman R5 model is a fine-tuned version of Qwen2.5-Coder-14B-Instruct, specialized for Ada code generation. It was trained with 4-bit QLoRA via Unsloth and TRL's SFTTrainer on a dataset of 3,430 Ada/SPARK instruction pairs, and every training example compiles with gnatmake -gnat2022 -gnatwa.
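The compile gate on the training data can be approximated with a small filter that builds each example with the same gnatmake switches. This is a minimal sketch, not the author's pipeline. It assumes each example is a single main procedure stored under a hypothetical "output" key and that gnatmake is on PATH. The function names are illustrative.

```python
import re
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable

# Switches quoted in the article: Ada 2022 mode plus most optional warnings.
# Warnings alone do not fail the build unless -gnatwe is also given.
GNAT_FLAGS = ["-gnat2022", "-gnatwa"]


def main_unit_file(source: str) -> str:
    """Return the file name GNAT expects for the example's main procedure.

    Assumes the example is one library-level procedure; GNAT's default
    naming puts 'procedure Hello_World' in 'hello_world.adb'.
    """
    match = re.search(r"^\s*procedure\s+([A-Za-z]\w*)", source,
                      re.MULTILINE | re.IGNORECASE)
    if match is None:
        raise ValueError("no main procedure found")
    return match.group(1).lower() + ".adb"


def gnat_compiles(source: str, timeout: float = 60.0) -> bool:
    """Build the example in a scratch directory; True if gnatmake exits 0."""
    try:
        filename = main_unit_file(source)
    except ValueError:
        return False  # no main procedure to build
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / filename).write_text(source)
        try:
            result = subprocess.run(
                ["gnatmake", *GNAT_FLAGS, filename],
                cwd=tmp, capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def filter_compiling(examples: Iterable[dict],
                     compiles: Callable[[str], bool] = gnat_compiles) -> list:
    """Keep only pairs whose 'output' field (hypothetical key) builds cleanly."""
    return [ex for ex in examples if compiles(ex["output"])]
```

The compiles parameter lets you swap in a stricter check, such as adding -gnatwe so that warnings also reject an example.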

Training configuration: LoRA rank 32 and alpha 64, targeting the q/k/v/o/gate/up/down projections. In each round the model was retrained from the base weights on the full accumulated dataset, because continuing training from the previous adapter caused catastrophic forgetting at R2. Training ran for 1 epoch at a learning rate of 2e-5 with a constant schedule, taking about 49 minutes per round on a rented H100. There were five rounds in total (R1–R5), with R2 discarded.
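The "retrain from base on accumulated data" schedule can be sketched as follows. The train_from_base callable is hypothetical and stands in for whatever QLoRA/SFT setup you use. The projection module names follow Hugging Face's Qwen2 naming (q_proj, gate_proj, and so on), which is an assumption about how the article's "q/k/v/o/gate/up/down" maps to code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class RoundConfig:
    """Hyperparameters reported in the article (field names are illustrative)."""
    base_model: str = "Qwen2.5-Coder-14B-Instruct"
    lora_rank: int = 32
    lora_alpha: int = 64
    # Assumed Hugging Face module names for the q/k/v/o/gate/up/down projections.
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj",
                             "gate_proj", "up_proj", "down_proj")
    epochs: int = 1
    learning_rate: float = 2e-5
    lr_schedule: str = "constant"

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


def run_rounds(
    round_batches: Sequence[Sequence[Any]],
    train_from_base: Callable[[RoundConfig, list], Any],
    cfg: RoundConfig = RoundConfig(),
) -> list:
    """Train one fresh adapter per round on all data accumulated so far.

    train_from_base (hypothetical) loads the base model, attaches a new
    LoRA adapter, and trains it. It never receives the previous round's
    adapter, which avoids the continuation setup that caused forgetting.
    """
    accumulated: list = []
    adapters = []
    for batch in round_batches:
        accumulated.extend(batch)  # the dataset grows every round
        adapters.append(train_from_base(cfg, list(accumulated)))  # restart from base
    return adapters
```

With rank 32 and alpha 64, the adapter update is scaled by a factor of 2.0. Retraining from scratch costs a full run each round (about 49 minutes here), but it keeps each round independent of earlier adapters.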

Benchmark Results

Custom Ada Compilation Benchmark (1,000 prompts, first-attempt clean compile):

  • Steelman R5 (14B): 68.6% compile rate
  • Claude Opus 4.6: 42.1% compile rate
  • Claude Sonnet 4.6: 37.2% compile rate
  • Qwen2.5-Coder-14B (base, untuned): ~35% compile rate
  • Claude Sonnet 4: 27.5% compile rate
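To make the metric concrete: the compile rate is the share of prompts whose single first-attempt output compiles cleanly. The snippet below assumes one output per prompt and uses 35.0 for the base model's reported "~35%". The helper names are illustrative.

```python
# First-attempt compile rates (%) reported on the article's 1,000-prompt benchmark.
RATES = {
    "Steelman R5 (14B)": 68.6,
    "Claude Opus 4.6": 42.1,
    "Claude Sonnet 4.6": 37.2,
    "Qwen2.5-Coder-14B (base)": 35.0,  # reported as "~35%"
    "Claude Sonnet 4": 27.5,
}


def compile_rate(first_attempt_ok: list) -> float:
    """Percentage of prompts whose single first-attempt output compiled cleanly."""
    return 100.0 * sum(first_attempt_ok) / len(first_attempt_ok)


def gap_vs(rates: dict, model: str, reference: str) -> tuple:
    """Return (gap in percentage points, ratio of rates), both rounded."""
    points = round(rates[model] - rates[reference], 1)
    ratio = round(rates[model] / rates[reference], 2)
    return points, ratio
```

For instance, gap_vs(RATES, "Steelman R5 (14B)", "Claude Opus 4.6") returns (26.5, 1.63): a 26.5-point lead, or about 1.6 times Opus's rate.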

MultiPL-E HumanEval-Ada (157 problems, pass@1):

  • Steelman R5: 47.1% pass@1, 74.5% compile rate
  • Qwen2.5-Coder-14B (base): 34.4% pass@1, 51.0% compile rate
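pass@1 is usually computed with the unbiased pass@k estimator introduced in the original HumanEval paper. The article does not say how many samples were drawn per problem, so this sketch takes (samples, correct) counts for each problem as input.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(per_problem: list, k: int = 1) -> float:
    """Mean pass@k over a list of (n_samples, n_correct) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)
```

With a single sample per problem, pass@1 reduces to the fraction of problems solved. Under that assumption, 47.1% of 157 problems corresponds to 74 solved. Because a passing solution must also compile, the 27.4-point gap between the 74.5% compile rate and the 47.1% pass@1 is the share of problems whose output compiled but failed the tests.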

The author describes these as the first published Ada pass@1 results on HumanEval for any open model.


Usage and Availability

Run the model with: ollama run hf.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF

The GGUF version, quantized to Q4_K_M, fits in 12 GB of VRAM.
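Besides the interactive ollama run session, the model can be called from a script through Ollama's local HTTP API. Here is a minimal stdlib-only sketch. It assumes Ollama is running on its default port (11434) and that the model has already been pulled, for example by running the command above once. The helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "hf.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF"


def build_request(prompt: str, model: str = MODEL) -> dict:
    # stream=False asks Ollama for one JSON object instead of streamed chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """POST a prompt to Ollama and return the generated text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server):
# print(generate("Write an Ada 2022 procedure Hello that prints a greeting."))
```

Given the limitations below, it is worth piping the returned code through a compiler rather than trusting it directly.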

Limitations

  • Compilation ≠ correctness: 68.6% is the compile rate on the custom benchmark, while on HumanEval-Ada 74.5% of outputs compile but only 47.1% produce correct output
  • Error-fix capability is weak (5.1%): don't expect it to debug Ada code
  • SPARK contracts compile but aren't verified with gnatprove
  • Training data is synthetically generated: no human Ada developers wrote these examples
  • At 14B parameters, the model may miss things a larger model would catch

Resources

  • Model: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1
  • GGUF: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF
  • Dataset: https://huggingface.co/datasets/the-clanker-lover/steelman-sft-ada

📖 Read the full source: r/LocalLLaMA
