Fine-tuned Qwen2.5-7B to 96% of Claude Haiku with $3 and Zero Human Labelers

A developer fine-tuned Qwen2.5-7B to reach 96% of Claude Haiku's composite score on a domain-specific decision-reasoning task, spending only ~$3 in API calls and using no human labelers. The method, called DV-DPO (Decision-Validated Direct Preference Optimization), generates its own training signal by running a multi-voice adversarial council.
How DV-DPO Works
The pipeline runs a 3-voice council on each decision question and produces a synthesis. The two losing voices then cross-examine that synthesis. If the synthesis is revised under this adversarial pressure, a DPO pair is formed: the post-revision version is the chosen response and the pre-revision version is the rejected response. If the synthesis holds, no pair is created. The intent is that only genuine reasoning errors produce training signal, not format preferences or sampling variance.
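The pair-formation rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: `DPOPair`, `make_pair`, `run_council`, and `cross_examine` are hypothetical names, the two callables stand in for the LLM calls, and detecting a "revision" by comparing stripped strings is an assumption (the source does not say how a revision is detected).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DPOPair:
    prompt: str
    chosen: str    # synthesis after cross-examination
    rejected: str  # synthesis before cross-examination


def make_pair(
    question: str,
    run_council: Callable[[str], str],         # hypothetical: returns the council's synthesis
    cross_examine: Callable[[str, str], str],  # hypothetical: returns the (possibly revised) synthesis
) -> Optional[DPOPair]:
    """Return a DPO pair only if cross-examination changed the synthesis."""
    before = run_council(question)
    after = cross_examine(question, before)
    # Assumption: a revision is any change in text beyond surrounding whitespace.
    if after.strip() == before.strip():
        return None  # synthesis held, so no training signal
    return DPOPair(prompt=question, chosen=after, rejected=before)
```

With real model calls plugged in, questions whose synthesis survives cross-examination simply return `None` and contribute nothing to the training set.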
Results
- 1,040 training pairs generated total (~$3 at Haiku rates)
- Head-to-head vs Claude Haiku: Format 100%, Commits 100%, Context 89%, Composite 96%
- Latency: 11s on T4 GPU (4-bit quantized) vs Haiku's 3s
- Adversarial failure rate: 2% on 96 targeted questions
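A quick back-of-the-envelope check on the figures above, using only numbers from the list. Two caveats: the ~$3 total is approximate, and the source does not say how the composite is computed. The snippet only shows that an unweighted mean of the three sub-scores would round to the reported 96%.

```python
TOTAL_COST_USD = 3.0  # approximate total reported in the post
NUM_PAIRS = 1040      # training pairs reported in the post

cost_per_pair = TOTAL_COST_USD / NUM_PAIRS

# Sub-scores as reported; equal weighting is an assumption, not stated in the source.
scores = {"format": 100, "commits": 100, "context": 89}
unweighted_mean = sum(scores.values()) / len(scores)

print(f"cost per pair: ${cost_per_pair:.4f}")       # cost per pair: $0.0029
print(f"unweighted mean: {unweighted_mean:.1f}%")   # unweighted mean: 96.3%
```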
Autonomous Improvement Loop
The system now runs an automated cycle: failure_detector → auto_red_team → DPO pairs → retrain → redeploy → eval. Training pairs for version 5 are currently accumulating. The fine-tuned model is available as a GGUF file ready for Ollama.
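The cycle above can be pictured as a chain of stages that pass a shared state along. The sketch below is a hypothetical skeleton, not the author's implementation: every stage is a stub, the state keys and the placeholder question ID are invented for illustration, and only the stage order comes from the source.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def run_cycle(state: Dict, stages: List[Stage]) -> Dict:
    """Run each stage in order, feeding the output state into the next stage."""
    for stage in stages:
        state = stage(state)
    return state


# Stub stages (all hypothetical); real ones would call models, trainers, and deploy tooling.
def failure_detector(state):
    return {**state, "failures": ["example-question"]}  # placeholder failure


def auto_red_team(state):
    return {**state, "probes": [f"adversarial:{q}" for q in state["failures"]]}


def build_dpo_pairs(state):
    return {**state, "num_pairs": len(state["probes"])}


def retrain(state):
    return {**state, "version": state["version"] + 1}


def redeploy(state):
    return {**state, "deployed": state["version"]}


def evaluate(state):
    return {**state, "evaluated": True}


result = run_cycle(
    {"version": 4},
    [failure_detector, auto_red_team, build_dpo_pairs, retrain, redeploy, evaluate],
)
```

Keeping each stage as a plain function of the state makes it easy to swap a stub for a real step or to insert a gate (for example, skipping redeploy when eval regresses), though whether the original system does this is not stated.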
Who This Is For
Developers building domain-specific reasoning agents who want to move from pay-per-call APIs to a local fine-tuned model without expensive human annotation.
📖 Read the full source: r/LocalLLaMA