Practical Lessons from Using AI Agents on a 100k LOC Codebase

✍️ OpenClawRadar | 📅 Published: March 2, 2026 | 🔗 Source

Six Concrete Techniques for Large-Scale AI-Assisted Development

A developer recently documented their experience using AI agents (Claude Code + Cursor) to build a pandas-compatible API layer on top of the chDB analytical engine. The project involved aligning 600+ methods across two systems and cost approximately $20k in tokens. Here are the specific, actionable lessons they shared.

Key Implementation Details

  • Maintain a CLAUDE.md rules file: Since AI has no cross-session memory, they committed a rules file to the repository containing every pattern the AI kept getting wrong, every shortcut they banned, and every architectural decision that was settled. This also served as the team collaboration interface. They caution against letting this file become a "500-line manifesto" that the AI will start ignoring.
  • Watch the reasoning, not just the output: In early stages, reading how the AI thinks is more valuable than what it ships. When its logic drifts from yours, ask: was my thinking wrong, or did I just not communicate it properly?
  • Periodically use a zero-context agent as a critic: They started using a fresh agent (claude.ai/code, not the Claude Code CLI) with zero project memory to evaluate their work from a critical, rational outsider's perspective. Two keywords matter: critical (to override the AI's default accommodating mode) and rational (to demand structured reasoning, not vibes).
  • Use the target system as the test oracle: Since their goal was to match an existing API, they found real code in the wild (GitHub/Kaggle notebooks), swapped one import line, and compared outputs instead of inventing test cases.
  • Rules over prompts: They observed how AI takes shortcuts and wrote explicit bans. For example: when tests failed due to row order mismatch, the AI's favorite move was adding .sort_values() to make the test pass. They banned this explicitly. Cases that genuinely can't be matched get marked XFAIL, never silently skipped.
  • Filesystem over conversation history for multi-agent pipelines: They orchestrate multi-agent workflows with Python scripts where the filesystem is the shared context layer. Each agent writes its work to a tracking directory, and the next reads what it needs. Key patterns that worked: role separation, structured decisions (APPROVE/REJECT/ESCALATE as JSON for deterministic control flow), and automatic git rollback on failure.
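The oracle idea and the ban on sorting can be sketched together. This is a minimal illustration, not the developer's actual harness: `run_against`, `compare_with_oracle`, and both stand-in backends are hypothetical names, and plain Python objects replace pandas and the chDB-backed layer so the example runs with the standard library alone. The point it shows is that the comparison is order-sensitive, so a row-order mismatch is reported instead of being sorted away.

```python
from types import SimpleNamespace


def run_against(snippet, backend):
    """Run one snippet against a backend, capturing the result or the error type."""
    try:
        return ("ok", snippet(backend))
    except Exception as exc:  # an exception is also behavior the oracle defines
        return ("error", type(exc).__name__)


def compare_with_oracle(snippet, oracle, candidate):
    """Compare candidate output to the oracle's output, order-sensitively.

    Nothing is sorted or normalized before comparing, so a row-order
    mismatch shows up as a failure instead of being hidden.
    """
    expected = run_against(snippet, oracle)
    actual = run_against(snippet, candidate)
    return {"match": expected == actual, "expected": expected, "actual": actual}


# Hypothetical stand-ins: the oracle keeps first-seen order, the candidate does not.
oracle = SimpleNamespace(unique=lambda values: list(dict.fromkeys(values)))
candidate = SimpleNamespace(unique=lambda values: sorted(set(values)))


def snippet(backend):
    """Stands in for real-world code whose import line was swapped."""
    return backend.unique(["b", "a", "b", "c"])


report = compare_with_oracle(snippet, oracle, candidate)
print(report["match"], report["expected"], report["actual"])
```

In a real setup, the snippet would presumably be code collected from GitHub or Kaggle, and a mismatch that genuinely cannot be fixed would be recorded as an expected failure rather than skipped.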

The developer notes that AI excels at scale work (aligning hundreds of functions, generating thousands of tests, catching regressions), but judgment ("Is this a bug or a feature? Is the architecture right?") remains the human's responsibility.
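The filesystem-as-shared-context pattern, with structured APPROVE/REJECT/ESCALATE decisions, might look roughly like the sketch below. The file layout, the field names, `write_decision`, `next_step`, and the `rollback` hook are all assumptions made for illustration; the source only says that agents write to a tracking directory, that decisions are JSON, and that failures trigger an automatic git rollback. Here ESCALATE is assumed to be the path that hands a judgment call back to a human.

```python
import json
import tempfile
from pathlib import Path

VALID_DECISIONS = {"APPROVE", "REJECT", "ESCALATE"}


def write_decision(tracking_dir, agent, decision, reason):
    """An agent records its verdict as JSON in the shared tracking directory."""
    if decision not in VALID_DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    path = Path(tracking_dir) / f"{agent}.json"
    path.write_text(json.dumps({"agent": agent, "decision": decision, "reason": reason}))
    return path


def next_step(tracking_dir, agent, rollback):
    """The orchestrator reads the file, not a chat transcript, to pick the next step."""
    record = json.loads((Path(tracking_dir) / f"{agent}.json").read_text())
    decision = record["decision"]
    if decision == "APPROVE":
        return "continue"
    if decision == "REJECT":
        rollback()  # hypothetical hook; the source describes an automatic git rollback
        return "retry"
    if decision == "ESCALATE":
        return "ask_human"
    raise ValueError(f"unknown decision: {decision}")


rolled_back = []
with tempfile.TemporaryDirectory() as tracking_dir:
    write_decision(tracking_dir, "reviewer", "REJECT", "row order differs from oracle")
    outcome = next_step(tracking_dir, "reviewer", rollback=lambda: rolled_back.append(True))
    write_decision(tracking_dir, "reviewer", "ESCALATE", "unclear: bug or feature?")
    escalated = next_step(tracking_dir, "reviewer", rollback=lambda: rolled_back.append(True))

print(outcome, escalated, rolled_back)
```

Because the control flow branches on a small fixed vocabulary read from disk, the orchestration script stays deterministic even though each agent's reasoning is not.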

📖 Read the full source: r/ClaudeAI
