A TDD Development Flow Using AI Agents for Website Projects

Development Workflow with AI Agents
A developer outlines their approach to website development using AI coding agents with a test-driven development methodology. They use both Claude Code for work projects and local models for private projects, specifically Qwen Code on top of Qwen3.5-27B running on llama.cpp with 2xRTX 3090 GPUs.
Initial Project Setup
At the beginning of a project, they implement basic modules:
- Basic DB schema
- Basic auth API
- UI routing
- UI basic layout
- Basic API (admins and users)
- Basic API/E2E tests (written manually or by AI)
- Context files for coding agents (AGENTS.md, CLAUDE.md)
Iterative Development Process
After setup, the iterative process begins:
- Write detailed specs of API/E2E tests in markdown for a feature
- Generate API/E2E tests from the markdown test descriptions
- Start coding agent session with ability to run tests
- Ask agent to implement functionality until tests pass
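To illustrate the step from a markdown test description to a generated test, here is a minimal sketch. The spec item, the `FakeUserApi` class, and the status codes are all hypothetical and not taken from the developer's project. The in-memory stand-in only exists so the example runs on its own; a real test would call the project's actual API.

```python
# Markdown spec item (hypothetical):
#   - POST /api/users with an already registered email returns 409


class FakeUserApi:
    """In-memory stand-in for the real API, only here to keep the sketch runnable."""

    def __init__(self):
        self._emails = set()

    def create_user(self, email: str) -> int:
        """Return an HTTP-like status code: 201 on success, 409 on a duplicate."""
        if email in self._emails:
            return 409
        self._emails.add(email)
        return 201


def test_duplicate_email_is_rejected():
    api = FakeUserApi()
    # First registration succeeds, the second one with the same email is rejected.
    assert api.create_user("ann@example.com") == 201
    assert api.create_user("ann@example.com") == 409
```

Keeping one test per spec item makes it easier to see which sentence of the markdown a failing test belongs to.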
Model Capabilities and Trade-offs
The developer notes that more capable models like Claude allow skipping the markdown files entirely for simple websites, while Qwen3.5-27B has different thresholds. Less capable models need more specific instructions to mitigate their failure modes, including locking logic by instructing the agent not to touch certain files or to use only specific wrappers.
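An instruction not to touch certain files can also be checked after a session. The sketch below is one possible guard, not something the developer describes: the `LOCKED_PATTERNS` list and both function names are hypothetical, and it assumes the project is a git repository. Note that `git diff --name-only HEAD` lists modified tracked files only, so newly created untracked files would need a separate check.

```python
import fnmatch
import subprocess

# Hypothetical patterns for files the agent was told not to modify.
LOCKED_PATTERNS = ["tests/*", "migrations/*", "AGENTS.md"]


def locked_violations(changed_files, patterns=LOCKED_PATTERNS):
    """Return the changed paths that match any locked pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatch.fnmatch(path, pattern) for pattern in patterns)
    ]


def changed_files_in_worktree():
    """List tracked files modified relative to HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

A session wrapper could call `locked_violations(changed_files_in_worktree())` and revert or flag the session when the returned list is not empty.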
They hypothesize that developers shouldn't be obsessed with code patterns and quality if code is covered by tests and works, comparing AI agents to managing 10-100 junior/middle developers at the cost of an AI subscription.
Local Model Specifics
For local models on 2x RTX 3090, they run Qwen3.5-27B-GGUF-Q8_0 with parallel = 1 and the full context, which they believe keeps agentic sessions from being auto-compressed too early. They note that less capable models force them to articulate E2E tests and the desired implementation more clearly, while Claude fills in design choices automatically, which can lead to a loss of control.
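As a rough sketch of what "parallel = 1 and full context" could look like, the snippet below builds a `llama-server` command line. It assumes the model is served through llama.cpp's `llama-server`, which the summary does not state explicitly. The model path, port, and `--n-gpu-layers` value are placeholders, and `llama_server_command` is a hypothetical helper.

```python
import shlex


def llama_server_command(model_path: str, port: int = 8080) -> list[str]:
    """Build a llama-server argv with one slot and the model's own context size."""
    return [
        "llama-server",
        "--model", model_path,
        "--parallel", "1",        # a single server slot
        "--ctx-size", "0",        # 0 = take the context length from the model file
        "--n-gpu-layers", "99",   # placeholder: offload as many layers as fit
        "--port", str(port),
    ]


# Placeholder path, shown only to print a copy-pasteable command.
print(shlex.join(llama_server_command("models/Qwen3.5-27B-Q8_0.gguf")))
```

With a single slot the whole context window is available to the one agent session, which matches the stated goal of avoiding early compression.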
Coding TDD Loop Implementation
The developer provides a draft of their coding TDD loop:
- Outer loop begins: run all pytest tests with `pytest tests/ -x`, which exits if there are no failures. The default log level is warning, so there is not much output.
- If everything passes, exit the outer loop. If something failed, extract the name of the failed test.
- Run the failed test with full logs, for example `pytest tests/../test_first_failing_test.py --log-level DEBUG`, and collect the test output into a file.
- Extract the lines near the 'error'/'fail' strings with `egrep -i -C 10 '(error|fail)' <fail...` [the draft is truncated here]
This approach represents a practical implementation of TDD with AI agents, balancing automation with the oversight needed to keep control of the codebase.
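The loop drafted above can be sketched in Python as follows. This is an illustration under assumptions, not the developer's script: the draft is cut off after the `egrep` step, so handing the extracted lines to the agent through a hypothetical `ask_agent` callback is a guess at the next step, and `max_rounds` is an added safety limit. `lines_near_errors` only approximates `egrep -i -C 10` and does not print the `--` group separators.

```python
import re
import subprocess
from typing import Callable, Optional

# Matches pytest's short summary lines, e.g.
# "FAILED tests/api/test_users.py::test_create - AssertionError: ..."
FAILED_LINE = re.compile(r"^(?:FAILED|ERROR) (\S+)", re.MULTILINE)


def first_failed_test(pytest_output: str) -> Optional[str]:
    """Return the node id of the first failed test, or None if none is listed."""
    match = FAILED_LINE.search(pytest_output)
    return match.group(1) if match else None


def lines_near_errors(log: str, context: int = 10) -> str:
    """Keep lines containing 'error' or 'fail' plus `context` lines around them."""
    lines = log.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if re.search(r"error|fail", line, re.IGNORECASE):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return "\n".join(lines[i] for i in sorted(keep))


def run(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run a command and capture its output as text."""
    return subprocess.run(cmd, capture_output=True, text=True)


def tdd_loop(ask_agent: Callable[[str], None], max_rounds: int = 20) -> bool:
    """Repeat: run the suite, isolate the first failure, pass its logs to the agent."""
    for _ in range(max_rounds):
        result = run(["pytest", "tests/", "-x"])
        if result.returncode == 0:
            return True  # everything passes: exit the outer loop
        failed = first_failed_test(result.stdout)
        if failed is None:
            return False  # failure without a FAILED/ERROR summary line
        detailed = run(["pytest", failed, "--log-level", "DEBUG"])
        # Assumption: the trimmed log is what the agent receives next.
        ask_agent(lines_near_errors(detailed.stdout + detailed.stderr))
    return False
```

Trimming the log to the lines around errors keeps the prompt small, which matters more for a local model with a limited context window than for a hosted one.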
📖 Read the full source: r/LocalLLaMA