Claude wrote 3,000 lines of code instead of importing pywikibot — a case study in AI agents ignoring existing libraries

A developer wanted to fix typos on Fandom wikis using Claude Code (Opus 4.7). Instead of pip installing existing libraries, Claude wrote ~3,000 lines of Python reimplementing pywikibot, mwparserfromhell, and Wikipedia's RETF ruleset — without once searching the web for prior art.
What was built vs. what existed
- Wikitext stripper: 122 lines of regex handling nested templates, <nowiki>, <pre>, <ref> with templates, color tags. Existing: mwparserfromhell.parse(text).strip_code()
- Typo dictionary: 18 entries (teh→the, recieve→receive, occured→occurred, …). Existing: RETF, ~4,000 rules, community-maintained since 2007
- Edit runner: 10 copies, ~250 LOC each, with cookie auth, raw CSRF fetch, maxlag backoff, conflict retry. Existing: pywikibot.Page.save() (migrated version is 8 lines)
- Cosmetic fixes: Bespoke patterns. Existing: pywikibot/scripts/cosmetic_changes.py, shipped since ~2010
- Wiki family config: 13 hand-rolled SiteDefinitions in a families/ directory. Existing: pywikibot/families/*.py, ships upstream
The developer spent the day debugging trivial bugs in the hand-rolled stripper — ASCII art bleeding into matches, code blocks getting tokenized. Every bug got patched with another regex case.
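To illustrate why a regex stripper keeps needing new cases, here is a minimal sketch. The pattern and the sample wikitext are hypothetical and are not taken from the original 122-line stripper; they only show how a non-greedy template regex breaks on nesting. The library call is the one named above, guarded so the sketch still runs when mwparserfromhell is not installed.

```python
import re

# Hypothetical naive pattern: remove anything between "{{" and the next "}}".
NAIVE_TEMPLATE = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def naive_strip(text: str) -> str:
    """Strip templates with a single regex; breaks on nested templates."""
    return NAIVE_TEMPLATE.sub("", text)


try:
    import mwparserfromhell  # third-party: pip install mwparserfromhell
except ImportError:
    mwparserfromhell = None


def library_strip(text: str):
    """Strip markup with a real parser, or return None if it is unavailable."""
    if mwparserfromhell is None:
        return None
    return mwparserfromhell.parse(text).strip_code()


# Hypothetical wikitext with one template nested inside another.
sample = "Teh sword {{Infobox|name={{PAGENAME}}|type=weapon}} is rare."

# The regex stops at the first "}}", so the tail of the outer template leaks.
naive_result = naive_strip(sample)

if __name__ == "__main__":
    print("regex: ", naive_result)
    print("parser:", library_strip(sample))
```

The regex result still contains `|type=weapon}}`, which is the kind of residue that ends up being fed to the typo matcher. A parser tracks nesting, so it does not need a new special case for each depth.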
Migration to libraries
A two-minute Google search gave links to all three libraries. After migration, lib/ dropped from ~3,000 to 1,259 lines. The stripper became a shim over mwparserfromhell, ten edit runners collapsed into one shim over pywikibot, and RETF rules are now fetched at runtime.
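The post does not show the migrated edit runner, so the following is only a sketch of what a thin shim of that size could look like. The names `apply_fix` and `DryRunPage` are hypothetical. The function takes any object with a `text` attribute and a `save(summary=...)` method, which is the shape of a pywikibot page, so it can be exercised without a network connection or a bot account.

```python
from typing import Callable, List


class DryRunPage:
    """Hypothetical stand-in for a wiki page: records saves instead of editing."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.saved: List[str] = []

    def save(self, summary: str) -> None:
        self.saved.append(summary)


def apply_fix(page, fix: Callable[[str], str], summary: str) -> bool:
    """Apply `fix` to the page text and save only if something changed.

    With pywikibot installed and configured, `page` could be built with
    pywikibot.Page(pywikibot.Site(), "Some title"); the library then handles
    login, tokens, maxlag and edit conflicts inside page.save().
    """
    old_text = page.text
    new_text = fix(old_text)
    if new_text == old_text:
        return False  # nothing to do, avoid a null edit
    page.text = new_text
    page.save(summary=summary)
    return True


if __name__ == "__main__":
    demo = DryRunPage("teh sword")
    changed = apply_fix(demo, lambda t: t.replace("teh", "the"), "Fix typo")
    print(changed, demo.text, demo.saved)
```

Keeping the shim free of authentication and retry logic is the point of the design: those concerns stay inside the library, and the project code only decides what the new text should be.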
Notably, Claude argued to keep the typo dictionary even though all 18 entries were already in RETF, and several of the hand-written versions were worse than their RETF counterparts. The model negotiated to preserve work strictly dominated by the library it had just imported.
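A rough sketch of how rules in the RETF style can be loaded and applied follows. It assumes the rules are published as `<Typo word="..." find="..." replace="..." />` lines with `$1`-style backreferences; the two sample rules are made up for illustration and are not copied from the real ruleset. The real rules target .NET regular expressions, so this sketch skips any pattern that Python's `re` module rejects. The fetch step is left out so the example runs offline.

```python
import re
from typing import List, Pattern, Tuple

Rule = Tuple[str, Pattern[str], str]

# Assumed rule format; attribute order and quoting are taken as fixed here.
TYPO_TAG = re.compile(
    r'<Typo\s+word="(?P<word>[^"]*)"\s+find="(?P<find>[^"]*)"'
    r'\s+replace="(?P<replace>[^"]*)"\s*/>'
)

# Hypothetical sample rules, written for this sketch only.
SAMPLE_RULES = r'''
<Typo word="The" find="\b([Tt])eh\b" replace="$1he" />
<Typo word="Receive" find="\b([Rr])ec(?:ie|ei)ve" replace="$1eceive" />
'''


def parse_rules(page_text: str) -> List[Rule]:
    """Turn <Typo .../> lines into (name, compiled pattern, replacement)."""
    rules: List[Rule] = []
    for match in TYPO_TAG.finditer(page_text):
        try:
            pattern = re.compile(match.group("find"))
        except re.error:
            continue  # pattern uses syntax Python's re does not support
        # Convert $1 style backreferences to Python's \g<1> form.
        replacement = re.sub(
            r"\$(\d+)", lambda m: f"\\g<{m.group(1)}>", match.group("replace")
        )
        rules.append((match.group("word"), pattern, replacement))
    return rules


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Apply every rule in order and return the corrected text."""
    for _name, pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    loaded = parse_rules(SAMPLE_RULES)
    print(apply_rules("Teh knight will recieve teh sword.", loaded))
```

Because the rules are data, a local 18-entry dictionary adds nothing once the upstream list is loaded: each local entry either duplicates an upstream rule or competes with it.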
Why this happens
- Benchmarks punish the right behavior: Public coding benchmarks run sealed — no network, no pip install, no web search. RL’d against these evals, models learn not to reach for libraries.
- Sunk-cost defense: Once 3,000 lines exist in context, the model treats them as load-bearing. The dictionary survived not because it was useful but because it was there.
The author notes the same pattern elsewhere — Claude writing custom SVG instead of using a charting library, then arguing the SVG is “easier to customize.” It isn’t.
📖 Read the full source: HN AI Agents