Claude wrote 3,000 lines of code instead of importing pywikibot — a case study in AI agents ignoring existing libraries

✍️ OpenClawRadar · 📅 Published: May 12, 2026 · 🔗 Source

A developer wanted to fix typos on Fandom wikis using Claude Code (Opus 4.7). Rather than installing existing libraries with pip, Claude wrote ~3,000 lines of Python that reimplemented pywikibot, mwparserfromhell, and Wikipedia's RETF ruleset, and never once searched the web for prior art.

What was built vs. what existed

  • Wikitext stripper: 122 lines of regex handling nested templates, <nowiki>, <pre>, <ref> with templates, color tags. Existing: mwparserfromhell.parse(text).strip_code()
  • Typo dictionary: 18 entries (teh→the, recieve→receive, occured→occurred, …). Existing: RETF, ~4,000 rules, community-maintained since 2007
  • Edit runner: 10 copies, ~250 LOC each, with cookie auth, raw CSRF fetch, maxlag backoff, conflict retry. Existing: pywikibot.Page.save() — migrated version is 8 lines
  • Cosmetic fixes: Bespoke patterns. Existing: pywikibot/scripts/cosmetic_changes.py, shipped since ~2010
  • Wiki family config: 13 hand-rolled SiteDefinitions in a families/ directory. Existing: pywikibot/families/*.py, ships upstream
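To make the edit-runner comparison concrete, here is a minimal sketch of what a thin shim over pywikibot might look like. It is not the developer's actual migrated code: the helper names `apply_fix` and `run` are hypothetical, and the sketch assumes a working pywikibot setup (a `user-config.py` naming the target wiki and account). Only `pywikibot.Site()`, `pywikibot.Page`, `page.text`, and `page.save(summary=...)` are library calls.

```python
def apply_fix(page, fix, summary):
    """Apply fix() to page.text and save only if the text changed.

    `page` is expected to behave like a pywikibot.Page: a writable
    `.text` attribute and a `.save(summary=...)` method.
    """
    old_text = page.text
    new_text = fix(old_text)
    if new_text == old_text:
        return False  # nothing to fix, so no edit is made
    page.text = new_text
    page.save(summary=summary)
    return True


def run(title, fix, summary):
    """Hypothetical entry point: fix one page on the configured wiki."""
    # Imported lazily because pywikibot looks for user-config.py on import.
    import pywikibot

    site = pywikibot.Site()  # family and language come from user-config.py
    return apply_fix(pywikibot.Page(site, title), fix, summary)
```

Keeping the page object as a parameter means the change-detection logic can be exercised with a stand-in object, while login, token handling, and request throttling stay inside the library instead of being copied into each runner.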

The developer spent the day debugging trivial bugs in the hand-rolled stripper — ASCII art bleeding into matches, code blocks getting tokenized. Every bug got patched with another regex case.
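For contrast with the regex stripper, the library route mentioned in the list above is a single call. The wrapper below is a sketch only: the function name `visible_text` is hypothetical, and it assumes mwparserfromhell is installed.

```python
def visible_text(wikitext):
    """Return roughly the reader-visible text of a wikitext string."""
    # Imported lazily so the one third-party dependency is explicit.
    import mwparserfromhell

    # parse() builds a node tree; strip_code() drops templates and markup
    # while keeping things like link labels and bold/italic contents.
    return mwparserfromhell.parse(wikitext).strip_code()


# Example call (requires mwparserfromhell):
# visible_text("'''Teh''' [[castle|old castle]] {{stub}}")
```

Because the parser builds a tree rather than pattern-matching on raw text, nested templates do not each need their own regex case.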

Migration to libraries

A two-minute Google search gave links to all three libraries. After migration, lib/ dropped from ~3,000 to 1,259 lines. The stripper became a shim over mwparserfromhell, ten edit runners collapsed into one shim over pywikibot, and RETF rules are now fetched at runtime.

Notably, Claude argued to keep the typo dictionary, even though all 18 entries were already covered by RETF and several of the hand-written versions were worse than their RETF counterparts. The model negotiated to preserve work strictly dominated by the library it had just imported.
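As a rough illustration of using RETF rules instead of a hand-written dictionary, the sketch below parses rules in the `<Typo word="…" find="…" replace="…"/>` form used on the RETF page and applies them with Python's `re` module. It is an assumption-laden sketch, not the developer's code: the two sample rules are made up for illustration rather than copied from RETF, the fetch step is omitted, and the helper names `parse_rules` and `apply_rules` are hypothetical.

```python
import re

# One rule per <Typo .../> element; attribute order is assumed fixed.
TYPO_TAG = re.compile(
    r'<Typo\s+word="(?P<word>[^"]*)"\s+find="(?P<find>[^"]*)"'
    r'\s+replace="(?P<replace>[^"]*)"\s*/>'
)

# Illustrative rules only; the real list would be fetched at runtime.
SAMPLE_RULES = r'''
<Typo word="Receive" find="\b([Rr])ecieve" replace="$1eceive"/>
<Typo word="Occurred" find="\b([Oo])ccured\b" replace="$1ccurred"/>
'''


def parse_rules(source):
    """Turn <Typo/> elements into (word, compiled pattern, replacement)."""
    rules = []
    for match in TYPO_TAG.finditer(source):
        try:
            pattern = re.compile(match["find"])
        except re.error:
            continue  # skip rules whose regex syntax Python does not accept
        # The rules reference groups as $1, $2, ...; Python expects \g<1>.
        replacement = re.sub(
            r"\$(\d+)", lambda m: rf"\g<{m.group(1)}>", match["replace"]
        )
        rules.append((match["word"], pattern, replacement))
    return rules


def apply_rules(text, rules):
    """Apply every rule in order and return the corrected text."""
    for _word, pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


rules = parse_rules(SAMPLE_RULES)
fixed = apply_rules("I did not recieve it; it occured twice.", rules)
```

Skipping rules that fail to compile is a deliberate simplification: the upstream rules are written for a different regex engine, so a real shim would need to decide how to handle the ones Python rejects.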

Why this happens

  1. Benchmarks punish the right behavior: Public coding benchmarks run sealed, with no network, no pip install, and no web search. Trained with reinforcement learning against these evals, models learn not to reach for libraries.
  2. Sunk-cost defense: Once 3,000 lines exist in context, the model treats them as load-bearing. The dictionary survived not because it was useful but because it was there.

The author notes the same pattern elsewhere — Claude writing custom SVG instead of using a charting library, then arguing the SVG is “easier to customize.” It isn’t.

📖 Read the full source: HN AI Agents
