CC-Canary: Detect Regressions in Claude Code with Local JSONL Analysis

CC-Canary is a drift detection tool for Claude Code, packaged as two installable Agent Skills. It scans the JSONL session logs that Claude Code already writes to ~/.claude/projects/, detects whether the model's behavior has drifted on your own work, and produces a shareable forensic report. It runs entirely on data already on your disk: no network, no account, no telemetry, no background daemon. Status: 0.x / pre-alpha.
Installation
Install via npx skills:
npx skills add delta-hq/cc-canary
Or install individual skills:
npx skills add delta-hq/cc-canary --skill cc-canary
npx skills add delta-hq/cc-canary --skill cc-canary-html
Requirements: Python 3.8+ on PATH. Auto-opening the HTML report works on macOS, Linux, and WSL; elsewhere the skill falls back to printing the report's path.
Usage
From a Claude Code session:
/cc-canary 60d
/cc-canary-html 30d
The window defaults to 60 days and accepts 7d, 14d, 30d, 60d, 90d, or 180d.
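The window argument maps to a cutoff timestamp before which sessions are ignored. The sketch below shows one way to validate the argument and compute that cutoff; window_cutoff, ALLOWED_WINDOWS, and DEFAULT_WINDOW are hypothetical names for illustration, not CC-Canary's actual code.

```python
from datetime import datetime, timedelta, timezone

# Window values listed as accepted; 60d is the stated default.
ALLOWED_WINDOWS = ("7d", "14d", "30d", "60d", "90d", "180d")
DEFAULT_WINDOW = "60d"


def window_cutoff(arg=None, now=None):
    """Return the UTC datetime marking the start of the analysis window.

    Hypothetical helper: CC-Canary's own argument handling may differ.
    """
    arg = (arg or DEFAULT_WINDOW).strip().lower()
    if arg not in ALLOWED_WINDOWS:
        raise ValueError(f"unsupported window {arg!r}; expected one of {', '.join(ALLOWED_WINDOWS)}")
    days = int(arg[:-1])  # "30d" -> 30
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)
```

For example, window_cutoff("30d", now=datetime(2025, 6, 1, tzinfo=timezone.utc)) returns 2025-05-02 00:00 UTC, so anything older would be skipped.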
What You Get
- Verdict — HOLDING / SUSPECTED REGRESSION / CONFIRMED REGRESSION / INCONCLUSIVE
- Headline metrics table — pre vs post comparison with green/yellow/red bands
- Weekly trend bars — cost (USD, verified against ccusage), read:edit ratio, reasoning loops, tokens/turn
- Cross-version comparison — same user, different model versions, controlling for task mix
- Auto-detected inflection date — composite health-score break
- Findings with model-side / user-side / ambiguous classification
- Appendices — hour-of-day thinking depth, word-frequency shift, three-period thinking-visibility transition, per-turn behavior rates
Metrics Tracked
- Read:Edit ratio — file reads per edit; proxy for investigation thoroughness
- Write share of mutations — Write / (Edit + Write); high share = rewriting instead of surgical edits
- Reasoning loops / 1K tool calls — phrases like "let me try again", "oh wait", "actually"
- Frustration rate — rate of frustration words in your prompts
- Thinking redaction rate — fraction of thinking blocks redacted vs visible
- Mean thinking length — reasoning-depth proxy
- API turns per user turn — API calls per user message
- Tokens per user turn — total token volume per user message
Plus appendices for premature stopping, self-admitted errors, shortcut vocabulary, user interrupts, etc.
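Several of the metrics above are simple ratios over tool-call counts and message text. The following is a minimal sketch, assuming tool calls have already been extracted as a list of tool names (Claude Code's built-in Read, Edit, and Write tools) and that the Read:Edit denominator counts only Edit calls. session_metrics, count_loop_phrases, and LOOP_RE are hypothetical helpers, and the three loop phrases are only the examples listed above, not the tool's full vocabulary.

```python
import re
from collections import Counter

# Example reasoning-loop phrases from the metric description; a real detector
# would likely use a longer, curated list.
LOOP_PHRASES = ("let me try again", "oh wait", "actually")
LOOP_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in LOOP_PHRASES) + r")\b", re.IGNORECASE
)


def count_loop_phrases(text):
    """Count case-insensitive, whole-word occurrences of the loop phrases."""
    return len(LOOP_RE.findall(text))


def session_metrics(tool_names, assistant_texts, user_turns, api_turns, total_tokens):
    """Compute a few per-session metrics; None where a denominator is zero."""
    tools = Counter(tool_names)
    reads, edits, writes = tools["Read"], tools["Edit"], tools["Write"]
    n_calls = sum(tools.values())
    loops = sum(count_loop_phrases(t) for t in assistant_texts)
    return {
        # Assumption: "per edit" means per Edit call only.
        "read_edit_ratio": reads / edits if edits else None,
        # Write / (Edit + Write), as defined above.
        "write_share": writes / (edits + writes) if edits + writes else None,
        "loops_per_1k_tool_calls": 1000 * loops / n_calls if n_calls else None,
        "api_turns_per_user_turn": api_turns / user_turns if user_turns else None,
        "tokens_per_user_turn": total_tokens / user_turns if user_turns else None,
    }
```

For a session with tool calls Read, Read, Read, Edit, Write, Bash, one assistant message "Oh wait, actually that's wrong.", 2 user turns, 6 API turns, and 12,000 tokens, this yields a read:edit ratio of 3.0, a write share of 0.5, about 333 loops per 1K tool calls, 3 API turns per user turn, and 6,000 tokens per user turn.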
How It Works
- Scan — Python script (stdlib only) walks ~/.claude/projects/**/*.jsonl, filters by window, excludes subagent sessions.
- Dedupe — Assistant messages deduped on (message.id, requestId) because Claude Code writes the same message into multiple JSONLs when sessions are resumed or branched.
- Aggregate — Per-session metrics: tool-mix, read:edit ratio, reasoning-loop phrases, self-admitted errors, premature stops, interrupts, token usage, cost (current Claude 4.x rates), hour-of-day thinking depth.
- Detect inflection — Composite health score per day; argmax of |before − after| over candidate dates with 0.75σ floor. Falls back to median-timestamp split if no break clears.
- Pre-render report — The script writes a markdown/HTML skeleton with every table and bar chart filled in, leaving ~20 narrative slots for Claude to fill.
- Fill & save — Claude reads skeleton, writes narrative, saves final file. Total runtime: ~2.5s script + 10–20s Claude narrative.
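The Scan step can be sketched with the standard library alone, in keeping with the stdlib-only script described above. This is an illustrative sketch, not CC-Canary's code: it assumes each JSONL line is one JSON object with an ISO-8601 "timestamp" field ending in "Z", and it treats records flagged "isSidechain": true as subagent traffic. Both are assumptions about the log format, and iter_records and parse_ts are hypothetical names.

```python
import json
from datetime import datetime
from pathlib import Path


def parse_ts(value):
    # Assumed format: ISO-8601 with a trailing "Z", e.g. "2025-06-01T12:00:00.000Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def iter_records(root, cutoff):
    """Yield in-window, non-subagent records from every *.jsonl under root.

    cutoff must be a timezone-aware datetime.
    """
    for path in sorted(Path(root).expanduser().rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or partially written lines
                if not isinstance(rec, dict) or rec.get("isSidechain"):
                    continue  # isSidechain is the assumed subagent marker
                ts = rec.get("timestamp")
                if ts and parse_ts(ts) >= cutoff:
                    yield rec
```

Calling list(iter_records("~/.claude/projects", cutoff)) would then collect every in-window, non-subagent record for the later steps.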
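The Dedupe step keys assistant messages on (message.id, requestId), as described above. A minimal sketch, assuming records are parsed JSON dicts with a top-level "type" of "assistant" and the two keys found at message.id and requestId; dedupe_assistant is a hypothetical name. Records missing either key are passed through rather than guessed at.

```python
def dedupe_assistant(records):
    """Keep the first copy of each assistant message keyed on (message.id, requestId).

    Non-assistant records pass through unchanged; assistant records missing
    either key are kept, since they cannot be matched safely.
    """
    seen = set()
    out = []
    for rec in records:
        if rec.get("type") == "assistant":
            key = ((rec.get("message") or {}).get("id"), rec.get("requestId"))
            if None not in key:
                if key in seen:
                    continue  # same message copied into another JSONL
                seen.add(key)
        out.append(rec)
    return out
```

Keying on the pair rather than message.id alone keeps two distinct requests apart even if they ever share a message id.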
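The Aggregate step's thinking metrics (redaction rate and hour-of-day thinking depth) could be computed along these lines. This is a hedged sketch: it assumes assistant records store message.content as a list of content blocks, that a block of type "redacted_thinking" or a "thinking" block with empty text counts as redacted, and that thinking length is measured in characters. It buckets by UTC hour, whereas a real report would more plausibly use local time. thinking_stats is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime


def thinking_stats(records):
    """Return (redaction_rate, {utc_hour: mean_visible_thinking_chars})."""
    visible = redacted = 0
    lengths_by_hour = defaultdict(list)
    for rec in records:
        ts = rec.get("timestamp")
        if rec.get("type") != "assistant" or not ts:
            continue
        content = (rec.get("message") or {}).get("content")
        if not isinstance(content, list):
            continue
        # Assumed timestamp format: ISO-8601 with a trailing "Z" (UTC).
        hour = datetime.fromisoformat(ts.replace("Z", "+00:00")).hour
        for block in content:
            if not isinstance(block, dict):
                continue
            kind = block.get("type")
            text = block.get("thinking") or ""
            if kind == "redacted_thinking" or (kind == "thinking" and not text):
                redacted += 1  # assumed definition of a redacted block
            elif kind == "thinking":
                visible += 1
                lengths_by_hour[hour].append(len(text))
    total = visible + redacted
    rate = redacted / total if total else None
    depth = {h: sum(v) / len(v) for h, v in sorted(lengths_by_hour.items())}
    return rate, depth
```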
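The Detect inflection step can be approximated as a single change-point scan over the daily composite score. This sketch fills in details the description leaves open, all of them assumptions: "before" and "after" are the mean daily scores on each side of a candidate date, σ is the population standard deviation of the whole daily series, each side needs at least min_side days, and the fallback splits at the middle date of the series. detect_inflection, min_side, and floor_sigma are hypothetical names.

```python
from statistics import mean, pstdev


def detect_inflection(daily, min_side=3, floor_sigma=0.75):
    """Return (break_date, used_fallback) for a date-sorted list of (date, score)."""
    if not daily:
        raise ValueError("need at least one daily score")
    dates = [d for d, _ in daily]
    scores = [s for _, s in daily]
    sigma = pstdev(scores)
    best_gap, best_date = -1.0, None
    # Candidate i starts the "after" segment; each side keeps >= min_side days.
    for i in range(min_side, len(scores) - min_side + 1):
        gap = abs(mean(scores[:i]) - mean(scores[i:]))
        if gap > best_gap:
            best_gap, best_date = gap, dates[i]
    # Accept the break only if it clears the floor (floor_sigma * sigma).
    if best_date is not None and sigma > 0 and best_gap >= floor_sigma * sigma:
        return best_date, False
    # Fallback: split at the middle of the series (approximate median timestamp).
    return dates[len(dates) // 2], True
```

With ten days scoring 1.0 for the first five and 0.0 for the last five, this returns the sixth day with used_fallback set to False; a perfectly flat series has σ = 0, so it falls back to the middle date.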
📖 Read the full source: HN AI Agents