Using /probe to catch AI hallucinations before writing code

What /probe does
The /probe technique makes the AI state each fact its plan depends on as a numbered CLAIM with an EXPECTED value. It then runs a command against the real system for each claim and records the delta between the expected and actual results.
Real-world example from the source
A developer was trying to parse Claude's JSONL session files stored under ~/.claude/projects/.... Claude confidently described the format, but running /probe revealed four hallucinations:
- Claim 1: AI said there were 2 top-level types (user, assistant). Reality: 7 types including queue-operation, file-history-snapshot, attachment, system, permission-mode, and summary.
- Claim 2: AI said assistant content = text + tool_use. Reality: it missed thinking blocks, which make up about a third of assistant output in extended thinking mode.
- Claim 3: AI said user content is always an array. Reality: Polymorphic: string OR array.
- Claim 4: AI said folder naming replaces / with -. Reality: it prepends a dash first, then replaces.
Without /probe, the jq filter would have errored on string-form user content, dumped thinking blocks as garbage, and missed 5 of 7 message types entirely.
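The content-shape findings suggest a parser that tolerates each case instead of assuming one. The sketch below is illustrative only, written in Python rather than jq: `extract_text` is a hypothetical helper, and the `type`, `text`, and `thinking` keys on content blocks are assumptions about the block layout that the source does not spell out.

```python
from typing import Any, List


def extract_text(content: Any, include_thinking: bool = False) -> List[str]:
    """Return text fragments from content that may be a string or a list of blocks.

    Assumed block layout (not confirmed by the source): each block is a dict
    with a "type" key; "text" blocks carry "text", "thinking" blocks carry
    "thinking".
    """
    # String-form content: the case a naive array-only filter would error on.
    if isinstance(content, str):
        return [content]
    if not isinstance(content, list):
        return []

    fragments: List[str] = []
    for block in content:
        if not isinstance(block, dict):
            continue
        kind = block.get("type")
        if kind == "text":
            fragments.append(block.get("text", ""))
        elif kind == "thinking" and include_thinking:
            fragments.append(block.get("thinking", ""))
        # tool_use and unknown block types are skipped rather than dumped.
    return fragments
```

Thinking blocks are opt-in here so that they are handled deliberately instead of leaking into the output by accident.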
How the probe works
The AI writes claims like "EXPECTED: 2 types" before running commands such as jq -r '.type' file.jsonl | sort -u. One probe output looked like:
CLAIM 1: JSONL has 2 top-level types (user, assistant)
EXPECTED: 2
COMMAND: jq -r '.type' *.jsonl | sort -u | wc -l
ACTUAL: 7
DELTA: +5 unknown types (queue-operation, file-history-snapshot, attachment, system, permission-mode, summary)
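The CLAIM / EXPECTED / ACTUAL / DELTA record can be sketched as a small harness. This is a minimal Python sketch under stated assumptions, not the author's skill file: the `Claim` class, `run_probe`, and the inline sample lines are all hypothetical, and the probe counts distinct `type` values in Python instead of shelling out to jq.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Claim:
    """One asserted fact, written down before the probe runs."""
    number: int
    text: str
    expected: int
    probe: Callable[[Iterable[str]], int]  # computes ACTUAL from raw JSONL lines


def count_top_level_types(lines: Iterable[str]) -> int:
    """Rough equivalent of: jq -r '.type' | sort -u | wc -l"""
    types = set()
    for line in lines:
        if line.strip():
            types.add(json.loads(line).get("type"))
    return len(types)


def run_probe(claim: Claim, lines: List[str]) -> Dict[str, int]:
    """Run the claim's probe and record the delta against the expectation."""
    actual = claim.probe(lines)
    return {
        "claim": claim.number,
        "expected": claim.expected,
        "actual": actual,
        "delta": actual - claim.expected,
    }


# Hypothetical sample data standing in for real session files.
SAMPLE_LINES = [
    '{"type": "user"}',
    '{"type": "assistant"}',
    '{"type": "system"}',
    '{"type": "summary"}',
    '{"type": "user"}',
]

claim_1 = Claim(1, "JSONL has 2 top-level types (user, assistant)", 2,
                count_top_level_types)
report = run_probe(claim_1, SAMPLE_LINES)
print(report)
```

A nonzero `delta` is the signal to stop and revise the plan before any parsing code is written.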
Key insights from the source
The claims worth probing are often the ones the AI is most confident about. When the AI hedges, you already know to check. When it flatly states X, you don't. High-confidence claims are where hallucinations hide.
Another benefit is that one probe becomes N permanent tests. The 7-type finding becomes a schema test that fails CI if a new type appears. The string-or-array finding becomes a property test that fuzzes both shapes. When the upstream format changes, the test fails, you re-probe, and the oracle updates.
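As a rough illustration of how probe findings might turn into permanent tests, the sketch below pairs a schema check with a small randomized check over both content shapes. Everything here is an assumption for illustration: `KNOWN_TYPES` is a placeholder set to be filled from real probe output, and `normalize_content`, `check_schema`, and `check_content_shapes` are hypothetical names.

```python
import random

# Oracle captured from a probe run. Placeholder values, not the real list.
KNOWN_TYPES = frozenset({"user", "assistant", "system"})


def unknown_types(records):
    """Return top-level types that are absent from the oracle."""
    return {r.get("type") for r in records} - KNOWN_TYPES


def check_schema(records):
    """Schema test: fails when the upstream format grows a new type."""
    extra = unknown_types(records)
    assert not extra, f"new top-level types {extra}: re-probe and update the oracle"


def normalize_content(content):
    """Coerce string-or-array content into a list of block dicts."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def random_content(rng):
    """Generate either shape (string or list of blocks) at random."""
    text = "".join(rng.choice("abc ") for _ in range(rng.randint(0, 8)))
    if rng.random() < 0.5:
        return text
    return [{"type": "text", "text": text} for _ in range(rng.randint(0, 3))]


def check_content_shapes(trials=200, seed=0):
    """Property-style test: both shapes must normalize to a list of dicts."""
    rng = random.Random(seed)
    for _ in range(trials):
        blocks = normalize_content(random_content(rng))
        assert isinstance(blocks, list)
        assert all(isinstance(b, dict) for b in blocks)
```

A dedicated property-testing library could replace the hand-rolled generator; the standard library is used here only to keep the sketch self-contained.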
Limitations and improvements
The probe only catches claims the AI thinks to make. Unknown unknowns stay invisible. Things that help:
- Run jq 'keys' first to enumerate reality before generating claims
- Dex Horthy's CRISPY pattern pushes the AI to surface its own gap list
- GitHub's Spec Kit uses [NEEDS CLARIFICATION] markers in specs to force the AI to mark blind spots
- Human scan of the claim list is also recommended
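The "enumerate reality first" step can be sketched as a tally over the raw records. This is a Python stand-in for running jq 'keys' across a file, offered as an assumption-laden sketch: `enumerate_shape` is a hypothetical helper, and the field names in the sample lines are made up for illustration.

```python
import json
from collections import Counter


def enumerate_shape(lines):
    """Tally top-level keys and `type` values across JSONL lines.

    Intended to run before any claims are generated, so the claim list
    starts from what is actually in the data.
    """
    key_counts = Counter()
    type_counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if isinstance(record, dict):
            key_counts.update(record.keys())
            type_counts[record.get("type")] += 1
    return key_counts, type_counts


# Hypothetical sample lines; real session files will differ.
sample = [
    '{"type": "user", "message": {}}',
    "",
    '{"type": "summary", "summary": "x"}',
]
keys, types = enumerate_shape(sample)
print(sorted(keys), sorted(types))
```

Feeding the resulting key and type lists back to the AI before it writes claims narrows the space of unknown unknowns, though it does not eliminate it.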
Contrast with traditional TDD
Traditional TDD writes tests based on what you THINK should happen. Probe-driven TDD writes tests based on what you have spiked and VERIFIED actually happens. Mocks test your model of the system. The probe tests the system itself.
Source files
The developer shared the full /probe skill file in a gist with two files:
- README.md: Longer writeup with the REPL-as-oracle angle and TDD contrast
- probe-skill.md: The 7-step protocol loaded as a Claude Code skill
The pattern is just "claim table + real-system probe + capture the delta" and works with any REPL or CLI tool that can query the system you're about to code against.
📖 Read the full source: r/ClaudeAI