Using /probe to catch AI hallucinations before writing code

✍️ OpenClawRadar · 📅 Published: April 15, 2026 · 🔗 Source

What /probe does

The /probe technique makes the AI state every fact its plan depends on as a numbered CLAIM with an EXPECTED value. For each claim it then runs a command against the real system and records the delta between the expected and the actual result.

Real-world example from the source

A developer was trying to parse Claude's JSONL session files stored under ~/.claude/projects/.... Claude confidently described the format, but running /probe revealed four hallucinations:

  • Claim 1: AI said there were 2 top-level types (user, assistant). Reality: 7 types including queue-operation, file-history-snapshot, attachment, system, permission-mode, and summary.
  • Claim 2: AI said assistant content = text + tool_use. Reality: it missed thinking blocks, which make up about a third of assistant output in extended thinking mode.
  • Claim 3: AI said user content is always an array. Reality: it is polymorphic, either a string OR an array.
  • Claim 4: AI said folder naming replaces / with -. Reality: a dash is prepended first, then each / is replaced.

Without /probe, the jq filter would have errored on string-form user content, dumped thinking blocks as garbage, and missed 5 of 7 message types entirely.
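A reader that tolerates the surprises from Claims 2 and 3 might look like the minimal sketch below. The field names (`message`, `content`, and the block keys `type`, `text`, `thinking`) are assumptions made for illustration, not a confirmed schema, so probe your own files before relying on them.

```python
import json


def content_blocks(record):
    """Return a record's content as a list of block dicts.

    Assumed layout (hypothetical, verify with a probe): the payload lives at
    record["message"]["content"] and is either a string or a list of blocks.
    """
    message = record.get("message")
    if not isinstance(message, dict):
        return []  # e.g. record types that carry no message at all
    content = message.get("content") or []
    if isinstance(content, str):
        # String form: wrap it so callers only ever see one shape.
        return [{"type": "text", "text": content}]
    return [block for block in content if isinstance(block, dict)]


def visible_text(record, include_thinking=False):
    """Join text blocks; thinking blocks are skipped unless requested."""
    parts = []
    for block in content_blocks(record):
        kind = block.get("type")
        if kind == "text":
            parts.append(block.get("text", ""))
        elif kind == "thinking" and include_thinking:
            parts.append(block.get("thinking", ""))
    return "\n".join(parts)


if __name__ == "__main__":
    sample_lines = [
        '{"type": "user", "message": {"content": "hello"}}',
        '{"type": "assistant", "message": {"content": ['
        '{"type": "thinking", "thinking": "hmm"},'
        '{"type": "text", "text": "hi"}]}}',
        '{"type": "summary"}',
    ]
    for line in sample_lines:
        print(repr(visible_text(json.loads(line))))
```

Normalizing the string form into a one-element list means the rest of the code handles a single shape, which is the same failure the jq filter would otherwise have hit.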

How the probe works

The AI writes claims like "EXPECTED: 2 types" before running commands such as jq -r '.type' file.jsonl | sort -u. One probe output looked like:

CLAIM 1: JSONL has 2 top-level types (user, assistant)
EXPECTED: 2
COMMAND: jq -r '.type' *.jsonl | sort -u | wc -l
ACTUAL: 7
DELTA: +5 unknown types (queue-operation, file-history-snapshot, attachment, system, permission-mode, summary)
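
The claim, command, and delta loop shown above can be sketched in Python roughly as follows. This is not the author's skill file: the `Claim` class and the `probe` and `format_report` functions are hypothetical names, and the demo command is a stand-in that prints 7 so the sketch runs without jq or real session files.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Claim:
    number: int
    text: str       # what the AI asserted
    expected: str   # the value the AI predicted
    command: list   # argv to run against the real system


def probe(claim):
    """Run the claim's command and compare its stdout with the prediction."""
    result = subprocess.run(
        claim.command, capture_output=True, text=True, check=False
    )
    actual = result.stdout.strip()
    return {
        "expected": claim.expected,
        "actual": actual,
        "match": actual == claim.expected,
        "exit_code": result.returncode,
    }


def format_report(claim, report):
    """Render a report in the CLAIM / EXPECTED / COMMAND / ACTUAL / DELTA layout."""
    if report["match"]:
        delta = "none"
    else:
        delta = f"expected {report['expected']}, got {report['actual']}"
    return "\n".join([
        f"CLAIM {claim.number}: {claim.text}",
        f"EXPECTED: {report['expected']}",
        f"COMMAND: {shlex.join(claim.command)}",
        f"ACTUAL: {report['actual']}",
        f"DELTA: {delta}",
    ])


if __name__ == "__main__":
    # Stand-in command (assumption): replace with the real query, e.g. a jq call.
    demo = Claim(
        number=1,
        text="JSONL has 2 top-level types (user, assistant)",
        expected="2",
        command=[sys.executable, "-c", "print(7)"],
    )
    print(format_report(demo, probe(demo)))
```

Writing the expected value down before the command runs is the point of the structure: the prediction cannot be adjusted after the fact to fit the output.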

Key insights from the source

The claims worth probing are often the ones the AI is most confident about. When the AI hedges, you already know to check. When it flatly states X, you don't. High-confidence claims are where hallucinations hide.

Another benefit is that one probe becomes N permanent tests. The 7-type finding becomes a schema test that fails CI if a new type appears. The string-or-array finding becomes a property test that fuzzes both shapes. When the upstream format changes, the test fails, you re-probe, and the oracle updates.
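
As a rough illustration of turning those two findings into tests, the sketch below uses plain test functions and the standard library only. `KNOWN_TYPES` is a deliberately small illustrative subset, and `unknown_types`, `normalize_content`, and `random_content` are hypothetical helpers; in practice the set would be pasted from the probe's ACTUAL output.

```python
import random

# Illustrative subset only (assumption): paste the full list from your probe.
KNOWN_TYPES = {"user", "assistant", "system", "summary"}


def unknown_types(records, known_types):
    """Return any top-level types that the probe has not seen before."""
    return {record.get("type") for record in records} - set(known_types)


def normalize_content(content):
    """Accept the string form or the array form and always return a list."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def random_content(rng):
    """Generate either shape of user content at random."""
    text = "".join(rng.choice("abc xyz") for _ in range(rng.randint(0, 8)))
    if rng.random() < 0.5:
        return text
    return [{"type": "text", "text": text} for _ in range(rng.randint(0, 3))]


def test_schema_has_no_unknown_types():
    # In CI this list would be loaded from real session files.
    records = [{"type": "user"}, {"type": "assistant"}, {"type": "summary"}]
    assert unknown_types(records, KNOWN_TYPES) == set()


def test_content_normalizes_for_both_shapes():
    rng = random.Random(0)  # fixed seed keeps failures reproducible
    for _ in range(200):
        blocks = normalize_content(random_content(rng))
        assert isinstance(blocks, list)
        assert all(isinstance(block, dict) for block in blocks)


if __name__ == "__main__":
    test_schema_has_no_unknown_types()
    test_content_normalizes_for_both_shapes()
    print("all probe-derived tests passed")
```

A dedicated property-testing library would shrink failing inputs automatically; the seeded loop here is a simpler substitute that keeps the example dependency-free.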

Limitations and improvements

The probe only catches claims the AI thinks to make. Unknown unknowns stay invisible. Things that help:

  • Run jq 'keys' first to enumerate reality before generating claims
  • Dex Horthy's CRISPY pattern pushes the AI to surface its own gap list
  • GitHub's Spec Kit uses [NEEDS CLARIFICATION] markers in specs to force the AI to mark blind spots
  • Have a human scan the claim list as well
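
The first suggestion, enumerating reality before any claims are written, can also be done without jq. The sketch below is one possible approach; `enumerate_shapes` is a hypothetical helper, and it assumes only that each non-empty line is a JSON object with an optional top-level `type` key.

```python
import json
from collections import Counter


def enumerate_shapes(lines):
    """Count top-level types and collect the keys seen for each type."""
    type_counts = Counter()
    keys_by_type = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            type_counts["<non-object>"] += 1
            continue
        kind = record.get("type", "<missing>")
        type_counts[kind] += 1
        keys_by_type.setdefault(kind, set()).update(record.keys())
    return type_counts, keys_by_type


if __name__ == "__main__":
    # Inline sample data (assumption); point this at a real .jsonl file instead.
    sample = [
        '{"type": "user", "message": {"content": "hi"}}',
        '{"type": "assistant", "message": {"content": []}, "uuid": "x"}',
        '{"type": "summary", "summary": "short"}',
        "",
    ]
    counts, keys = enumerate_shapes(sample)
    for kind, count in sorted(counts.items()):
        print(kind, count, sorted(keys.get(kind, set())))
```

Feeding this inventory to the AI before it writes claims narrows the unknown unknowns, because types and keys it would never have guessed are already on the table.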

Contrast with traditional TDD

Traditional TDD writes tests based on what you THINK should happen. Probe-driven TDD writes tests based on what you have spiked and VERIFIED actually happens. Mocks test your model of the system. The probe tests the system itself.

Source files

The developer shared the full /probe skill file in a gist with two files:

  • README.md: Longer writeup with the REPL-as-oracle angle and TDD contrast
  • probe-skill.md: The 7-step protocol loaded as a Claude Code skill

The pattern is just "claim table + real-system probe + capture the delta" and works with any REPL or CLI tool that can query the system you're about to code against.

📖 Read the full source: r/ClaudeAI
