Audio-Layer Prompt Injection Against Claude: Unseen Risks

A developer who has spent a few months building a prompt injection detection API recently shipped audio scanning and shared the findings on r/ClaudeAI. The results highlight a gap in voice agent security: audio-layer attacks that never show up in logs, because the logs record only the transcript and the attack is not in the transcript.

What Works (and Doesn't) with Audio Attacks

The obvious attacks fail. When "ignore your previous instructions" is spoken aloud into a voice input, Claude transcribes it accurately, recognizes the shape of the attack, and refuses, just as it would with text.

The Real Problem: Signal-Layer Attacks

The interesting cases are in the signal, not the transcript. There's a class of audio attack that embeds instructions at frequencies humans don't register as speech. The transcription comes back clean because there's nothing audible to transcribe. But depending on how the audio pipeline processes the input before transcription, signal-layer content can influence what the model receives. The attack is invisible in the logs because the logs only capture what was transcribed, not what was in the audio.
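One way to look for content the transcript cannot show is to measure how much of a frame's energy sits above the band where speech normally lives. The following is a minimal sketch, not the developer's detector: the function names, the 8 kHz cutoff, and the synthetic tones are all assumptions chosen for illustration, and the naive DFT is only practical on short frames.

```python
import cmath
import math


def high_band_energy_ratio(samples, sample_rate, cutoff_hz=8000.0):
    """Return the fraction of spectral energy at or above cutoff_hz (0.0 to 1.0).

    Uses a naive DFT, so keep frames short (a few hundred samples).
    cutoff_hz is an illustrative value, not one taken from the source.
    """
    n = len(samples)
    total = 0.0
    high = 0.0
    # Skip the DC bin (k = 0) and walk up to the Nyquist bin.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


def tone(freq_hz, amplitude, sample_rate, n):
    """Generate n samples of a sine tone (synthetic test input)."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n)
    ]


RATE = 48_000
FRAME = 480  # 10 ms frame, which gives 100 Hz per DFT bin

speech_like = tone(1_000, 1.0, RATE, FRAME)   # stand-in for audible speech
hidden = tone(20_000, 0.5, RATE, FRAME)       # stand-in for a near-ultrasonic payload
mixed = [a + b for a, b in zip(speech_like, hidden)]

clean_ratio = high_band_energy_ratio(speech_like, RATE)
mixed_ratio = high_band_energy_ratio(mixed, RATE)
```

Here the clean frame has essentially no energy above the cutoff, while the mixed frame has about a fifth of its energy there (0.25 out of 1.25). A real pipeline would need a threshold tuned on its own microphones and codecs, since legitimate audio also carries some high-frequency content.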

Separately, speed-shifted speech creates a different problem. Audio slowed to 0.7x or 0.8x of normal speed sounds odd to a human listener, but transcription tools still handle it accurately. Someone reading the transcript would see nothing unusual. Someone listening would notice that something is slightly off, but probably could not say why.
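Because the transcript of slowed audio looks normal, one hedged way to surface the shift is to compare the speaking rate implied by the transcript and clip duration against a baseline. This is a rough sketch under stated assumptions: `baseline_wps` (a per-user or per-deployment words-per-second figure) and the `tolerance` value are hypothetical and would have to be measured, and natural variation in speech will produce false positives.

```python
def speaking_rate(transcript, duration_seconds):
    """Words per second implied by a transcript and the clip's duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) / duration_seconds


def rate_shift(transcript, duration_seconds, baseline_wps):
    """Ratio of observed speaking rate to a baseline (1.0 means no shift).

    baseline_wps is an assumed, externally measured value.
    """
    return speaking_rate(transcript, duration_seconds) / baseline_wps


def looks_speed_shifted(transcript, duration_seconds, baseline_wps, tolerance=0.15):
    """Flag clips whose rate differs from the baseline by more than tolerance."""
    shift = rate_shift(transcript, duration_seconds, baseline_wps)
    return abs(shift - 1.0) > tolerance


text = "please read me the latest message from my inbox now"  # 10 words
normal_shift = rate_shift(text, 4.0, baseline_wps=2.5)
slowed_shift = rate_shift(text, 4.0 / 0.7, baseline_wps=2.5)  # same words at 0.7x speed
```

With these made-up numbers the slowed clip comes out at 0.7 of the baseline rate, matching the playback factor, while the normal clip stays at 1.0.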

Implications for Voice Agents

The assumption that "check the transcript and you've checked the audio" is shakier than it looks. Text injection is reasonably well understood at this point, but the audio equivalent is much less mapped. The developer has been adding audio test cases to their adversarial game at castle.bordair.io; Kingdom 4 onwards has audio levels that demonstrate these attacks in practice.
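If logs capture only the transcript, two different audio inputs that transcribe identically are indistinguishable after the fact. A small step toward closing that gap is to log a fingerprint of the raw audio next to the transcript. The sketch below assumes a hypothetical `audit_record` helper and field names. It does not detect an attack by itself, but it makes it possible to tie a log entry back to the exact audio that produced it.

```python
import hashlib
import json


def audit_record(audio_bytes, transcript):
    """Build a log entry that ties a transcript to the exact audio received.

    The field names are illustrative, not taken from any particular product.
    """
    return {
        "audio_sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "audio_bytes": len(audio_bytes),
        "transcript": transcript,
    }


# Two different audio payloads that (hypothetically) transcribe to the same text.
clean = audit_record(b"\x00\x01" * 100, "what's the weather")
tampered = audit_record(b"\x00\x02" * 100, "what's the weather")

# Serialize one entry as a single JSON log line.
log_line = json.dumps(clean, sort_keys=True)
```

The two records share a transcript but carry different hashes, so a reviewer can at least tell that the underlying audio differed and retrieve the stored clip for signal-level inspection.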

Who This Matters For

Anyone building voice agent implementations using Claude or similar LLMs, especially those relying solely on transcript inspection for safety validation.

📖 Read the full source: r/ClaudeAI