OpenClaw WhatsApp Auto-Reply May Skip Media Understanding in 2026.4.2

Issue Overview
A user encountered a problem where OpenClaw's WhatsApp integration failed to transcribe voice notes despite correct configuration. The issue occurs specifically in the WhatsApp auto-reply flow in OpenClaw version 2026.4.2.
Problem Details
The user's setup included:
- WhatsApp inbound messages with valid MediaPath and MediaType
- Audio files being stored correctly as .ogg files
- tools.media.audio enabled in configuration
- An external transcription backend (Groq STT) for speech-to-text
Despite everything appearing correct, the agent received <media:audio> placeholders instead of transcripts. The transcription process never triggered.
Root Cause
After tracing the flow, the user discovered that the WhatsApp auto-reply path doesn't always invoke the standard media understanding pipeline before dispatching messages to the agent. This means:
- tools.media.audio is never executed
- CLI or external backends (like Groq STT) never run
- The agent only sees the <media:audio> placeholder
This issue is particularly noticeable when using non-native audio models, as those won't auto-handle audio implicitly.
Solution
The fix involves forcing a call to the media understanding step before the reply is dispatched to the agent. The user patched the WhatsApp inbound auto-reply flow to:
- Build the WhatsApp inbound context
- Explicitly run the same media understanding logic used in the standard reply pipeline
- Continue with normal agent dispatch
After implementing this fix:
- Audio gets picked up correctly
- The CLI (Groq STT in this case) executes
- The transcript is injected into the message
- The agent receives text instead of <media:audio>
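
The three patched steps can be sketched roughly as follows. This is a minimal illustration in Python, not OpenClaw's actual code: InboundContext, build_inbound_context, apply_media_understanding, and handle_auto_reply are hypothetical names, and the transcribe and dispatch callables stand in for the configured backend (Groq STT in the user's case) and the agent dispatch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

AUDIO_PLACEHOLDER = "<media:audio>"


@dataclass
class InboundContext:
    """Hypothetical stand-in for the WhatsApp inbound context."""
    body: str
    media_path: Optional[str] = None
    media_type: Optional[str] = None


def build_inbound_context(raw: dict) -> InboundContext:
    # Step 1: build the inbound context from the raw message fields.
    return InboundContext(
        body=raw.get("body", ""),
        media_path=raw.get("MediaPath"),
        media_type=raw.get("MediaType"),
    )


def apply_media_understanding(
    ctx: InboundContext, transcribe: Callable[[str], str]
) -> InboundContext:
    # Step 2: explicitly run media understanding before dispatch.
    # Only audio attachments are handled in this sketch.
    if ctx.media_path and (ctx.media_type or "").startswith("audio/"):
        transcript = transcribe(ctx.media_path)
        if AUDIO_PLACEHOLDER in ctx.body:
            # Swap the placeholder for the transcript text.
            ctx.body = ctx.body.replace(AUDIO_PLACEHOLDER, transcript)
        else:
            ctx.body = f"{ctx.body}\n{transcript}".strip()
    return ctx


def handle_auto_reply(
    raw: dict,
    transcribe: Callable[[str], str],
    dispatch: Callable[[InboundContext], str],
) -> str:
    ctx = build_inbound_context(raw)
    ctx = apply_media_understanding(ctx, transcribe)
    # Step 3: continue with normal agent dispatch.
    return dispatch(ctx)


if __name__ == "__main__":
    # Fake backend and agent, for demonstration only.
    message = {
        "body": "<media:audio>",
        "MediaPath": "/tmp/voice.ogg",
        "MediaType": "audio/ogg",
    }
    print(handle_auto_reply(message, lambda path: "example transcript", lambda c: c.body))
```

The point of the sketch is the ordering: the transcription call sits between context construction and dispatch, so the agent never sees the raw placeholder when an audio attachment is present.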
Who This Affects
This issue impacts users who rely on CLI-based transcription, external APIs, or any non-native audio model. These setups depend entirely on media understanding being triggered, and if that step is skipped, nothing downstream will work even with correct configuration.
Key Takeaway
If you're experiencing issues where audio is received and stored correctly, tools.media.audio is enabled, but transcription never happens, check whether your WhatsApp auto-reply path is actually calling the media understanding pipeline before agent dispatch.
📖 Read the full source: r/openclaw