Anam Cara-3: Advancements in Interactive AI Avatars

✍️ OpenClawRadar · 📅 Published: February 17, 2026 · 🔗 Source

Anam has released its latest model, Cara-3, which generates interactive avatars. It uses a two-stage pipeline: a diffusion transformer first converts audio into motion embeddings (head position, eye gaze, lip shape, and expression), and those embeddings are then applied to a reference image to generate video frames. Because the face is supplied by the reference image, any face can be animated without retraining.
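To make the data flow of the two stages concrete, here is a minimal Python sketch. It is not Anam's code: the names `MotionEmbedding`, `audio_to_motion`, `render_frames`, and `animate` are hypothetical, and both stages are trivial stand-ins (the real first stage is a diffusion transformer). The only point it illustrates is that the reference image is an input at inference time, so swapping faces needs no retraining.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MotionEmbedding:
    # The four components named in the article; real embeddings would be vectors.
    head_position: float
    eye_gaze: float
    lip_shape: float
    expression: float


def audio_to_motion(audio_chunks: Sequence[Sequence[float]]) -> List[MotionEmbedding]:
    """Stage 1 stand-in: produce one motion embedding per audio chunk.

    Assumption for illustration only: mean absolute amplitude drives lip
    opening. The real stage is a learned diffusion transformer.
    """
    embeddings = []
    for chunk in audio_chunks:
        energy = sum(abs(s) for s in chunk) / len(chunk) if chunk else 0.0
        embeddings.append(MotionEmbedding(0.0, 0.0, min(energy, 1.0), 0.0))
    return embeddings


def render_frames(reference_image: str, motions: Sequence[MotionEmbedding]) -> List[str]:
    """Stage 2 stand-in: apply each embedding to the same reference image."""
    return [f"{reference_image}|lip={m.lip_shape:.2f}" for m in motions]


def animate(reference_image: str, audio_chunks: Sequence[Sequence[float]]) -> List[str]:
    # Audio -> motion embeddings -> frames; the face is just a parameter.
    return render_frames(reference_image, audio_to_motion(audio_chunks))
```

Calling `animate("face_a.png", chunks)` and `animate("face_b.png", chunks)` reuses the same motion stage for two different faces, which mirrors the "no retraining" property described above.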

Cara-3 reaches a time-to-first-frame of approximately 70 ms on an H200, which allows many concurrent avatar sessions on a single GPU. The speed is partly due to a novel flow-matching variant used for the audio-to-motion stage, adopted because conventional techniques proved unstable.

In an independent blind evaluation, Cara-3 outperformed competitors including HeyGen, Tavus, and D-ID, scoring 24% higher on average across the evaluated metrics. Responsiveness correlated more strongly with overall user experience (Spearman correlation coefficient of 0.697) than visual quality did (0.473).
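For readers unfamiliar with the statistic, a Spearman coefficient is the Pearson correlation computed on ranks rather than raw scores, so it measures how consistently two ratings rise and fall together. The sketch below is a plain-Python illustration of that definition; the function names are hypothetical and it does not use any data from the evaluation.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant (zero rank variance).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A value near 1 means that sessions ranked higher on one measure are almost always ranked higher on the other; on that reading, 0.697 for responsiveness indicates a tighter relationship with user experience than 0.473 for visual quality.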

Anam has also open-sourced Metaxy, the backbone of its training data pipeline, which supports iterative development without re-running costly steps.
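The article does not describe how Metaxy works, so the following is not its API. It is only a sketch of one common way a pipeline can avoid repeating expensive work: key each step's output on a hash of its inputs and reuse the stored result when nothing has changed. `StepCache` and its `run` method are hypothetical names, and the sketch assumes step inputs are JSON-serialisable.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class StepCache:
    """Hypothetical helper: reuse a step's output when its inputs are unchanged."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.runs = 0  # how many times a step actually executed

    def run(self, name: str, func: Callable[..., Any], **inputs: Any) -> Any:
        # Key on the step name plus a stable hash of its inputs.
        payload = json.dumps({"step": name, "inputs": inputs}, sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = func(**inputs)
            self.runs += 1
        return self._store[key]
```

With this shape, editing a late step of a pipeline leaves earlier steps' keys unchanged, so their cached outputs are reused rather than recomputed.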

📖 Read the full source: HN AI Agents
