Calmkeep: An External Continuity Layer to Counter LLM Drift in Extended Sessions

Addressing LLM Drift in Professional Workflows
Calmkeep is an external continuity layer built to counteract what its creator calls "structural drift" in LLMs during extended sessions. Structural drift is when a model such as Claude gradually abandons earlier decisions, patterns, or frameworks even though the full context window still contains them. It is distinct from hallucination: the model does not invent facts, it stops following what was already established.
Test Results and Methodology
The creator ran adversarial audits with Claude itself as the evaluator, scoring transcripts blind against criteria established in the first five turns. Claude consistently graded the Calmkeep transcripts higher than its own unassisted output.
25-Turn Backend Build Test
- Standard Claude: 60% final integrity, 8 architectural violations, 40% drift coefficient
- Calmkeep: 85% final integrity, 3 architectural violations, no backsliding after turn 14 (T14)
The most telling example: Claude introduced Zod middleware at turn 14, then immediately reverted to raw parseInt for the next three modules as if the upgrade never happened.
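A regression like the one after turn 14 can in principle be caught mechanically. The sketch below is illustrative only and is not Calmkeep's implementation or the audit's scoring method. It assumes each turn's generated code is available as a string; the names `Decision` and `find_backslides`, the toy transcript, and the regex for the superseded pattern are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """A pattern adopted at some turn, plus the pattern it replaced (hypothetical)."""
    turn: int               # turn at which the decision was made
    description: str        # human-readable statement of the decision
    superseded: re.Pattern  # code that should no longer appear after `turn`


def find_backslides(turns: dict[int, str], decision: Decision) -> list[int]:
    """Return turns after the decision whose code still uses the superseded pattern."""
    return sorted(
        n for n, code in turns.items()
        if n > decision.turn and decision.superseded.search(code)
    )


# Toy transcript: turn number -> code emitted at that turn (invented for illustration).
turns = {
    13: "const id = parseInt(req.params.id);",
    14: "router.use(validate(z.object({ id: z.coerce.number().int() })));",
    15: "const page = parseInt(req.query.page);",
    16: "const limit = parseInt(req.query.limit);",
    17: "const { id } = req.validated;",
}

zod_decision = Decision(
    turn=14,
    description="Validate request input with Zod middleware, not raw parseInt",
    superseded=re.compile(r"\bparseInt\s*\("),
)

backslides = find_backslides(turns, zod_decision)
print(backslides)
```

Turn 13 is not flagged because it predates the decision; only later uses of the superseded pattern count as backslides. A text match like this is crude, and a real check would more plausibly work on parsed code than on regexes.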
25-Turn Legal/Strategic Session
- Standard Claude: 50% strategic integrity, 5 violations including a jurisdictional shift that invalidated the earlier legal framework, ~35% malpractice exposure
- Calmkeep: 100% integrity, zero violations, <5% malpractice exposure
Technical Implementation
Calmkeep includes:
- MCP connector
- Claude Code plugin
- Python SDK
The system operates as an external runtime only: you bring your own Anthropic API key, there is no hidden memory, and the underlying model's weights are not modified.
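The source does not describe how Calmkeep works internally. As a generic illustration of what external state with no hidden memory can look like, the sketch below keeps session decisions in a plain, inspectable structure and renders them as text a caller could prepend to the next request. `DecisionLedger`, its methods, and the recorded decisions are hypothetical and are not part of the Calmkeep SDK.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionLedger:
    """Plain record of session decisions kept outside the model (hypothetical)."""
    entries: list[tuple[int, str]] = field(default_factory=list)

    def record(self, turn: int, decision: str) -> None:
        """Append a decision together with the turn at which it was made."""
        self.entries.append((turn, decision))

    def as_preamble(self) -> str:
        """Render the ledger as text a caller could prepend to the next request."""
        if not self.entries:
            return ""
        lines = [f"- (turn {turn}) {decision}" for turn, decision in self.entries]
        return "Decisions already made in this session:\n" + "\n".join(lines)


# Example decisions are invented for illustration.
ledger = DecisionLedger()
ledger.record(3, "All handlers return typed error objects")
ledger.record(14, "Validate request input with Zod middleware")

preamble = ledger.as_preamble()
print(preamble)
```

Everything the layer knows lives in `entries`, so it can be read, logged, or edited directly, and nothing is stored in or changed about the model itself.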
Availability and Testing
A free 14-day trial is available via Stripe at https://calmkeep.ai. Full test reports, methodology, AVE classifications, scoring rubric, and turn-by-turn breakdowns are available at:
- https://calmkeep.ai/codetestreport
- https://calmkeep.ai/legaltestreport
📖 Read the full source: r/ClaudeAI