Coding Agent Session Logs Are Stored Locally, Could Enable Open Federated Training

When you use coding agents like Claude Code or Codex CLI in agent mode, they log comprehensive session data locally on your machine. These logs capture the full interaction loop: your initial task, the model's reasoning process, every tool call made, every environment response, every error encountered and retry attempted. This creates complete (state → action → reward → next state) tuples—the exact data format reinforcement learning researchers need.
What's in the logs
The source author checked their own machines and found:
- Mac Mini: ~/.claude/projects/ containing 3.1GB across 1103 files from 574 agentic sessions
- MacBook: ~/.codex/sessions/ containing 2.4GB across 3530 files from 79 agentic sessions
- MacBook: ~/.claude/projects/ containing 652MB across 316 files from 99 agentic sessions
In total, they identified 775 sessions with real tool calls, containing approximately 41 million tokens. Extrapolated across thousands of developers, this could represent hundreds of billions of tokens of real agentic trajectory data, for which there is currently no open counterpart to The Pile.
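The arithmetic behind that extrapolation can be checked in a few lines of shell. This is a rough sketch: it assumes every contributor holds as many tokens as the post's author, which is probably generous, and the 100-billion target is an illustrative threshold rather than a figure from the post.

```shell
# Figures reported in the post.
sessions=775
tokens=41000000

# Average tokens per session (integer division).
per_session=$((tokens / sessions))
echo "tokens per session: $per_session"

# Illustrative target (an assumption, not from the post): 100 billion tokens.
target=100000000000

# Developers needed at the author's volume, rounded up.
devs_needed=$(( (target + tokens - 1) / tokens ))
echo "developers needed: $devs_needed"
```

At roughly 53,000 tokens per session, about 2,400 contributors at this volume would reach 100 billion tokens, so "hundreds of billions" implies several thousand such contributors.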
Why this data matters
The environment provides clear feedback signals: exit code 0 or not, tests pass or not. This offers the missing training signal for causal reasoning, error recovery, and long-horizon planning—areas where current models struggle. Big AI labs already collect this data internally to train their proprietary models, but there's no open equivalent because the data is fragmented across individual developer machines.
The proposal: Federated learning
The post proposes federated learning, in which your data never leaves your machine. You would train a small LoRA adapter locally, share only the adapter weights with differential privacy noise added, and receive an improved global model in return. Everyone contributes compute and signal without exposing raw data. Alternatively, the community could anonymize the data to create a dataset for fine-tuning models.
Practical steps
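To preserve your logs (Claude Code deletes them after 30 days by default), raise cleanupPeriodDays. Note that this command overwrites ~/.claude/settings.json, so if you already have a settings file, add the key to it by hand instead: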
echo '{"cleanupPeriodDays": 36500}' > ~/.claude/settings.json
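A more cautious variant of the command above might look like the following sketch. The function name write_retention_setting is hypothetical, introduced only for illustration; it writes the file when none exists and otherwise refuses to touch it, leaving the merge to you.

```shell
# Write the retention setting only if no settings file exists yet.
# write_retention_setting is a hypothetical helper, not part of any tool.
write_retention_setting() {
  settings_file="$1"
  if [ -e "$settings_file" ]; then
    # Never clobber existing settings; leave the merge to the user.
    echo "exists: add \"cleanupPeriodDays\": 36500 to $settings_file by hand" >&2
    return 1
  fi
  mkdir -p "$(dirname "$settings_file")"
  echo '{"cleanupPeriodDays": 36500}' > "$settings_file"
}
```

You would call it as write_retention_setting "$HOME/.claude/settings.json". Refusing to overwrite is a deliberately conservative choice; a real merge would need a JSON-aware tool.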
To check what's on your own machines:
du -sh ~/.codex/sessions/ 2>/dev/null
du -sh ~/.claude/projects/ 2>/dev/null
find ~/.codex/sessions/ -name "*.jsonl" 2>/dev/null | wc -l
find ~/.claude/projects/ -name "*.jsonl" 2>/dev/null | wc -l
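If you want a single line per directory that is easy to paste into a comment, the checks above can be wrapped in a small function. This is a sketch only: summarize_logs is a hypothetical name, and it counts raw bytes in .jsonl files rather than tokens, since the post does not say how its token figure was computed.

```shell
# Print the .jsonl file count and total bytes for one log directory.
# summarize_logs is a hypothetical helper introduced for illustration.
summarize_logs() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "$dir: not found"
    return 0
  fi
  # tr strips the padding some wc implementations add.
  files=$(find "$dir" -type f -name '*.jsonl' | wc -l | tr -d ' ')
  bytes=$(find "$dir" -type f -name '*.jsonl' -exec cat {} + | wc -c | tr -d ' ')
  echo "$dir: $files files, $bytes bytes"
}

# Report on the two directories named in the post.
for d in "$HOME/.codex/sessions" "$HOME/.claude/projects"; do
  summarize_logs "$d"
done
```

Directories that do not exist are reported as not found rather than raising an error, so the same script works on machines with only one of the two tools installed.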
The Reddit post encourages developers to share their numbers in the comments to gauge the actual scale of unused data across the community, with the goal of building an open equivalent if there's enough interest.
📖 Read the full source: r/LocalLLaMA