Durable Execution & Cold Start Fix for Agent Harness Outside Sandbox

Mendral's blog argues that the agent harness — the loop that drives an LLM by sending prompts, executing tool calls, and feeding results back — should run outside the sandbox, especially for multi-user agents. They contrast two architectures and detail the three challenges they solved when adopting the outside model.

Two Architectures

Harness inside the sandbox: The loop lives in the same container as the code it works on. Tool calls (bash, read, write) execute locally. Skills and memories are files on the container's filesystem. This is what Claude Code does locally. Simple execution model, but credentials are inside the sandbox, the sandbox is the session (losing it loses progress), and multi-user becomes a distributed filesystem problem.
Harness outside the sandbox: The loop runs on the backend and calls into a sandbox over an API to execute tools. Credentials stay out of the sandbox (no permission model needed). Sandboxes can be suspended when idle, become cattle (survive failures), and multi-user sharing is a shared database problem, not a distributed filesystem one.

Three Challenges Solved

Durable execution: Agent sessions can run hours and must survive deploys and failures. Mendral uses Inngest for checkpointing — each turn is a step, and the loop picks up where it left off if the server restarts.
Sandbox lifecycle with low cold starts: The loop is suspended most of the time (e.g., during LLM calls). They use Blaxel to resume sandboxes from standby in ~25ms, avoiding seconds-long cold starts during interactive turns.
Filesystem abstraction: With harness and sandbox on different machines, a shared filesystem is no longer available. Mendral notes they had to handle this, but the post focuses on the first two as the key solved problems.

The post concludes that the outside model is superior for multi-user setups despite the complexity of durable execution and cold start handling.

📖 Read the full source: HN AI Agents

Agent Harness Outside the Sandbox: Durable Execution & Cold Starts

Two Architectures

Three Challenges Solved

👀 See Also

OpenClaw Agent Auto-Edits HEARTBEAT.md, Adds 10 Self-Assigned Tasks

Benchmarks Show Distilled Models Match Frontier LLMs on Structured Tasks at 10x Lower Cost

An Open Standard for Agent Run Records: The Case for a Shared Log Schema

Claude Code Source Leak Reveals Anti-Distillation, Undercover Mode, and Frustration Detection