How to safely run llama.cpp native tools (exec_shell_command) with multi-sandboxing on Linux

The llama.cpp project recently added native tool support to llama-server, enabling the model to call functions like get_datetime and the powerful but dangerous exec_shell_command. A Reddit user shared a detailed multi-sandboxing workflow for using exec_shell_command safely for tasks like web RAG (fetching live URLs) without risking the host system.
Key details from the source
- Model used: Qwen3.6-35B-A3B_MTP-UD-Q8_K_XL.gguf with MTP speculative decoding
- Server flags: --jinja --tools get_datetime,exec_shell_command --temp 0.6 --top-p 0.95 --top-k 20 --presence-penalty 1.5 --min-p 0.00 --chat-template-kwargs '{"preserve_thinking":true}' --spec-type draft-mtp --spec-draft-n-max 1
- Multi-sandboxing stack: Firejail + smolvm (Alpine Linux VM) + dedicated Linux user for tool execution
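As a rough illustration, the server flags above can be collected into a small launcher function. This is only a sketch: the flags are copied from the post and have not been checked against a particular llama.cpp build, while MODEL_PATH, LLAMA_SERVER, and the start_server function name are assumptions introduced here. By default it enables only get_datetime, matching the advice to test with that tool first.

```shell
#!/bin/sh
# Sketch of a launcher for the setup described in the post.
# MODEL_PATH and LLAMA_SERVER are hypothetical, overridable settings;
# adjust them to wherever your model file and binary actually live.
MODEL_PATH="${MODEL_PATH:-$HOME/models/Qwen3.6-35B-A3B_MTP-UD-Q8_K_XL.gguf}"
LLAMA_SERVER="${LLAMA_SERVER:-llama-server}"

# start_server [TOOLS]
# TOOLS is the comma-separated list passed to --tools.
# It defaults to the harmless get_datetime tool only.
start_server() {
    tools="${1:-get_datetime}"
    "$LLAMA_SERVER" -m "$MODEL_PATH" \
        --jinja \
        --tools "$tools" \
        --temp 0.6 --top-p 0.95 --top-k 20 \
        --presence-penalty 1.5 --min-p 0.00 \
        --chat-template-kwargs '{"preserve_thinking":true}' \
        --spec-type draft-mtp --spec-draft-n-max 1
}
```

Calling start_server with no argument starts with get_datetime alone; once the sandbox wrappers are in place, start_server get_datetime,exec_shell_command enables shell execution as well.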
Step-by-step setup
- Enable tools in llama-server: start with --tools get_datetime,exec_shell_command (test with get_datetime first)
- Install Firejail (e.g., sudo pacman -S firejail on Arch)
- Create isolated user: sudo useradd -m vmagents; sudo passwd vmagents
- Switch to vmagents and install smolvm: curl -sSL https://smolmachines.com/install.sh | bash
- Create a minimal Alpine VM:
    smolvm machine create minivm --image alpine --net
    smolvm machine start --name minivm
- Create minivm-exec in ~vmagents/.local/bin/:
    #!/bin/sh
    smolvm machine start --name minivm >/dev/null
    firejail smolvm machine exec --name minivm -- $* 2>/dev/null
    smolvm machine stop --name minivm >/dev/null
  Make executable: chmod +x minivm-exec
- Create vm-exec in your own user's ~/.local/bin/:
    #!/bin/sh
    sudo su - vmagents -c "minivm-exec $*"
  Make executable.
- In llama-server web UI, prompt the model to use vm-exec as a wrapper, e.g.:
    Prepend any command to be executed with the sandboxing wrapper vm-exec. Use wget to fetch web content adding the option "-U Mozilla" as browser user agent string.
  Then ask it to retrieve a live URL and analyze the content.
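The minivm-exec wrapper from the steps above can also be written so that arguments stay intact and the command's exit status reaches the caller. The following is a hedged variant, not the script from the post: it reuses the same smolvm subcommands quoted there, swaps the unquoted $* for "$@", and stops the VM even when the command fails. The SMOLVM, FIREJAIL, and VM_NAME variables and the minivm_exec function name are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical variant of minivm-exec; the variable names are illustrative.
SMOLVM="${SMOLVM:-smolvm}"
FIREJAIL="${FIREJAIL:-firejail}"
VM_NAME="${VM_NAME:-minivm}"

# minivm_exec COMMAND [ARGS...]
# Start the VM, run the command inside it under Firejail, stop the VM,
# and return the exit status of the command itself.
minivm_exec() {
    # Give up early if the VM cannot be started.
    "$SMOLVM" machine start --name "$VM_NAME" >/dev/null || return 1

    # "$@" forwards each argument as its own word; an unquoted $* would
    # re-split arguments on whitespace and expand glob characters.
    "$FIREJAIL" "$SMOLVM" machine exec --name "$VM_NAME" -- "$@" 2>/dev/null
    rc=$?

    # Stop the VM whether or not the command succeeded.
    "$SMOLVM" machine stop --name "$VM_NAME" >/dev/null
    return "$rc"
}
```

To use it as a drop-in script, a final line such as minivm_exec "$@" would hand the script's own arguments to the function.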
How the sandboxing works
Each command runs inside an Alpine Linux VM (minivm) managed by smolvm, and the smolvm exec call is itself launched under Firejail. Together these layers separate the command's network access, filesystem, and process space from the host. The vm-exec script on the host invokes the whole chain as the vmagents user, so tool commands run without the host user's privileges and away from that user's home directory and critical system files. The VM is stopped after each command, so nothing a command starts keeps running between calls.
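One detail of this chain deserves care. As quoted, vm-exec pastes its arguments into the string given to su -c, so characters such as ; or & in a command (for example the & in a URL query string) would be interpreted by the vmagents login shell on the host before anything reaches the VM. A minimal sketch of one way to avoid that is to single-quote every argument first. The helper names quote_arg and build_remote_cmd are hypothetical and not part of the original post.

```shell
#!/bin/sh
# quote_arg STRING
# Print STRING wrapped in single quotes so that a POSIX shell reads it back
# as one literal word. Each embedded ' is rewritten as '\''.
# Note: trailing newlines in STRING are lost to command substitution.
quote_arg() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# build_remote_cmd COMMAND [ARGS...]
# Print a command line for su -c that calls minivm-exec with every
# argument quoted, so the host-side shell passes them through untouched.
build_remote_cmd() {
    cmd="minivm-exec"
    for arg in "$@"; do
        cmd="$cmd $(quote_arg "$arg")"
    done
    printf '%s\n' "$cmd"
}
```

With these helpers, the body of vm-exec could become sudo su - vmagents -c "$(build_remote_cmd "$@")". A pipeline then has to be requested explicitly, for example as vm-exec sh -c '...', which keeps the whole pipeline inside the VM.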
Who this is for
Developers who run local LLM servers and want to allow code execution or web fetching through agentic tools without exposing the host OS.
📖 Read the full source: r/LocalLLaMA