Creation OS: A Local σ-Gated LLM Runtime That Lets Models Say ‘I Don’t Know’ Instead of Hallucinating

Creation OS is a local-first AI runtime that wraps local LLMs with a σ-gate — a measurement layer that scores each output across multiple uncertainty channels and decides ACCEPT, RETHINK, or ABSTAIN. The goal is to let local models refuse answers when uncertain instead of hallucinating.
Key Features and Setup
- Supports BitNet b1.58 2B-4T, Qwen3-8B Q4_K_M, Gemma 3 4B, and any GGUF model.
- Runs on a MacBook Air M4 with 8 GB of RAM as the primary machine: no cloud, no API, nothing leaves the device.
- Install: `git clone https://github.com/spektre-labs/creation-os`, then `cd creation-os && bash scripts/quickstart.sh`.
- Full path with local weights: `./scripts/install.sh`, then `./cos chat`.
σ-Gate Measurements
The gate combines logprob, entropy, perplexity, consistency, semantic σ, conformal τ, session coherence, and meta-cognitive channels into a single verdict:
- ACCEPT → show answer
- RETHINK → regenerate
- ABSTAIN → refuse
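The three-way decision can be pictured with a minimal Python sketch. Everything in it is an assumption for illustration: the channel names, the weighted-mean combination rule, the weights, and the two thresholds (`TAU_ACCEPT`, `TAU_ABSTAIN`) are hypothetical and are not taken from Creation OS, since the post does not describe how the channels are fused.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACCEPT = "ACCEPT"    # show answer
    RETHINK = "RETHINK"  # regenerate
    ABSTAIN = "ABSTAIN"  # refuse


# Hypothetical thresholds on the combined σ (higher σ = more uncertain).
TAU_ACCEPT = 0.30
TAU_ABSTAIN = 0.70


@dataclass
class ChannelScore:
    name: str      # illustrative channel name, e.g. "entropy"
    sigma: float   # per-channel uncertainty, assumed to lie in [0, 1]
    weight: float  # hypothetical weight in the combination


def combine(channels: list[ChannelScore]) -> float:
    """Weighted mean of per-channel σ values (an assumed fusion rule)."""
    total_weight = sum(c.weight for c in channels)
    if total_weight <= 0:
        raise ValueError("at least one channel needs a positive weight")
    return sum(c.sigma * c.weight for c in channels) / total_weight


def gate(sigma_combined: float) -> Verdict:
    """Map a combined σ score to one of the three verdicts."""
    if sigma_combined <= TAU_ACCEPT:
        return Verdict.ACCEPT
    if sigma_combined <= TAU_ABSTAIN:
        return Verdict.RETHINK
    return Verdict.ABSTAIN


if __name__ == "__main__":
    scores = [
        ChannelScore("logprob", 0.10, 1.0),
        ChannelScore("entropy", 0.20, 1.0),
        ChannelScore("consistency", 0.25, 2.0),
    ]
    s = combine(scores)
    print(f"sigma_combined={s:.3f} action={gate(s).value}")
```

In a sketch like this, the RETHINK band between the two thresholds is what gives the pipeline a chance to regenerate before refusing outright.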
Benchmark Results
TruthfulQA (same prompts and seeds):
| Mode        | Accuracy | Coverage |
|-------------|----------|----------|
| BitNet only | 0.261    | 0.136    |
| σ-pipeline  | 0.336    | 0.171    |
+28.7% relative accuracy (0.261 → 0.336, since 0.336 / 0.261 ≈ 1.287) from selective regeneration on uncertain rows. Other reported metrics:
- LSD probe AUROC: 0.982 on the TruthfulQA holdout, 0.960 on TriviaQA.
- ECE: 0.043.
- Wrong-but-confident answers: 0.
- Conformal bound: P(error | ACCEPT) ≤ α at α = 0.80.
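For readers unfamiliar with the two headline metrics, here is a minimal Python sketch of their textbook definitions: AUROC via the rank (Mann-Whitney) formulation, and ECE with equal-width confidence bins. The bin count of 10, treating a wrong answer as the positive class, and the inputs themselves are assumptions; the post does not say how Creation OS or its LSD probe compute these numbers, or how σ is mapped to a confidence.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive (label 1) outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation). In this sketch a
    positive is assumed to be a wrong answer and the score the detector's
    uncertainty. O(n_pos * n_neg), which is fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ece(confidences: list[float], correct: list[bool], n_bins: int = 10) -> float:
    """Expected calibration error with equal-width bins over [0, 1]."""
    n = len(confidences)
    if n == 0:
        raise ValueError("empty input")
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # Confidence 1.0 falls into the last bin rather than out of range.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append(i)
    total = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        acc = sum(1 for i in members if correct[i]) / len(members)
        # Each bin contributes |accuracy - confidence|, weighted by its share of samples.
        total += len(members) / n * abs(acc - avg_conf)
    return total
```

An AUROC near 1.0 means uncertainty scores almost always rank wrong answers above right ones; an ECE near 0 means stated confidence tracks observed accuracy.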
Negative results are documented too: σ does not dominate on HellaSwag or MMLU. Full details are in `CLAIM_DISCIPLINE.md`.
Formal Verification
Lean 4: 6/6 proofs sorry-free. Frama-C WP: 15/15 tier-1 proof goals discharged.
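In Lean 4, `sorry` is a placeholder that lets an unfinished proof type-check (with a warning), so "sorry-free" means every theorem is actually proved. The snippet below is a hypothetical illustration, not one of the project's six theorems (the post does not list them): it states a simple property of a three-way gate over a discretized score and proves it without `sorry`, using `omega` for the arithmetic side conditions.

```lean
/-- Hypothetical three-way verdict mirroring ACCEPT / RETHINK / ABSTAIN. -/
inductive Verdict where
  | accept
  | rethink
  | abstain
  deriving Repr, DecidableEq

/-- Hypothetical gate on a discretized score `s` (e.g. σ scaled to 0..1000). -/
def gate (s tAccept tAbstain : Nat) : Verdict :=
  if s ≤ tAccept then .accept
  else if s ≤ tAbstain then .rethink
  else .abstain

/-- Above both thresholds the gate always abstains; proved without `sorry`. -/
theorem gate_abstains_above (s a b : Nat) (ha : a < s) (hb : b < s) :
    gate s a b = Verdict.abstain := by
  have h1 : ¬ s ≤ a := by omega
  have h2 : ¬ s ≤ b := by omega
  simp [gate, h1, h2]
```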
Example Command
Running `./cos chat --once --prompt "What is 2+2?" --multi-sigma --verbose` yields output like `σ_peak=0.06 action=ACCEPT route=LOCAL σ_combined=0.184 conformal@α=0.80`.
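To script around the verbose output, a small parser for its `key=value` tokens is enough. This is a sketch that assumes the exact line format in the example (space-separated tokens, split at the first `=`); the real CLI may print other fields or formatting, so the field names here are examples only and `parse_sigma_line` is a hypothetical helper.

```python
def parse_sigma_line(line: str) -> dict[str, str]:
    """Split a 'k1=v1 k2=v2 ...' line into a dict of raw string values.

    Each token is split at its first '='; tokens without '=' are ignored.
    """
    fields: dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


line = "σ_peak=0.06 action=ACCEPT route=LOCAL σ_combined=0.184 conformal@α=0.80"
fields = parse_sigma_line(line)
if fields.get("action") == "ACCEPT":
    print("accepted with σ_combined =", float(fields["σ_combined"]))
else:
    print("model abstained or asked to rethink")
```

Values stay strings so the caller decides which ones to convert; `action` and `route` are labels, while the σ fields parse cleanly with `float`.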
MCP Integration
Run `python3 -m cos.mcp_sigma_server` to expose σ on every response to any MCP-compatible client.
Limitations
σ is not a universal hallucination detector: it is strongest on factual QA, and long-form generation needs more evaluation. Output quality still depends on the underlying base model.
📖 Read the full source: r/LocalLLaMA