4-Tier Knowledge Base Architecture to Boost AI Agent Accuracy

A developer on r/openclaw detailed an architecture for a structured knowledge base designed to turn generic LLM agents into domain experts by giving them specific context about tools, workflows, and policies.

The problem with common RAG approaches

The source identifies several issues with typical RAG implementations: no query classification (every question gets the same retrieval pipeline), no tiering (governance docs treated the same as blog posts), no budget (agent context window stuffed with irrelevant chunks), and no self-healing (stale/broken docs stay broken forever).

A 4-tier KB pipeline

The system uses four distinct tiers:

Governance tier — Always loaded. Contains agent identity, policies, and rules as non-negotiable context.
Agent tier — Per-agent documentation. For example, a voice agent named Lucy gets call handling docs, while an agent named Binky (CRO) gets conversion docs.
Relevant tier — Dynamic per-query retrieval with title/body matching, limited to a maximum of 5 docs and a 12K character budget per document.
Wiki tier — 200+ reference articles searchable via a filesystem bridge, covering AI history, tool definitions, workflow patterns, and platform comparisons.
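To make the tiering concrete, here is a minimal sketch of how the first three tiers might be assembled into one prompt context. The 5-document limit and the 12K-character cap come from the post. Everything else is an assumption made for illustration: the Doc type, the term-counting score function, and the function names are hypothetical, and the wiki tier's filesystem bridge is left out because the post does not describe how it works.

```python
from dataclasses import dataclass

MAX_RELEVANT_DOCS = 5        # from the post: at most 5 docs per query
MAX_CHARS_PER_DOC = 12_000   # from the post: 12K characters per document


@dataclass
class Doc:
    title: str
    body: str


def score(doc: Doc, query: str) -> int:
    """Hypothetical scoring: a title hit counts 2, a body hit counts 1."""
    terms = query.lower().split()
    title, body = doc.title.lower(), doc.body.lower()
    return sum(2 * (t in title) + (t in body) for t in terms)


def select_relevant(docs: list[Doc], query: str) -> list[Doc]:
    """Return the top-scoring docs, dropping any that do not match at all."""
    matched = [(score(d, query), d) for d in docs]
    matched = [(s, d) for s, d in matched if s > 0]
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in matched[:MAX_RELEVANT_DOCS]]


def build_context(governance: list[Doc], agent_docs: list[Doc],
                  kb: list[Doc], query: str) -> str:
    """Join tiers in priority order; only relevant-tier docs are truncated."""
    parts = [f"# {d.title}\n{d.body}" for d in governance + agent_docs]
    for d in select_relevant(kb, query):
        parts.append(f"# {d.title}\n{d.body[:MAX_CHARS_PER_DOC]}")
    return "\n\n".join(parts)
```

Governance and agent docs go in unconditionally and first, so a crowded relevant tier can never push policy text out of the context.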

Query classification as a secret weapon

Before any retrieval happens, a regex-based classifier determines how much context a question needs:

DIRECT — For tasks like "Summarize this text" where no KB is needed.
SKILL_ONLY — For tasks like "Write me a tweet" where the agent's skill doc is sufficient.
HOT_CACHE — For questions like "Who handles billing?" answered from governance and agent docs in memory cache.
FULL_RAG — For complex queries like "Compare n8n vs Zapier pricing" requiring full vector search and wiki bridge.

This classification alone reportedly cut token costs by approximately 40% because most questions don't need full RAG.
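A regex-based classifier along these lines could be sketched as follows. The four route names come from the post; the patterns themselves, their ordering, and the FULL_RAG fallback for unmatched questions are assumptions chosen so that the post's four example questions land in the stated categories. The author's actual patterns are not published.

```python
import re
from enum import Enum


class Route(Enum):
    DIRECT = "direct"
    SKILL_ONLY = "skill_only"
    HOT_CACHE = "hot_cache"
    FULL_RAG = "full_rag"


# Hypothetical patterns, checked in order; the first match wins.
RULES = [
    (Route.FULL_RAG, re.compile(r"\b(compare|versus|vs\.?|pricing)\b", re.I)),
    (Route.DIRECT, re.compile(r"^\s*(summarize|translate|rewrite)\b", re.I)),
    (Route.SKILL_ONLY, re.compile(r"^\s*(write|draft|compose)\b", re.I)),
    (Route.HOT_CACHE, re.compile(r"^\s*(who|which team)\b", re.I)),
]


def classify(question: str) -> Route:
    """Return the first matching route, or FULL_RAG when nothing matches."""
    for route, pattern in RULES:
        if pattern.search(question):
            return route
    # Assumption: an unrecognized question gets the most thorough pipeline.
    return Route.FULL_RAG


if __name__ == "__main__":
    for q in ["Summarize this text", "Write me a tweet",
              "Who handles billing?", "Compare n8n vs Zapier pricing"]:
        print(f"{q!r} -> {classify(q).name}")
```

Falling back to FULL_RAG trades some token cost for safety: a misclassified question is answered with too much context rather than too little.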

KB structure and organization

Each of the 200+ articles follows a consistent format: a clear title with scope, practical content (tables, code examples, decision frameworks), 2+ cited sources with real URLs, 5 image reference descriptions, and 2 video references.
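A consistent format like this can be checked mechanically. The sketch below is one way to do it, not the author's tooling: the Article fields and the format_problems function are hypothetical, the counts are treated as minimums (the post says "2+" only for sources), and a "real URL" is approximated as anything that parses with an http or https scheme and a host.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Article:
    title: str
    body: str
    sources: list[str] = field(default_factory=list)
    image_refs: list[str] = field(default_factory=list)
    video_refs: list[str] = field(default_factory=list)


def is_http_url(value: str) -> bool:
    """Syntactic check only; it does not confirm that the URL resolves."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def format_problems(article: Article) -> list[str]:
    """List every way the article falls short of the described format."""
    problems = []
    if not article.title.strip():
        problems.append("missing title")
    if len([s for s in article.sources if is_http_url(s)]) < 2:
        problems.append("fewer than 2 sources with http(s) URLs")
    if len(article.image_refs) < 5:
        problems.append("fewer than 5 image reference descriptions")
    if len(article.video_refs) < 2:
        problems.append("fewer than 2 video references")
    return problems
```

An empty list means the article passes; anything else can be reported to whoever maintains the KB.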

The content is organized into specific domains:

AI/ML foundations (18 articles) — history, transformers, embeddings, agents
Tooling (16 articles) — definitions, security, taxonomy, error handling, audit
Workflows (18 articles) — types, platforms, cost analysis, HIL patterns
Image generation (115 files) — 16 providers, comparisons, prompt frameworks
Video generation (109 files) — treatments, pipelines, platform guides
Support (60 articles) — customer help center content

Self-healing system

The architecture includes an evaluation system that scores KB health on a 0-100 scale and automatically addresses issues: missing embeddings trigger re-embedding, stale content gets flagged for refresh, and broken references are repaired or removed. The health score reportedly improved from 71 to 89 after the first healing pass.
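The post does not say how the 0-100 score is computed, so the sketch below makes its own assumptions: the score is the percentage of documents with no detected issue, "stale" means not updated in 180 days, and a broken reference is one that points to a path missing from the KB. The KBDoc type and all function names are hypothetical, and re-embedding is represented by flipping a flag rather than calling a real embedding service.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

STALE_AFTER = timedelta(days=180)  # assumption: the post gives no threshold


@dataclass
class KBDoc:
    path: str
    updated: date
    has_embedding: bool = True
    refs: list[str] = field(default_factory=list)
    needs_refresh: bool = False


def issues(doc: KBDoc, known_paths: set[str], today: date) -> list[str]:
    """Name each problem found in one document."""
    found = []
    if not doc.has_embedding:
        found.append("missing_embedding")
    if today - doc.updated > STALE_AFTER:
        found.append("stale")
    if any(r not in known_paths for r in doc.refs):
        found.append("broken_ref")
    return found


def health_score(docs: list[KBDoc], today: date) -> int:
    """Percentage of documents with no issues, rounded to an integer."""
    if not docs:
        return 100
    known = {d.path for d in docs}
    healthy = sum(1 for d in docs if not issues(d, known, today))
    return round(100 * healthy / len(docs))


def heal(docs: list[KBDoc], today: date) -> None:
    """Fix what can be fixed automatically; flag the rest for a human."""
    known = {d.path for d in docs}
    for d in docs:
        found = issues(d, known, today)
        if "missing_embedding" in found:
            d.has_embedding = True   # stand-in for a real re-embedding call
        if "stale" in found:
            d.needs_refresh = True   # flagged only; content is not rewritten
        if "broken_ref" in found:
            d.refs = [r for r in d.refs if r in known]  # drop dead references
```

Under these assumptions stale documents still count against the score after a healing pass, since flagging them does not refresh their content. That is one plausible reason a first pass would raise the score without reaching 100.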

Results and key takeaways

Before the KB implementation, agents would hallucinate tool definitions, make up pricing, and give generic workflow advice. After implementation, agents cite specific documents, provide accurate platform comparisons with real pricing, and know when to say "I don't have current data on that."

Key takeaways from the implementation:

Classify before you retrieve — not every question needs RAG.
Budget your context window — 60K characters total, with a hard cap per document.
Structure beats volume — 200 well-organized articles are better than 10,000 random chunks.
Self-healing isn't optional — knowledge bases decay, so build monitoring from day one.
Write for agents, not humans — prioritize tables over paragraphs, decision frameworks over prose, and concrete examples over abstract explanations.

📖 Read the full source: r/openclaw