Grounding a Fully-Local GraphRAG Agent: An Accuracy Post-Mortem

Fri, 19 Jun 2026 00:00:00 +0000

I’ve been building cbi, a small domain-agnostic GraphRAG CLI: it ingests data into a DuckDB knowledge graph (vectors via vss, full-text via fts, graph queries via duckpgq) and exports it as a self-contained, cat-readable Open Knowledge Format bundle — markdown concept docs plus the database itself.

The latest piece closes the loop: cbi agent --bundle ./some-bundle opens a chat TUI where a fully local agent answers questions about the bundle. No API keys, no cloud, no embedding server. The whole thing runs on my desk.

How Small Can a Local GraphRAG Agent Go? An E2B-vs-E4B Sweep

Fri, 19 Jun 2026 00:00:00 +0000

In the last post I built a fully-local GraphRAG agent — cbi agent, answering questions over a DuckDB knowledge graph with a Gemma model running on an AMD Strix Halo chip — and then turned the six-question hand-check into a repeatable harness (cbi eval) that scores answers deterministically against a ground-truth key.

A harness invites the obvious question: how small can the model be before the whole thing falls apart? Smaller means faster and cheaper, and on a local box that’s the difference between snappy and sluggish. So I ran the two smallest Gemma 4 tiers head to head over a real test set and graded every answer.
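To make "graded every answer" concrete, here is a minimal sketch of deterministic scoring against a ground-truth key. It is not the actual cbi eval implementation, whose scoring rules this excerpt does not describe. The function names, the key-phrase matching rule, and the toy data are all assumptions made for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def score_answer(answer: str, required: list[str]) -> float:
    """Return the fraction of required key phrases found in the answer.

    Assumed rule (hypothetical): a phrase counts as present when its
    normalized form is a substring of the normalized answer.
    """
    if not required:
        return 0.0
    norm = normalize(answer)
    hits = sum(1 for phrase in required if normalize(phrase) in norm)
    return hits / len(required)


def compare_models(
    answers: dict[str, dict[str, str]],
    key: dict[str, list[str]],
) -> dict[str, float]:
    """Mean score per model over every question in the key.

    answers maps model name -> {question id -> answer text};
    a missing answer is scored as an empty string.
    """
    return {
        model: sum(
            score_answer(by_question.get(qid, ""), phrases)
            for qid, phrases in key.items()
        ) / len(key)
        for model, by_question in answers.items()
    }


if __name__ == "__main__":
    # Toy key and answers, invented for illustration; not real eval data.
    key = {"q1": ["duckdb"], "q2": ["vss", "fts"]}
    answers = {
        "model_a": {"q1": "The graph lives in DuckDB.", "q2": "It uses vss and fts."},
        "model_b": {"q1": "It is stored in DuckDB.", "q2": "Vectors come from vss."},
    }
    for model, mean in compare_models(answers, key).items():
        print(f"{model}: {mean:.2f}")
```

Because the score depends only on string matching, rerunning the same answers always gives the same number, which is what makes a head-to-head comparison between two model sizes repeatable.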

Gemma on orndorff.dev
