Grounding a Fully-Local GraphRAG Agent: An Accuracy Post-Mortem

I’ve been building cbi, a small domain-agnostic GraphRAG CLI: it ingests data into a DuckDB knowledge graph (vectors via vss, full-text via fts, graph queries via duckpgq) and exports it as a self-contained, cat-readable Open Knowledge Format bundle — markdown concept docs plus the database itself. The latest piece closes the loop: cbi agent --bundle ./some-bundle opens a chat TUI where a fully local agent answers questions about the bundle. No API keys, no cloud, no embedding server. The whole thing runs on my desk. ...

June 19, 2026 · 10 min · 2126 words · Zac Orndorff <https://orndorff.dev>

How Small Can a Local GraphRAG Agent Go? An E2B-vs-E4B Sweep

In the last post I built a fully-local GraphRAG agent — cbi agent, answering questions over a DuckDB knowledge graph with a Gemma model running on an AMD Strix Halo chip — and then turned the six-question hand-check into a repeatable harness (cbi eval) that scores answers deterministically against a ground-truth key. A harness invites the obvious question: how small can the model be before the whole thing falls apart? Smaller means faster and cheaper, and on a local box that’s the difference between snappy and sluggish. So I ran the smallest two Gemma 4 tiers head to head over a real test set and graded every answer. ...

June 19, 2026 · 8 min · 1577 words · Zac Orndorff <https://orndorff.dev>