How Small Can a Local GraphRAG Agent Go? An E2B-vs-E4B Sweep

In the last post I built a fully-local GraphRAG agent (cbi agent, which answers questions over a DuckDB knowledge graph with a Gemma model running on an AMD Strix Halo chip) and then turned the six-question hand-check into a repeatable harness (cbi eval) that scores answers deterministically against a ground-truth key. A harness invites the obvious question: how small can the model be before the whole thing falls apart? Smaller means faster and cheaper, and on a local box that’s the difference between snappy and sluggish. So I ran the two smallest Gemma 4 tiers head to head over a real test set and graded every answer. ...

June 19, 2026 · 8 min · 1577 words · Zac Orndorff <https://orndorff.dev>