How Small Can a Local GraphRAG Agent Go? An E2B-vs-E4B Sweep
In the last post I built a fully local GraphRAG agent, cbi agent, which answers questions over a DuckDB knowledge graph with a Gemma model running on an AMD Strix Halo chip, and then turned the six-question hand-check into a repeatable harness (cbi eval) that scores answers deterministically against a ground-truth key. A harness invites the obvious question: how small can the model be before the whole thing falls apart? Smaller means faster and cheaper, and on a local box that’s the difference between snappy and sluggish. So I ran the two smallest Gemma 4 tiers, E2B and E4B, head to head over a real test set and graded every answer. ...