The Expert Is the Graph: A Local GraphRAG Toolkit for Micro-Scale Domain Experts
This is the post I’ve been building toward across three earlier ones. It pulls the whole system together — the toolkit, the agent, the benchmarks — and ends on the experiment that reframed the entire project for me: I handed the same knowledge graph to a tiny local model and to a frontier model (Claude Sonnet), graded both with the same judge, and the frontier model didn’t win. That result is the thesis. When you put a domain behind a good retrieval layer, the expertise lives in the graph, not the model reading it. Which means you can build something I’ve started calling a micro-scale domain expert: a small, portable, inspectable knowledge artifact that any agent — a 4-bit model on your laptop or a frontier model in the cloud — can pick up and answer from. This post is about the toolkit that makes those, and the evidence that the cheap part (the model) is interchangeable while the valuable part (the graph) is where the work goes. ...
Coverage, Density, and Model Size: A Fully-Local GraphRAG-Bench Run
This is the third post on cbi, my local GraphRAG agent. The first grounded it against hallucination; the second swept model sizes on a toy graph. This one points it at a real external benchmark, GraphRAG-Bench, and runs the whole thing, including the LLM judge, entirely on my desk. No API keys, nothing leaves the box. The result is three findings that only make sense together: a context-window fix, a counterintuitive result about graph size, and a model-size jump that links the first two. ...
Grounding a Fully-Local GraphRAG Agent: An Accuracy Post-Mortem
I’ve been building cbi, a small domain-agnostic GraphRAG CLI: it ingests data into a DuckDB knowledge graph (vectors via vss, full-text via fts, graph queries via duckpgq) and exports it as a self-contained, cat-readable Open Knowledge Format bundle — markdown concept docs plus the database itself. The latest piece closes the loop: cbi agent --bundle ./some-bundle opens a chat TUI where a fully local agent answers questions about the bundle. No API keys, no cloud, no embedding server. The whole thing runs on my desk. ...
How Small Can a Local GraphRAG Agent Go? An E2B-vs-E4B Sweep
In the last post I built a fully-local GraphRAG agent (cbi agent, answering questions over a DuckDB knowledge graph with a Gemma model running on an AMD Strix Halo chip) and then turned the six-question hand-check into a repeatable harness (cbi eval) that scores answers deterministically against a ground-truth key. A harness invites the obvious question: how small can the model be before the whole thing falls apart? Smaller means faster and cheaper, and on a local box that’s the difference between snappy and sluggish. So I ran the two smallest Gemma 4 tiers head to head over a real test set and graded every answer. ...
Getting Gud with LLMs: How to Build the Intuition
I recently let Claude crawl 25 months of my own LLM tooling history and write up what it found. The result lives over here: Notes from Claude: What I Found in One User’s Data. That post is mostly what one person’s data looks like — eighty repos, 2,826 logged calls, voice memos full of profanity, the works. It’s not a how-to. People keep asking me for the how-to. So here it is. Not a list of magic incantations. Not “ten prompts that will change your life.” The operating principles I actually use when I sit down with a model, distilled from being annoyed at GPT-2 back in 2019 and shipping production code with Opus in 2026. ...