RAG is architecture. — ZRØ.design Journal

Most RAG systems get built backward. A team has a corpus, a team has a model, and the assumption is that retrieval is the wire between them. Plug in an embedding model, build an index, surface the top-k chunks, hand them to the model. The wire works. The system that results does roughly what the corpus and the model allow, no more, no less. Retrieval was treated as plumbing. The studio's working position is that retrieval is the architecture itself — the substrate the system stands on, not the cable connecting two ends.

A retrieval architecture, drawn honestly, names what is canonical in the corpus, what is provisional, what is local to a single context, what generalizes. It names where authority lives — which sources the system defers to, which sources require corroboration, which sources cannot be trusted to make a load-bearing claim. The drawing also names what the system will refuse to retrieve. Refusal is part of the substrate, not an exception case. Without it, the system has no shape.
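One way to make that drawing concrete is to put authority into the data model, so that refusal becomes a return value and not an afterthought. The sketch below is illustrative only: the tier names, the `Source` type, and the `admissible` function are hypothetical, and "corroboration" is reduced to "at least two provisional sources are present", which is a deliberate simplification.

```python
from dataclasses import dataclass
from enum import Enum


class Authority(Enum):
    # Hypothetical tiers; a real corpus may need more or different ones.
    CANONICAL = "canonical"      # may support a load-bearing claim on its own
    PROVISIONAL = "provisional"  # needs corroboration before it is used
    UNTRUSTED = "untrusted"      # never used for a load-bearing claim


@dataclass(frozen=True)
class Source:
    source_id: str
    authority: Authority


def admissible(sources: list[Source]) -> list[Source]:
    """Return the sources allowed to support a claim, or [] to refuse."""
    canonical = [s for s in sources if s.authority is Authority.CANONICAL]
    if canonical:
        # Defer to canonical sources whenever any are available.
        return canonical
    provisional = [s for s in sources if s.authority is Authority.PROVISIONAL]
    # Assumption: two or more provisional sources count as corroboration.
    if len(provisional) >= 2:
        return provisional
    # Refusal: nothing here is strong enough to stand on.
    return []
```

An empty list is the refusal. The caller decides what to say when nothing is admissible, but the decision that nothing is admissible lives in the retrieval layer.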

The most common mistake in production RAG is conflating recall with relevance. A system that returns ten chunks for every query feels powerful in development and collapses in production. The chunks dilute one another, the model averages across them, and the answer becomes a confident smoothing of mediocre matches. The fix is not better embedding models. The fix is structural: design the retrieval to return fewer, sharper, better-cited results. The system that returns three high-confidence chunks compounds across queries. The system that returns ten low-confidence chunks drifts.
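The structural fix can be as small as replacing a fixed top-k with a score floor plus a cap. The following is a minimal sketch, not a prescription: the `Hit` type, the `select_hits` name, the 0.75 floor, and the cap of three are all assumptions chosen for illustration, and it assumes similarity scores on a 0 to 1 scale where higher is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    score: float  # assumed: similarity in [0, 1], higher is better


def select_hits(hits: list[Hit], min_score: float = 0.75, max_hits: int = 3) -> list[Hit]:
    """Keep hits at or above min_score, best first, capped at max_hits."""
    strong = [h for h in hits if h.score >= min_score]
    strong.sort(key=lambda h: h.score, reverse=True)
    # The result may be shorter than max_hits, or empty. That is intended.
    return strong[:max_hits]
```

The design choice worth noticing is that the function is allowed to return nothing. A fixed top-k always fills its quota, so weak matches reach the model whenever strong ones are missing.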

A system stands on what it can cite.

Citations are not decoration in this medium. Citations are the structural assertion that a claim has a source the reader can read. A system without citations is a system that has decided its readers should trust it without verification. The studio holds that no AI system designed for serious use should make that decision on the user's behalf. Every claim should be traceable. Every retrieval should carry the provenance forward. The architecture is what makes that possible at scale.
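Carrying provenance forward mostly means never separating a chunk's text from where it came from. A sketch under stated assumptions: the `Chunk` fields, the `build_context` function, and the bracketed marker format are hypothetical conventions, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source_id: str  # stable identifier for the source document
    locator: str    # where in the source, e.g. a section or page


def build_context(chunks: list[Chunk]) -> tuple[str, dict[str, Chunk]]:
    """Number each chunk; return the context text and a marker-to-chunk map."""
    lines = []
    citations: dict[str, Chunk] = {}
    for i, chunk in enumerate(chunks, start=1):
        marker = f"[{i}]"
        citations[marker] = chunk
        # The marker and provenance travel in the same line as the text.
        lines.append(f"{marker} ({chunk.source_id}, {chunk.locator}) {chunk.text}")
    return "\n".join(lines), citations
```

The context string goes to the model. The map stays with the application, so any marker the model emits can be resolved back to a source and a location the reader can open.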

The evaluation question for a retrieval system is not "is the answer correct?" but "could a competent reader trace the answer back to a source that supports it?" The first question is downstream of the model. The second question is upstream of the model: it lives in the retrieval architecture. Teams that design for the second question build systems that stand. Teams that optimize for the first question build systems that perform well in demos and quietly fail in the field.
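The traceability question can be turned into a rough, automatable check. The sketch below assumes the answer has already been split into claims, each carrying a quoted span and a citation marker; `Claim`, `traceable`, and `traceability_rate` are hypothetical names. Verbatim matching after whitespace and case normalization is a crude stand-in for "a source that supports it" and will miss paraphrase, so treat the number as a floor, not a verdict.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    quote: str   # span the answer attributes to a source
    marker: str  # citation marker attached to the claim, e.g. "[1]"


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def traceable(claim: Claim, sources: dict[str, str]) -> bool:
    """True if the cited source was retrieved and contains the quoted span."""
    source_text = sources.get(claim.marker)
    if source_text is None:
        return False  # cites something that was never retrieved
    return normalize(claim.quote) in normalize(source_text)


def traceability_rate(claims: list[Claim], sources: dict[str, str]) -> float:
    """Fraction of claims a reader could trace; 0.0 when there are no claims."""
    if not claims:
        return 0.0
    return sum(traceable(c, sources) for c in claims) / len(claims)
```

A check like this says nothing about whether the answer is correct. It only reports whether the path from claim to source exists, which is the property the retrieval layer is responsible for.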

The retrieval layer is the layer above which the rest of the system can be honest. Get it right, and the model becomes a writer that quotes well. Get it wrong, and the model becomes a writer that fabricates fluently. The model itself does not change between those two outcomes. The substrate does. RAG is not a technique applied to a model. RAG is the architecture the model stands on.