Content·Engineering·28 Feb 2026·12 min

Enterprise RAG: 7 patterns we learned in regulated deployments

Per-document permissions, verifiable citations, cost per query. The architectural choices that change everything when RAG leaves the lab.

Retrieval-Augmented Generation (RAG) looks simple until the first regulated deployment. Then every small decision becomes a problem. Here are the seven patterns that recur in nearly every project that ships well.

1. Permission per document, not per user

The initial temptation is to map who can read what at the user level. In real organizations, that mapping becomes a maintenance nightmare. What works is tagging each ingested document with its own ACL and, at query time, keeping only the documents whose ACL intersects the user's context (for example, the user's groups and roles).
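A minimal sketch of that intersection filter, assuming the ACL is stored as a set of group names on each chunk and the user's context is resolved to a set of groups at query time. The `Chunk` type, `filter_by_acl`, and the group names are hypothetical, not part of any specific vector store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    acl: frozenset[str]  # groups allowed to read the source document (assumed schema)
    score: float         # similarity score from the retriever


def filter_by_acl(candidates: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only chunks whose document ACL shares at least one group with the user."""
    return [c for c in candidates if c.acl & user_groups]


candidates = [
    Chunk("policy-001", "Travel policy", frozenset({"all-staff"}), 0.91),
    Chunk("audit-017", "Internal audit findings", frozenset({"audit", "legal"}), 0.88),
    Chunk("hr-204", "Salary bands", frozenset({"hr"}), 0.75),
]

visible = filter_by_acl(candidates, user_groups={"all-staff", "legal"})
print([c.doc_id for c in visible])
```

In practice the same predicate is usually pushed down into the retriever as a metadata filter, so that forbidden chunks never leave the index. Post-filtering as shown here is the simplest form of the idea.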

2. Mandatory citation, always

Every answer must carry a reference to its source span. Not as a courtesy, but as a contract. If the model can't cite, the answer should be "not found". The difference between a 'trustworthy system' and a 'fun demo' lives here.
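One way to enforce that contract is a gate between generation and the user. The sketch below assumes the model returns its answer together with verbatim quotes from the retrieved sources. `Citation`, `enforce_citations`, and the document ids are hypothetical names used for illustration.

```python
from dataclasses import dataclass

NOT_FOUND = "not found"


@dataclass(frozen=True)
class Citation:
    doc_id: str
    quote: str  # exact span the answer claims to rely on


def enforce_citations(answer: str, citations: list[Citation], sources: dict[str, str]) -> str:
    """Return the answer only if every citation quotes a retrieved source verbatim."""
    if not citations:
        return NOT_FOUND
    for c in citations:
        # An empty quote, an unknown document, or a quote absent from the source fails the gate.
        if not c.quote or c.quote not in sources.get(c.doc_id, ""):
            return NOT_FOUND
    return answer


sources = {"policy-001": "Employees may claim travel expenses within 30 days of the trip."}

grounded = enforce_citations(
    "Claims must be filed within 30 days.",
    [Citation("policy-001", "within 30 days of the trip")],
    sources,
)
ungrounded = enforce_citations(
    "Claims must be filed within 90 days.",
    [Citation("policy-001", "within 90 days")],
    sources,
)
print(grounded)
print(ungrounded)
```

This only checks that the quoted span exists in the source. It does not verify that the answer follows from the span, which needs a separate check.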

3. Reranking matters more than embedding

Swapping the embedding model rarely moves the KPI. Adding a well-tuned reranker between retrieval and generation almost always does. It's the highest-ROI quality investment.
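The shape of that extra stage can be sketched as follows. The first-stage retriever returns candidates, and a second scorer reorders them before generation. The `token_overlap` scorer here is a toy stand-in chosen so the example runs without dependencies. In a real system that slot would typically be filled by a cross-encoder or similar reranking model.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> list[str]:
    """Reorder first-stage candidates by a second, more precise relevance score."""
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_k]


def token_overlap(query: str, doc: str) -> float:
    """Toy scorer (Jaccard overlap of lowercase tokens). A stand-in, not a real reranker."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


# Candidates in the order a first-stage retriever might return them.
first_stage = [
    "Customer onboarding checklist and welcome records",
    "Retention period for customer records is seven years",
    "Office data center access rules",
]

top = rerank("data retention period for customer records", first_stage, token_overlap, top_k=2)
print(top)
```

Keeping the scorer as a parameter makes it easy to compare rerankers against the same golden questions without touching the rest of the pipeline.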

4. Layered cache

An exact-match cache is trivial to build and yields few hits. A semantic-similarity cache yields many more but demands care with TTL. A whole-answer cache, invalidated when a source document changes, is what really cuts inference cost.
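The three layers can live in one structure. The sketch below is illustrative only: `LayeredCache`, the 0.9 similarity threshold, the one-hour TTL, and the `toy_embed` function are assumptions, and a production version would use a real embedding model and a vector index instead of a linear scan.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    vector: list[float]
    answer: str
    doc_ids: frozenset[str]  # documents the cached answer was built from
    created: float


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LayeredCache:
    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.9,
                 ttl_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.embed, self.threshold = embed, threshold
        self.ttl, self.clock = ttl_seconds, clock
        self.entries: dict[str, Entry] = {}

    @staticmethod
    def _key(query: str) -> str:
        return " ".join(query.lower().split())  # normalize case and whitespace

    def put(self, query: str, answer: str, doc_ids: set[str]) -> None:
        self.entries[self._key(query)] = Entry(
            self.embed(query), answer, frozenset(doc_ids), self.clock())

    def get(self, query: str) -> Optional[str]:
        # Layer 1: exact match on the normalized query.
        hit = self.entries.get(self._key(query))
        if hit is not None:
            return hit.answer
        # Layer 2: nearest semantically similar entry that is still within its TTL.
        now, vec = self.clock(), self.embed(query)
        best, best_sim = None, self.threshold
        for entry in self.entries.values():
            if now - entry.created > self.ttl:
                continue
            sim = cosine(vec, entry.vector)
            if sim >= best_sim:
                best, best_sim = entry, sim
        return best.answer if best else None

    def invalidate_document(self, doc_id: str) -> None:
        # Layer 3: drop every cached answer that was built from the changed document.
        self.entries = {k: e for k, e in self.entries.items() if doc_id not in e.doc_ids}


VOCAB = ("refund", "policy", "vacation", "days")


def toy_embed(text: str) -> list[float]:
    """Toy bag-of-words embedding over a tiny vocabulary, for illustration only."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


cache = LayeredCache(embed=toy_embed)
cache.put("What is the refund policy", "Refunds within 30 days [policy-001]", {"policy-001"})
print(cache.get("what is the  refund policy"))  # served by the exact layer after normalization
print(cache.get("refund policy please"))        # served by the semantic layer
print(cache.get("vacation days"))               # no layer matches
cache.invalidate_document("policy-001")
print(cache.get("What is the refund policy"))   # entry was dropped with its document
```

Storing the contributing document ids with each answer is what makes invalidation precise: a document update evicts only the answers that depended on it. In a permissioned deployment the cache key would also need to include the user's access context, which this sketch omits.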

5. Continuous evaluation, not point-in-time

Manual evaluation at the start is necessary. Automated evaluation with golden questions is what keeps the system alive. Without it, you learn the system regressed from a complaint, not an alarm.
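A golden-question harness can be very small. The sketch below assumes the pipeline is callable as `pipeline(question) -> (answer, cited_doc_ids)`. `GoldenCase`, `run_golden_set`, `has_regressed`, and the 5-point tolerance are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    question: str
    must_contain: str     # phrase the answer has to include
    expected_source: str  # document id that has to be cited


def run_golden_set(
    cases: list[GoldenCase],
    pipeline: Callable[[str], tuple[str, list[str]]],
) -> float:
    """Return the fraction of golden questions answered with the right content and source."""
    passed = 0
    for case in cases:
        answer, cited = pipeline(case.question)
        if case.must_contain.lower() in answer.lower() and case.expected_source in cited:
            passed += 1
    return passed / len(cases) if cases else 0.0


def has_regressed(pass_rate: float, baseline: float, tolerance: float = 0.05) -> bool:
    """True when the pass rate drops more than `tolerance` below the recorded baseline."""
    return pass_rate < baseline - tolerance


def fake_pipeline(question: str) -> tuple[str, list[str]]:
    # Stand-in for the real RAG pipeline so the example runs on its own.
    if "retention" in question.lower():
        return "Records are kept for seven years.", ["policy-007"]
    return "not found", []


cases = [
    GoldenCase("What is the retention period?", "seven years", "policy-007"),
    GoldenCase("Who approves travel expenses?", "line manager", "policy-001"),
]

rate = run_golden_set(cases, fake_pipeline)
print(rate, has_regressed(rate, baseline=0.9))
```

Run on a schedule and after every change to chunking, prompts, or models, the boolean from `has_regressed` is what turns a future complaint into an alarm.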

6. Chunk by structure, not by size

Cutting documents into fixed 500-token pieces is the most common pattern and the worst. Cutting along structure (section, paragraph, table) preserves context and dramatically improves retrieval.
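As an illustration, here is a minimal structural chunker. It assumes the document has already been converted to text with Markdown-style `#` headings and blank lines between paragraphs, which is an assumption about the ingestion step, not something every source format provides. `StructuredChunk` and `chunk_by_structure` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredChunk:
    section: str  # heading the block belongs to, kept as retrieval context
    text: str


def chunk_by_structure(document: str) -> list[StructuredChunk]:
    """Split on headings and blank lines, so each chunk is one paragraph or one table."""
    chunks: list[StructuredChunk] = []
    section = ""
    block: list[str] = []

    def flush() -> None:
        if block:
            chunks.append(StructuredChunk(section, "\n".join(block)))
            block.clear()

    for line in document.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()  # close the previous block under the old heading
            section = stripped.lstrip("#").strip()
        elif not stripped:
            flush()  # a blank line ends the current paragraph or table
        else:
            block.append(stripped)
    flush()
    return chunks


document = """# Retention
Customer records are kept for seven years.

Backups are kept for one year.

# Access
| Role | Access |
| Auditor | Read |
"""

for chunk in chunk_by_structure(document):
    print(repr(chunk.section), repr(chunk.text))
```

Each chunk carries its section title, and a table written on consecutive lines stays in one piece instead of being cut mid-row.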

7. Small model at the edge, big model at the core

Routing between models by query complexity cuts cost by 50–70% without perceptible quality loss. Simple questions don't need the big model. Identifying which questions are simple is a straightforward classification task.
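A sketch of such a router, using a deliberately naive heuristic as the classifier. The marker list, the 12-word limit, and the model names `small-model` and `large-model` are all hypothetical. The `blended_cost` helper shows the arithmetic behind the savings under assumed traffic and price ratios, not measured figures.

```python
COMPLEX_MARKERS = ("compare", "why", "explain", "summarize", "difference", "step by step")


def is_simple(query: str, max_words: int = 12) -> bool:
    """Naive classifier: short queries without reasoning markers count as simple."""
    text = query.lower()
    return len(text.split()) <= max_words and not any(m in text for m in COMPLEX_MARKERS)


def route(query: str) -> str:
    """Pick a model name by query complexity (names are placeholders)."""
    return "small-model" if is_simple(query) else "large-model"


def blended_cost(share_small: float, small_cost_ratio: float) -> float:
    """Cost per query relative to sending every query to the large model."""
    return share_small * small_cost_ratio + (1.0 - share_small)


print(route("What is the notice period?"))
print(route("Compare the 2024 and 2025 retention policies and explain the differences"))

# Assumed numbers: 70% of traffic is simple and the small model costs 10% of the large one.
print(round(blended_cost(share_small=0.7, small_cost_ratio=0.1), 2))
```

Under those assumed numbers the blended cost is about 0.37 of the all-large baseline, a saving of roughly 63%, which falls inside the 50–70% range. A production router would replace the heuristic with a trained classifier and validate its routing against the golden questions.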

Good enterprise RAG is less about the LLM and more about the infrastructure around it.