LGPD and LLMs: what changes when the model "sees" the data
Self-hosting, redaction, semantic anonymization. A practical map out of legal uncertainty and into a running, compliant system.
When the model is yours, LGPD applies the way it always has. When the model belongs to a third party and processes your customers' personal data, the analysis changes: the provider now handles that data on your behalf, and you remain accountable for it. Below are the three decisions that actually cost money.
Decision 1 — Where the model runs
Calling a model behind a public API isn't forbidden in itself. What is forbidden is doing so without the right contract and controls. Self-hosting open weights eliminates part of the problem but adds operational cost. A hybrid setup, with sensitive data handled in-house and general workloads in the cloud, is usually the path.
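The hybrid split can be sketched as a small routing function that runs before any model call. This is a minimal illustration, not a complete detector: the names `choose_route` and `Route`, the two target labels, and the two regular expressions (CPF numbers and email addresses) are assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; a production detector would
# cover many more categories (names, addresses, phone numbers, health data).
CPF_RE = re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


@dataclass(frozen=True)
class Route:
    target: str  # "self_hosted" or "cloud" (labels assumed for this sketch)
    reason: str


def choose_route(prompt: str) -> Route:
    """Keep prompts that appear to contain personal data on the in-house model."""
    if CPF_RE.search(prompt):
        return Route("self_hosted", "cpf detected")
    if EMAIL_RE.search(prompt):
        return Route("self_hosted", "email detected")
    return Route("cloud", "no personal data detected")


if __name__ == "__main__":
    print(choose_route("Summarize the complaint filed under CPF 123.456.789-09"))
    print(choose_route("Explain how attention works in a transformer"))
```

Returning a reason alongside the target makes each routing decision auditable, which helps when someone later asks why a given prompt left the building.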
Decision 2 — What the model sees
Redaction before the call covers roughly 80% of cases. For the remaining 20%, where the model needs to keep track of who or what a value refers to, semantic anonymization via embeddings or token substitution preserves utility without exposing the data. It is worth the investment.
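Token substitution can be sketched in a few lines: each detected value is swapped for a stable placeholder before the call, and the mapping stays on your side so the response can be restored afterwards. The functions `redact` and `restore`, the placeholder format, and the two patterns are assumptions for illustration. Note that the mapping is itself personal data and should never be sent to the provider.

```python
import re

# Hypothetical patterns; extend with whatever categories your data contains.
PATTERNS = {
    "CPF": re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected values with placeholders; return new text and mapping."""
    mapping: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}     # original value -> placeholder
    counts: dict[str, int] = {}   # per-label counter

    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            if value not in seen:
                counts[label] = counts.get(label, 0) + 1
                token = f"[{label}_{counts[label]}]"
                seen[value] = token
                mapping[token] = value
            # The same value always maps to the same placeholder.
            return seen[value]

        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a model response."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


if __name__ == "__main__":
    safe, mapping = redact("Client 123.456.789-09 wrote from ana@example.com")
    print(safe)
    print(restore(safe, mapping))
```

Because repeated values share one placeholder, the model can still reason about "the same customer" across a prompt without ever seeing the identifier.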
Decision 3 — What gets logged
Prompt-and-response logging creates a new personal-data store. It needs retention limits, access control and support for deletion requests (the right to be forgotten) equivalent to those of the original store. Almost no project treats this properly at the start.
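Treating the log as a data store means that, at minimum, every entry is tied to a data subject, expires after a fixed window, and can be deleted on request. The sketch below uses SQLite from the Python standard library. The table layout, the function names, and the 30-day window are assumptions chosen for the example, not a recommended retention period. Access control is left out and would sit in front of this layer.

```python
import sqlite3
import time

RETENTION_SECONDS = 30 * 24 * 3600  # assumed 30-day window, adjust to policy


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Create the log table, indexed by data subject for fast deletion."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS llm_log ("
        "id INTEGER PRIMARY KEY, subject_id TEXT NOT NULL, "
        "created_at REAL NOT NULL, prompt TEXT NOT NULL, response TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_subject ON llm_log(subject_id)")
    return conn


def record(conn, subject_id, prompt, response, now=None):
    """Store one exchange, tagged with the data subject it concerns."""
    created_at = time.time() if now is None else now
    conn.execute(
        "INSERT INTO llm_log (subject_id, created_at, prompt, response) "
        "VALUES (?, ?, ?, ?)",
        (subject_id, created_at, prompt, response),
    )
    conn.commit()


def purge_expired(conn, now=None) -> int:
    """Delete entries older than the retention window; return rows removed."""
    cutoff = (time.time() if now is None else now) - RETENTION_SECONDS
    cur = conn.execute("DELETE FROM llm_log WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def erase_subject(conn, subject_id) -> int:
    """Handle a deletion request: remove every entry for one data subject."""
    cur = conn.execute("DELETE FROM llm_log WHERE subject_id = ?", (subject_id,))
    conn.commit()
    return cur.rowcount
```

In practice `purge_expired` would run on a schedule and `erase_subject` would be called from the same workflow that handles deletion requests against the original store, so the two never drift apart.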