Learn how to build RAG pipelines where personal data never reaches your LLM provider. Blindfold provides two protection layers: selective ingestion redaction (strip contact info before indexing, keep names for searchability) and query-time tokenization (protect context and questions before the LLM, restore real data in responses).

Why RAG Needs PII Protection

RAG pipelines are the #1 pattern where PII leaks into LLMs. Documents retrieved from your knowledge base — support tickets, customer records, internal reports — often contain personal data. When those documents are embedded, stored, and retrieved, the PII flows through multiple systems:
  1. Retrieval results — documents with PII are injected into LLM prompts
  2. LLM provider logs — your provider sees the full prompt, including retrieved PII
The privacy boundary is at the LLM API call, not the vector store. Your vector store is internal infrastructure; the LLM provider is an external third party. Blindfold protects data at both layers: selectively strip contact info from documents before they enter the vector store, and tokenize everything before it reaches the LLM.

Security Trade-offs

There is no one-size-fits-all approach to PII in RAG pipelines. The right choice depends on your threat model:
| Approach | Names in vector store | Name-based search | PII at LLM boundary | Complexity |
|---|---|---|---|---|
| Selective redaction (recommended) | Yes | Yes | No (tokenized) | Low |
| Full redaction | No | No — content-based only | No | Low |
| Tokenize with stored mapping | No (tokens only) | Yes (via reverse lookup) | No | High |
Selective Redaction (Recommended)

Redact contact info (emails, phones, IBANs) at ingestion — keep person names for searchability. At query time, search with the original question (names match), then tokenize context + question in a single call before the LLM. This is the approach used in all cookbook examples and described below.

Full Redaction

Redact all PII at ingestion. Strongest privacy — no personal data anywhere — but you lose the ability to search by name. The vector store can only match based on surrounding content.

Tokenize with Stored Mapping (Advanced)

Tokenize at ingestion and store the mapping. Build a reverse lookup to translate real names in queries to tokens. No PII in the vector store and name-based search works. See the advanced section below for details.

Two Protection Layers

Layer 1: Selective Ingestion Redaction

Redact contact info from documents before embedding and indexing. Names are kept so the vector store can match name-based queries.
```python
from blindfold import Blindfold

blindfold = Blindfold(api_key="your-api-key")

documents = [
    "Customer John Smith (john@example.com) reported a billing error.",
    "Maria Garcia (+34 612 345 678) requested a data export.",
]

safe_documents = []
for doc in documents:
    # Redact contact info only — keep names searchable
    result = blindfold.redact(doc, entities=["email address", "phone number"])
    safe_documents.append(result.text)
    # "Customer John Smith ([EMAIL_ADDRESS]) reported a billing error."

# Index safe_documents into your vector store
```
Why keep names? At ingestion, person names are replaced with [PERSON]. At query time, names are tokenized to <Person_1>. Neither placeholder matches the other — so searching for “Hans Mueller” cannot find [PERSON] in the vector store. Keeping names at ingestion solves this and lets users search by name. Contact info (emails, phones) is rarely searched for and should always be redacted.
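The mismatch is easy to see with plain strings (a self-contained sketch; the placeholder formats [PERSON] and <Person_1> are taken from the examples above — no Blindfold call is made here):

```python
# Document indexed with full redaction: the name is gone.
fully_redacted_doc = "Customer [PERSON] reported a billing error."

# Document indexed with selective redaction: the name survives.
selectively_redacted_doc = "Customer Hans Mueller ([EMAIL_ADDRESS]) reported a billing error."

# Query-time tokenization replaces the name with a token.
tokenized_query = "What happened with <Person_1>?"

# Neither placeholder format appears in the other representation...
assert "[PERSON]" not in tokenized_query
assert "<Person_1>" not in fully_redacted_doc

# ...but the original question still matches the selectively redacted doc.
original_query = "What happened with Hans Mueller?"
assert "Hans Mueller" in selectively_redacted_doc
```

This is why Layer 2 searches with the original question and only tokenizes afterwards.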

Layer 2: Query-Time Tokenization

After retrieval, tokenize the context and question in a single call before they reach the LLM. Then detokenize the response to restore real data.
```python
from blindfold import Blindfold
from openai import OpenAI

blindfold = Blindfold(api_key="your-api-key")
openai_client = OpenAI()

question = "What happened with John Smith's billing issue?"

# Step 1: Search with original question — names match in vector store
results = collection.query(query_texts=[question], n_results=3)
context = "\n\n".join(results["documents"][0])

# Step 2: Single tokenize call — consistent token numbering
prompt_text = f"Context:\n{context}\n\nQuestion: {question}"
tokenized = blindfold.tokenize(prompt_text)

# Step 3: Send to LLM — no PII in the prompt
response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using the provided context."},
        {"role": "user", "content": tokenized.text},
    ],
)
ai_response = response.choices[0].message.content

# Step 4: Detokenize — restore real names in the response
final = blindfold.detokenize(ai_response, tokenized.mapping)
print(final.text)
```
Why a single tokenize call? If you tokenize the context and question separately, each call produces independent token numbering. Context might map <Person_1> to “Hans Mueller” while the question maps <Person_1> to “Marie Dupont” — creating mapping conflicts. A single call on the combined text ensures consistent numbering.
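The conflict can be reproduced with a toy tokenizer (a sketch — mock_tokenize numbers names from a fixed candidate list and is not the real Blindfold API):

```python
def mock_tokenize(text, known_names=("Hans Mueller", "Marie Dupont")):
    """Toy stand-in for tokenize(): replace each known name present in
    the text with <Person_N>, numbering within this call only."""
    mapping = {}
    for name in known_names:
        if name in text:
            token = f"<Person_{len(mapping) + 1}>"
            mapping[token] = name
            text = text.replace(name, token)
    return text, mapping

context = "Hans Mueller reported a billing error."
question = "Did Marie Dupont confirm the refund?"

# Separate calls: each starts numbering at 1 — <Person_1> is ambiguous.
_, ctx_map = mock_tokenize(context)
_, q_map = mock_tokenize(question)
assert ctx_map["<Person_1>"] == "Hans Mueller"
assert q_map["<Person_1>"] == "Marie Dupont"   # conflict!

# One call on the combined text: numbering stays consistent.
_, combined_map = mock_tokenize(f"Context:\n{context}\n\nQuestion: {question}")
assert combined_map == {"<Person_1>": "Hans Mueller", "<Person_2>": "Marie Dupont"}
```

Merging the two per-call mappings for detokenization would silently pick one person for <Person_1> and drop the other, which is why the combined call matters.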

Protection Method Comparison

Choose the right protection method for your RAG use case:
| Method | Reversible | Best for | Example output |
|---|---|---|---|
| Redact | No | Ingestion — permanent PII removal | [PERSON], [EMAIL_ADDRESS] |
| Tokenize | Yes | Queries — protect input, restore output | <Person_1>, <Email Address_1> |
| Encrypt | Yes (with key) | Regulated data requiring audit trail | ENC_a8f3b2... |
| Hash | No | Analytics — consistent pseudonymous IDs | HASH_a3f8b9c2d4e5 |
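For intuition, the "consistent pseudonymous IDs" property of hashing can be sketched with a standard hash function (SHA-256 here; the HASH_ prefix and 12-character truncation are illustrative assumptions, not Blindfold's actual format):

```python
import hashlib

def pseudonymize(value: str) -> str:
    """Stable pseudonymous ID: the same input always yields the same
    hash, and the original value cannot be recovered from it."""
    return "HASH_" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

# The same email maps to the same ID across documents, so joins and
# counts still work in analytics without exposing the real value.
assert pseudonymize("john@example.com") == pseudonymize("john@example.com")
assert pseudonymize("john@example.com") != pseudonymize("maria@example.com")
```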
Recommended pattern: Use redact with entities at ingestion time (Layer 1) to strip contact info while keeping names. At query time (Layer 2), search with the original question and tokenize the combined context + question before the LLM call. This gives you searchability by name and full PII protection at the LLM boundary.

Advanced: Tokenize with Stored Mapping

For the strongest privacy with full searchability — no PII in the vector store and name-based search — tokenize at ingestion and store the mapping. This is the most complete architecture but requires managing a mapping store. How it works:
  1. Ingestion: tokenize() each document → store tokenized text in vector store + store mapping securely
  2. Query: Build a reverse lookup from stored mappings. Replace real names in the query with their tokens before searching
  3. LLM: Tokenized context + tokenized query → LLM sees only tokens
  4. Response: Detokenize using stored mappings
```python
from blindfold import Blindfold

blindfold = Blindfold(api_key="your-api-key")

# === Ingestion ===
documents = [...]
mapping_store = {}  # In production: encrypted DB or secrets manager

for doc in documents:
    result = blindfold.tokenize(doc)
    # Store tokenized text in vector store
    vectorstore.add(result.text)
    # Store mapping securely (keyed by doc ID or merged globally)
    mapping_store.update(result.mapping)

# Build reverse lookup: real value → token
reverse_lookup = {v: k for k, v in mapping_store.items()}

# === Query ===
question = "What happened with Hans Mueller?"

# Replace known real values with their tokens
for real_value, token in reverse_lookup.items():
    question = question.replace(real_value, token)
# question: "What happened with <Person_1>?"

# Search with tokenized query — tokens match tokens in vector store
results = vectorstore.query(question, n_results=3)

# Context is already tokenized, question is already tokenized
# Send directly to LLM — no PII
response = llm.generate(context=results, question=question)

# Detokenize for the user
final = blindfold.detokenize(response, mapping_store)
```
Trade-offs:
  • Requires managing a mapping store (encrypted DB, secrets manager)
  • Reverse lookup needs exact string matching (partial names may not match)
  • More complex than the selective-redaction approach
  • But: strongest privacy with full searchability — no PII in the vector store at all
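The exact-matching limitation is easy to demonstrate (a self-contained sketch of the reverse-lookup step from the ingestion example above):

```python
# Reverse lookup built from stored mappings: real value → token.
reverse_lookup = {"Hans Mueller": "<Person_1>"}

def tokenize_query(question: str) -> str:
    # Naive exact substring replacement, as in the example above.
    for real_value, token in reverse_lookup.items():
        question = question.replace(real_value, token)
    return question

# Full name matches and is replaced before the search.
assert tokenize_query("What happened with Hans Mueller?") == "What happened with <Person_1>?"

# A partial name misses: "Mueller" alone is not a stored value, so the
# query reaches the vector store with the real surname intact — and it
# will not match the tokenized documents either.
assert tokenize_query("What happened with Mueller?") == "What happened with Mueller?"
```

Mitigations such as fuzzy matching or per-name-part lookups add further complexity, which is why this architecture is marked advanced.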
detokenize() is a free local operation — no API call. This means the mapping store is the only infrastructure you need to manage.
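Conceptually, detokenization is just substitution against the stored mapping — a minimal local sketch (the real detokenize() may handle edge cases differently):

```python
def local_detokenize(text: str, mapping: dict) -> str:
    """Replace each token with its original value.
    mapping is token -> real value, as built in the ingestion loop."""
    # Replace longer tokens first so <Person_10> is not clobbered
    # by a partial match on <Person_1>.
    for token in sorted(mapping, key=len, reverse=True):
        text = text.replace(token, mapping[token])
    return text

mapping = {"<Person_1>": "Hans Mueller", "<Email Address_1>": "hans@example.com"}
answer = "<Person_1> can be reached at <Email Address_1>."
assert local_detokenize(answer, mapping) == "Hans Mueller can be reached at hans@example.com."
```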

Policy Recommendations

Match your compliance policy to your use case:
| Use case | Policy | Region | Key entities detected |
|---|---|---|---|
| General RAG | basic | | Names, emails, phones, addresses, credit cards |
| EU customer data | gdpr_eu | eu | Names, emails, IBANs, national IDs, DOB, addresses |
| US healthcare | hipaa_us | us | All 18 HIPAA identifiers (SSN, MRN, DOB, etc.) |
| Payment data | pci_dss | | Credit cards, CVVs, expiration dates |
| Maximum coverage | strict | | All supported entity types, lowest threshold |
```python
# GDPR-compliant RAG — redact contact info, keep names
blindfold = Blindfold(api_key="your-key", region="eu")
result = blindfold.redact(document, policy="gdpr_eu", entities=[
    "email address", "phone number", "iban", "credit card number",
    "address", "date of birth", "national id number",
])

# HIPAA-compliant RAG
blindfold = Blindfold(api_key="your-key", region="us")
result = blindfold.redact(document, policy="hipaa_us")
```

Performance Tips

  • Batch redaction at ingestion — use blindfold.redact_batch() for processing multiple documents in one API call
  • Async processing — use AsyncBlindfold for concurrent document processing during ingestion
  • Detokenization is free — detokenize() is a local string replacement, no API call required
  • Cache redacted documents — once documents are redacted and indexed, no further Blindfold calls are needed for retrieval
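The batching and async tips can be combined into one ingestion pattern (a sketch — redact_one below is a stand-in for the AsyncBlindfold call, so the concurrency structure is the point, not the API):

```python
import asyncio

async def redact_one(doc: str) -> str:
    """Stand-in for an async Blindfold redact call."""
    await asyncio.sleep(0)  # simulate network latency
    return doc.replace("john@example.com", "[EMAIL_ADDRESS]")

async def ingest(documents: list, batch_size: int = 10) -> list:
    safe = []
    # Bounded batches: concurrent within a batch, sequential across
    # batches, so a large corpus does not open thousands of requests.
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        safe.extend(await asyncio.gather(*(redact_one(d) for d in batch)))
    return safe

docs = ["Customer John Smith (john@example.com) reported a billing error."]
safe_docs = asyncio.run(ingest(docs))
assert safe_docs == ["Customer John Smith ([EMAIL_ADDRESS]) reported a billing error."]
```

The batch size bounds concurrent requests; tune it against your provider's rate limits.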

Cookbook Examples

Complete, runnable examples for every RAG framework:

Strategy Deep-Dives

Standalone examples for each ingestion strategy — compare trade-offs side by side: