Learn how to build RAG pipelines where personal data never reaches your LLM provider. Blindfold provides two protection layers: selective ingestion redaction (strip contact info before indexing, keep names for searchability) and query-time tokenization (protect context and questions before the LLM, restore real data in responses).

Why RAG Needs PII Protection

RAG pipelines are the #1 pattern where PII leaks into LLMs. Documents retrieved from your knowledge base — support tickets, customer records, internal reports — often contain personal data. When those documents are embedded, stored, and retrieved, the PII flows through multiple systems:
  1. Retrieval results — documents with PII are injected into LLM prompts
  2. LLM provider logs — your provider sees the full prompt, including retrieved PII
The privacy boundary is at the LLM API call, not the vector store. Your vector store is internal infrastructure; the LLM provider is an external third party. Blindfold protects data at both layers: selectively strip contact info from documents before they enter the vector store, and tokenize everything before it reaches the LLM.

Security Trade-offs

There is no one-size-fits-all approach to PII in RAG pipelines. The right choice depends on your threat model:
| Approach | Names in vector store | Name-based search | PII at LLM boundary | Complexity |
|---|---|---|---|---|
| Selective redaction (recommended) | Yes | Yes | No (tokenized) | Low |
| Full redaction | No | No — content-based only | No | Low |
| Tokenize with stored mapping | No (tokens only) | Yes (via reverse lookup) | No | High |
Selective Redaction (Recommended)

Redact contact info (emails, phones, IBANs) at ingestion — keep person names for searchability. At query time, search with the original question (names match), then tokenize context + question in a single call before the LLM. This is the approach used in all cookbook examples and described below.

Full Redaction

Redact all PII at ingestion. Strongest privacy — no personal data anywhere — but you lose the ability to search by name. The vector store can only match based on surrounding content.

Tokenize with Stored Mapping (Advanced)

Tokenize at ingestion and store the mapping. Build a reverse lookup to translate real names in queries to tokens. No PII in the vector store and name-based search works. See the advanced section below for details.

Two Protection Layers

Layer 1: Selective Ingestion Redaction

Redact contact info from documents before embedding and indexing. Names are kept so the vector store can match name-based queries.
```python
from blindfold import Blindfold

blindfold = Blindfold(api_key="your-api-key")

documents = [
    "Customer John Smith (john@example.com) reported a billing error.",
    "Maria Garcia (+34 612 345 678) requested a data export.",
]

safe_documents = []
for doc in documents:
    # Redact contact info only — keep names searchable
    result = blindfold.redact(doc, entities=["email address", "phone number"])
    safe_documents.append(result.text)
    # "Customer John Smith ([EMAIL_ADDRESS]) reported a billing error."

# Index safe_documents into your vector store
```
Why keep names? At ingestion, person names are replaced with [PERSON]. At query time, names are tokenized to <Person_1>. Neither placeholder matches the other — so searching for “Hans Mueller” cannot find [PERSON] in the vector store. Keeping names at ingestion solves this and lets users search by name. Contact info (emails, phones) is rarely searched for and should always be redacted.
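The mismatch is easy to see with plain strings (a self-contained sketch; the placeholder formats [PERSON] and <Person_1> are taken from the examples above — no Blindfold call is made here):

```python
# Document indexed with full redaction: the name is gone.
fully_redacted_doc = "Customer [PERSON] reported a billing error."

# Document indexed with selective redaction: the name survives.
selectively_redacted_doc = "Customer Hans Mueller ([EMAIL_ADDRESS]) reported a billing error."

# Query-time tokenization replaces the name with a token.
tokenized_query = "What happened with <Person_1>?"

# Neither placeholder format appears in the other representation...
assert "[PERSON]" not in tokenized_query
assert "<Person_1>" not in fully_redacted_doc

# ...but the original question still matches the selectively redacted doc.
original_query = "What happened with Hans Mueller?"
assert "Hans Mueller" in selectively_redacted_doc
```

This is why Layer 2 searches with the original question and only tokenizes afterwards.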

Layer 2: Query-Time Tokenization

After retrieval, tokenize the context and question in a single call before they reach the LLM. Then detokenize the response to restore real data.
```python
from blindfold import Blindfold
from openai import OpenAI

blindfold = Blindfold(api_key="your-api-key")
openai_client = OpenAI()

question = "What happened with John Smith's billing issue?"

# Step 1: Search with original question — names match in vector store
results = collection.query(query_texts=[question], n_results=3)
context = "\n\n".join(results["documents"][0])

# Step 2: Single tokenize call — consistent token numbering
prompt_text = f"Context:\n{context}\n\nQuestion: {question}"
tokenized = blindfold.tokenize(prompt_text)

# Step 3: Send to LLM — no PII in the prompt
response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using the provided context."},
        {"role": "user", "content": tokenized.text},
    ],
)
ai_response = response.choices[0].message.content

# Step 4: Detokenize — restore real names in the response
final = blindfold.detokenize(ai_response, tokenized.mapping)
print(final.text)
```
Why a single tokenize call? If you tokenize the context and question separately, each call produces independent token numbering. Context might map <Person_1> to “Hans Mueller” while the question maps <Person_1> to “Marie Dupont” — creating mapping conflicts. A single call on the combined text ensures consistent numbering.
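The conflict can be reproduced with a toy tokenizer (a sketch — mock_tokenize numbers names from a fixed candidate list and is not the real Blindfold API):

```python
def mock_tokenize(text, known_names=("Hans Mueller", "Marie Dupont")):
    """Toy stand-in for tokenize(): replace each known name present in
    the text with <Person_N>, numbering within this call only."""
    mapping = {}
    for name in known_names:
        if name in text:
            token = f"<Person_{len(mapping) + 1}>"
            mapping[token] = name
            text = text.replace(name, token)
    return text, mapping

context = "Hans Mueller reported a billing error."
question = "Did Marie Dupont confirm the refund?"

# Separate calls: each starts numbering at 1 — <Person_1> is ambiguous.
_, ctx_map = mock_tokenize(context)
_, q_map = mock_tokenize(question)
assert ctx_map["<Person_1>"] == "Hans Mueller"
assert q_map["<Person_1>"] == "Marie Dupont"   # conflict!

# One call on the combined text: numbering stays consistent.
_, combined_map = mock_tokenize(f"Context:\n{context}\n\nQuestion: {question}")
assert combined_map == {"<Person_1>": "Hans Mueller", "<Person_2>": "Marie Dupont"}
```

Merging the two per-call mappings for detokenization would silently pick one person for <Person_1> and drop the other, which is why the combined call matters.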

Protection Method Comparison

Choose the right protection method for your RAG use case:
| Method | Reversible | Best for | Example output |
|---|---|---|---|
| Redact | No | Ingestion — permanent PII removal | [PERSON], [EMAIL_ADDRESS] |
| Tokenize | Yes | Queries — protect input, restore output | <Person_1>, <Email Address_1> |
| Encrypt | Yes (with key) | Regulated data requiring audit trail | ENC_a8f3b2... |
| Hash | No | Analytics — consistent pseudonymous IDs | HASH_a3f8b9c2d4e5 |
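For intuition, the "consistent pseudonymous IDs" property of hashing can be sketched with a standard hash function (SHA-256 here; the HASH_ prefix and 12-character truncation are illustrative assumptions, not Blindfold's actual format):

```python
import hashlib

def pseudonymize(value: str) -> str:
    """Stable pseudonymous ID: the same input always yields the same
    hash, and the original value cannot be recovered from it."""
    return "HASH_" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

# The same email maps to the same ID across documents, so joins and
# counts still work in analytics without exposing the real value.
assert pseudonymize("john@example.com") == pseudonymize("john@example.com")
assert pseudonymize("john@example.com") != pseudonymize("maria@example.com")
```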
Recommended pattern: Use redact with entities at ingestion time (Layer 1) to strip contact info while keeping names. At query time (Layer 2), search with the original question and tokenize the combined context + question before the LLM call. This gives you searchability by name and full PII protection at the LLM boundary.

Advanced: Tokenize with Stored Mapping

For the strongest privacy with full searchability — no PII in the vector store and name-based search — tokenize at ingestion and store the mapping. This is the most complete architecture but requires managing a mapping store. How it works:
  1. Ingestion: tokenize() each document → store tokenized text in vector store + store mapping securely
  2. Query: Build a reverse lookup from stored mappings. Replace real names in the query with their tokens before searching
  3. LLM: Tokenized context + tokenized query → LLM sees only tokens
  4. Response: Detokenize using stored mappings
```python
from blindfold import Blindfold

blindfold = Blindfold(api_key="your-api-key")

# === Ingestion ===
documents = [...]
mapping_store = {}  # In production: encrypted DB or secrets manager

for doc in documents:
    result = blindfold.tokenize(doc)
    # Store tokenized text in vector store
    vectorstore.add(result.text)
    # Store mapping securely (keyed by doc ID or merged globally)
    mapping_store.update(result.mapping)

# Build reverse lookup: real value → token
reverse_lookup = {v: k for k, v in mapping_store.items()}

# === Query ===
question = "What happened with Hans Mueller?"

# Replace known real values with their tokens
for real_value, token in reverse_lookup.items():
    question = question.replace(real_value, token)
# question: "What happened with <Person_1>?"

# Search with tokenized query — tokens match tokens in vector store
results = vectorstore.query(question, n_results=3)

# Context is already tokenized, question is already tokenized
# Send directly to LLM — no PII
response = llm.generate(context=results, question=question)

# Detokenize for the user
final = blindfold.detokenize(response, mapping_store)
```
Trade-offs:
  • Requires managing a mapping store (encrypted DB, secrets manager)
  • Reverse lookup needs exact string matching (partial names may not match)
  • More complex than the selective-redaction approach
  • But: strongest privacy with full searchability — no PII in the vector store at all
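The exact-matching limitation is easy to demonstrate (a self-contained sketch of the reverse-lookup step from the ingestion example above):

```python
# Reverse lookup built from stored mappings: real value → token.
reverse_lookup = {"Hans Mueller": "<Person_1>"}

def tokenize_query(question: str) -> str:
    # Naive exact substring replacement, as in the example above.
    for real_value, token in reverse_lookup.items():
        question = question.replace(real_value, token)
    return question

# Full name matches and is replaced before the search.
assert tokenize_query("What happened with Hans Mueller?") == "What happened with <Person_1>?"

# A partial name misses: "Mueller" alone is not a stored value, so the
# query reaches the vector store with the real surname intact — and it
# will not match the tokenized documents either.
assert tokenize_query("What happened with Mueller?") == "What happened with Mueller?"
```

Mitigations such as fuzzy matching or per-name-part lookups add further complexity, which is why this architecture is marked advanced.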
detokenize() is a free local operation — no API call. This means the mapping store is the only infrastructure you need to manage.
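Conceptually, detokenization is just substitution against the stored mapping — a minimal local sketch (the real detokenize() may handle edge cases differently):

```python
def local_detokenize(text: str, mapping: dict) -> str:
    """Replace each token with its original value.
    mapping is token -> real value, as built in the ingestion loop."""
    # Replace longer tokens first so <Person_10> is not clobbered
    # by a partial match on <Person_1>.
    for token in sorted(mapping, key=len, reverse=True):
        text = text.replace(token, mapping[token])
    return text

mapping = {"<Person_1>": "Hans Mueller", "<Email Address_1>": "hans@example.com"}
answer = "<Person_1> can be reached at <Email Address_1>."
assert local_detokenize(answer, mapping) == "Hans Mueller can be reached at hans@example.com."
```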

Policy Recommendations

Match your compliance policy to your use case:
| Use case | Policy | Region | Key entities detected |
|---|---|---|---|
| General RAG | basic | | Names, emails, phones, addresses, credit cards |
| EU customer data | gdpr_eu | eu | Names, emails, IBANs, national IDs, DOB, addresses |
| US healthcare | hipaa_us | us | All 18 HIPAA identifiers (SSN, MRN, DOB, etc.) |
| Payment data | pci_dss | | Credit cards, CVVs, expiration dates |
| Maximum coverage | strict | | All supported entity types, lowest threshold |
```python
# GDPR-compliant RAG — redact contact info, keep names
blindfold = Blindfold(api_key="your-key", region="eu")
result = blindfold.redact(document, policy="gdpr_eu", entities=[
    "email address", "phone number", "iban", "credit card number",
    "address", "date of birth", "national id number",
])

# HIPAA-compliant RAG
blindfold = Blindfold(api_key="your-key", region="us")
result = blindfold.redact(document, policy="hipaa_us")
```

Performance Tips

  • Batch redaction at ingestion — use blindfold.redact_batch() for processing multiple documents in one API call
  • Async processing — use AsyncBlindfold for concurrent document processing during ingestion
  • Detokenization is free — detokenize() is a local string replacement, no API call required
  • Cache redacted documents — once documents are redacted and indexed, no further Blindfold calls are needed for retrieval
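The batching and async tips can be combined into one ingestion pattern (a sketch — redact_one below is a stand-in for the AsyncBlindfold call, so the concurrency structure is the point, not the API):

```python
import asyncio

async def redact_one(doc: str) -> str:
    """Stand-in for an async Blindfold redact call."""
    await asyncio.sleep(0)  # simulate network latency
    return doc.replace("john@example.com", "[EMAIL_ADDRESS]")

async def ingest(documents: list, batch_size: int = 10) -> list:
    safe = []
    # Bounded batches: concurrent within a batch, sequential across
    # batches, so a large corpus does not open thousands of requests.
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        safe.extend(await asyncio.gather(*(redact_one(d) for d in batch)))
    return safe

docs = ["Customer John Smith (john@example.com) reported a billing error."]
safe_docs = asyncio.run(ingest(docs))
assert safe_docs == ["Customer John Smith ([EMAIL_ADDRESS]) reported a billing error."]
```

The batch size bounds concurrent requests; tune it against your provider's rate limits.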

Cookbook Examples

Complete, runnable examples for every RAG framework:

Strategy Deep-Dives

Standalone examples for each ingestion strategy — compare trade-offs side by side: