> ## Documentation Index > Fetch the complete documentation index at: https://docs.blindfold.dev/llms.txt > Use this file to discover all available pages before exploring further. # Tokenization > Replace sensitive data with reversible tokens ## What is Tokenization? Tokenization is a reversible privacy protection method that replaces sensitive data with placeholder tokens (e.g., ``, ``). The original values are stored in a mapping that allows you to restore the data later. **Example:** ``` Input: "Contact John Doe at john@example.com" Output: "Contact at " Mapping: { "": "John Doe", "": "john@example.com" } ``` ## How It Works 1. **Detection**: Blindfold's AI engine scans your text and identifies sensitive entities (names, emails, phone numbers, etc.) 2. **Replacement**: Each detected entity is replaced with a unique token based on its type 3. **Mapping**: A mapping dictionary is created to link tokens back to original values 4. **Detokenization**: Later, you can use the mapping to restore the original data ## When to Use Tokenization Tokenization is ideal when you need to: ### 1. Protect Data Sent to AI Models Send user data to OpenAI, Anthropic, or other LLMs without exposing sensitive information. ```python theme={null} # Tokenize before sending to AI protected = client.tokenize("My email is john@example.com") ai_response = openai.chat(protected.text) # Restore original data in the response final = client.detokenize(ai_response, protected.mapping) ``` **Why this matters:** * AI providers log conversations * Prevents PII from being stored in third-party systems * Maintains compliance with privacy regulations ### 2. Temporary Data Anonymization Anonymize data for processing, then restore it afterward. ```python theme={null} # Process data anonymously protected = client.tokenize(user_message) processed = process_in_third_party_service(protected.text) # Restore when needed final = client.detokenize(processed, protected.mapping) ``` ### 3. Data Sharing with External Partners Share data with partners or contractors without exposing real PII. ```python theme={null} # Share tokenized data protected = client.tokenize(customer_data) send_to_partner(protected.text) # Partner processes tokenized data # You can restore when getting results back ``` ### 4. Development and Testing Use tokenized production data in development environments. ```python theme={null} # Tokenize production data for dev environment protected = client.tokenize(production_data) load_into_dev_database(protected.text) ``` ## When NOT to Use Tokenization Tokenization is **not suitable** when: ### 1. You Don't Need to Restore Data If you never need the original values, use **Redaction** or **Hashing** instead. ```python theme={null} # Bad - unnecessary tokenization protected = client.tokenize(log_message) # Never use the mapping # Good - use redaction redacted = client.redact(log_message) ``` ### 2. You Need Partial Visibility If users need to see part of the data (like last 4 digits of a card), use **Masking**. ```python theme={null} # Bad - completely hidden protected = client.tokenize("Card: 4532-7562-9102-3456") # Output: "Card: " # Good - show last 4 digits masked = client.mask("Card: 4532-7562-9102-3456") # Output: "Card: ***************3456" ``` ### 3. You Need Consistent Identifiers For analytics or tracking, use **Hashing** to get deterministic identifiers. ```python theme={null} # Bad - different tokens each time token1 = client.tokenize("john@example.com") # token2 = client.tokenize("john@example.com") # (different!) # Good - same hash every time hash1 = client.hash("john@example.com") # ID_a3f8b9c2... hash2 = client.hash("john@example.com") # ID_a3f8b9c2... (same!) ``` ## Key Features Restore original data anytime using the mapping Different tokens for different entity types (PERSON, EMAIL, etc.) Same value gets same token within one request Automatically detects names, emails, SSNs, cards, and more ## Token Format Tokens follow a predictable format: `` * ``, `` - Person names * ``, `` - Email addresses * `` - Phone numbers * `` - Credit card numbers * `` - Social Security Numbers * And 50+ more types... ## Quick Start ```python theme={null} from blindfold import Blindfold client = Blindfold(api_key="your-api-key") # Tokenize response = client.tokenize( "My email is john@example.com and phone is +1-555-1234" ) print(response.text) # "My email is and phone is " print(response.mapping) # {'': 'john@example.com', '': '+1-555-1234'} # Detokenize original = client.detokenize( "Contact ", response.mapping ) print(original.text) # "Contact john@example.com" ``` ```javascript theme={null} import { Blindfold } from '@blindfold/sdk'; const client = new Blindfold({ apiKey: 'your-api-key' }); // Tokenize const response = await client.tokenize( "My email is john@example.com and phone is +1-555-1234" ); console.log(response.text); // "My email is and phone is " console.log(response.mapping); // {'': 'john@example.com', '': '+1-555-1234'} // Detokenize const original = await client.detokenize( "Contact ", response.mapping ); console.log(original.text); // "Contact john@example.com" ``` ```java theme={null} import dev.blindfold.sdk.Blindfold; Blindfold client = new Blindfold("your-api-key"); // Tokenize var response = client.tokenize( "My email is john@example.com and phone is +1-555-1234" ); System.out.println(response.getText()); // "My email is and phone is " System.out.println(response.getMapping()); // {=john@example.com, =+1-555-1234} // Detokenize var original = client.detokenize( "Contact ", response.getMapping() ); System.out.println(original.getText()); // "Contact john@example.com" ``` ```bash theme={null} # Tokenize curl -X POST https://api.blindfold.dev/api/public/v1/tokenize \ -H "X-API-Key: your-api-key" \ -H "Content-Type: application/json" \ -d '{ "text": "My email is john@example.com and phone is +1-555-1234" }' # Response includes mapping for detokenization { "text": "My email is and phone is ", "mapping": { "": "john@example.com", "": "+1-555-1234" } } # Detokenize curl -X POST https://api.blindfold.dev/api/public/v1/detokenize \ -H "X-API-Key: your-api-key" \ -H "Content-Type: application/json" \ -d '{ "text": "Contact ", "mapping": { "": "john@example.com" } }' ``` ## Configuration Options ### Filter Specific Entity Types Only detect and tokenize specific types of sensitive data: ```python theme={null} response = client.tokenize( "John Doe lives at 123 Main St, email: john@example.com", config={ "entities": ["EMAIL_ADDRESS"] # Only tokenize emails } ) # Output: "John Doe lives at 123 Main St, email: " ``` ### Adjust Confidence Threshold Control detection sensitivity (0.0 - 1.0): ```python theme={null} response = client.tokenize( text="Maybe email: test@test", config={ "score_threshold": 0.8 # Only high-confidence detections } ) ``` * **Lower threshold (0.3)**: More detections, may include false positives * **Higher threshold (0.8)**: Fewer detections, only very confident matches ## Security Best Practices ### 1. Store Mappings Securely Treat mappings like passwords - store them encrypted: ```python theme={null} # Store mapping in encrypted session session['token_mapping'] = encrypt(protected.mapping) # Later, decrypt and detokenize mapping = decrypt(session['token_mapping']) final = client.detokenize(text, mapping) ``` ### 2. Implement Mapping TTL Don't store mappings forever: ```python theme={null} # Set expiration on mapping storage redis.setex( f"mapping:{session_id}", 3600, # 1 hour TTL json.dumps(protected.mapping) ) ``` ### 3. Clear Mappings After Use Delete mappings when no longer needed: ```python theme={null} # Process and clean up protected = client.tokenize(user_input) ai_response = process_with_ai(protected.text) final = client.detokenize(ai_response, protected.mapping) # Clear the mapping del protected.mapping # or delete from storage ``` ## Common Use Cases Protect user conversations with AI models: ```python theme={null} # 1. Tokenize user input protected = client.tokenize(user_message) # 2. Send to AI (protected) ai_response = openai.chat(protected.text) # 3. Restore original data final = client.detokenize(ai_response, protected.mapping) ``` **Benefits**: No PII reaches AI provider, full compliance maintained Share data with vendors without exposing PII: ```python theme={null} # Tokenize before sending to vendor protected = client.tokenize(customer_data) vendor_api.process(protected.text) # Restore results from vendor results = vendor_api.get_results() final = client.detokenize(results, protected.mapping) ``` **Benefits**: Vendors never see real PII, easier compliance Use production-like data safely in dev: ```python theme={null} # Tokenize production data protected = client.tokenize(prod_customer_records) # Load into dev database dev_db.insert(protected.text) # Developers work with realistic but safe data ``` **Benefits**: Realistic testing without PII exposure risk Log events without storing sensitive data: ```python theme={null} # Tokenize before logging protected = client.tokenize(event_details) # Log safely logger.info(f"User action: {protected.text}") # Store mapping separately if needed for investigation audit_store.save_mapping(event_id, protected.mapping) ``` **Benefits**: Logs are safe to store, can restore if needed ## Learn More Full Python SDK documentation Complete JavaScript guide Sync and async Java client HTTP API reference for /tokenize Practical integration examples ## Compare with Other Methods Not sure if tokenization is right for you? Compare with alternatives: Partial visibility (e.g., \*\*\*\*3456) Permanent removal Consistent identifiers for analytics AES encryption with key