What is Tokenization?
Tokenization is a reversible privacy protection method that replaces sensitive data with placeholder tokens (e.g., <PERSON_1>, <EMAIL_ADDRESS_1>). The original values are stored in a mapping that allows you to restore the data later.
Example: the input “Call John Smith at john@example.com” becomes “Call <PERSON_1> at <EMAIL_ADDRESS_1>”, and the mapping records that <PERSON_1> is John Smith and <EMAIL_ADDRESS_1> is john@example.com, so the original text can be restored at any time.
How It Works
- Detection: Blindfold’s AI engine scans your text and identifies sensitive entities (names, emails, phone numbers, etc.)
- Replacement: Each detected entity is replaced with a unique token based on its type
- Mapping: A mapping dictionary is created to link tokens back to original values
- Detokenization: Later, you can use the mapping to restore the original data
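The four steps above can be sketched with a toy, regex-based implementation. This is not Blindfold’s actual engine (which uses AI detection and covers 50+ entity types); the two patterns and the function names here are illustrative only:

```python
import re

# Toy patterns for two entity types; the real engine detects many more.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMber".upper(): re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def tokenize(text):
    """Detect entities, replace each with a <TYPE_N> token, build the mapping."""
    mapping = {}
    counters = {}
    for entity_type, pattern in PATTERNS.items():
        def replace(match):
            value = match.group(0)
            # Consistency: the same value reuses its token within one text.
            for token, original in mapping.items():
                if original == value:
                    return token
            counters[entity_type] = counters.get(entity_type, 0) + 1
            token = f"<{entity_type}_{counters[entity_type]}>"
            mapping[token] = value
            return token
        text = pattern.sub(replace, text)
    return text, mapping

def detokenize(text, mapping):
    """Restore the original values using the mapping."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

safe, mapping = tokenize("Email jane@example.com or call 555-123-4567")
print(safe)  # Email <EMAIL_ADDRESS_1> or call <PHONE_NUMBER_1>
print(detokenize(safe, mapping))  # Email jane@example.com or call 555-123-4567
```

Note that within a single call, a value that appears twice receives the same token, which is the “Consistent Within Text” behavior described under Key Features.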
When to Use Tokenization
Tokenization is ideal when you need to:
1. Protect Data Sent to AI Models
Send user data to OpenAI, Anthropic, or other LLMs without exposing sensitive information.
- AI providers may log conversations
- Prevents PII from being stored in third-party systems
- Maintains compliance with privacy regulations
2. Temporary Data Anonymization
Anonymize data for processing, then restore it afterward.
3. Data Sharing with External Partners
Share data with partners or contractors without exposing real PII.
4. Development and Testing
Use tokenized production data in development environments.
When NOT to Use Tokenization
Tokenization is not suitable when:
1. You Don’t Need to Restore Data
If you never need the original values, use Redaction or Hashing instead.
2. You Need Partial Visibility
If users need to see part of the data (like the last 4 digits of a card), use Masking.
3. You Need Consistent Identifiers
For analytics or tracking, use Hashing to get deterministic identifiers.
Key Features
- Reversible: Restore original data anytime using the mapping
- Type-Aware: Different tokens for different entity types (PERSON, EMAIL, etc.)
- Consistent Within Text: Same value gets the same token within one request
- 50+ Entity Types: Automatically detects names, emails, SSNs, cards, and more
Token Format
Tokens follow a predictable format: <ENTITY_TYPE_N>
- <PERSON_1>, <PERSON_2>: Person names
- <EMAIL_ADDRESS_1>, <EMAIL_ADDRESS_2>: Email addresses
- <PHONE_NUMBER_1>: Phone numbers
- <CREDIT_CARD_1>: Credit card numbers
- <US_SSN_1>: Social Security Numbers
- And 50+ more types…
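Generating tokens in this format only requires a per-type counter. A minimal sketch (the TokenFactory name is illustrative, not part of the SDK):

```python
from collections import defaultdict

class TokenFactory:
    """Issue <ENTITY_TYPE_N> tokens, numbering each entity type separately."""

    def __init__(self):
        self.counters = defaultdict(int)

    def next_token(self, entity_type):
        self.counters[entity_type] += 1
        return f"<{entity_type}_{self.counters[entity_type]}>"

factory = TokenFactory()
print(factory.next_token("PERSON"))         # <PERSON_1>
print(factory.next_token("PERSON"))         # <PERSON_2>
print(factory.next_token("EMAIL_ADDRESS"))  # <EMAIL_ADDRESS_1>
```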
Quick Start
- Python
- JavaScript
- cURL
Configuration Options
Filter Specific Entity Types
Only detect and tokenize specific types of sensitive data.
Adjust Confidence Threshold
Control detection sensitivity (0.0 - 1.0):
- Lower threshold (0.3): More detections, may include false positives
- Higher threshold (0.8): Fewer detections, only very confident matches
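Both options behave like filters over the engine’s detection results. A self-contained sketch using hand-made detections; the dict fields, the select() helper, and its parameter names are illustrative assumptions, not the SDK’s real schema:

```python
# Hypothetical detection results, in the shape an engine might report them.
detections = [
    {"entity_type": "PERSON",        "text": "John Smith",       "score": 0.95},
    {"entity_type": "EMAIL_ADDRESS", "text": "john@example.com", "score": 0.99},
    {"entity_type": "LOCATION",      "text": "Springfield",      "score": 0.45},
]

def select(detections, entity_types=None, threshold=0.5):
    """Keep detections that match the requested types and clear the threshold."""
    return [
        d for d in detections
        if (entity_types is None or d["entity_type"] in entity_types)
        and d["score"] >= threshold
    ]

# Only tokenize emails, ignoring everything else that was detected:
emails_only = select(detections, entity_types={"EMAIL_ADDRESS"})
print(len(emails_only))                      # 1
# A lower threshold also keeps the low-confidence LOCATION match:
print(len(select(detections, threshold=0.3)))  # 3
# A higher threshold keeps only very confident matches:
print(len(select(detections, threshold=0.8)))  # 2
```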
Security Best Practices
1. Store Mappings Securely
Treat mappings like passwords: store them encrypted.
2. Implement Mapping TTL
Don’t store mappings forever; set a TTL so each mapping expires automatically.
3. Clear Mappings After Use
Delete mappings when no longer needed.
Common Use Cases
AI Chatbot Integration
Protect user conversations with AI models.
Benefits: No PII reaches the AI provider, full compliance maintained
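The flow is tokenize → send to the model → detokenize the reply. A self-contained sketch where tokenize() is a hard-coded stand-in for the real detection engine and call_llm() stubs the provider call; none of these names are the SDK’s actual API:

```python
def tokenize(text):
    # Stand-in for the detection engine: knows one hard-coded entity,
    # which is enough to show the round trip.
    mapping = {"<PERSON_1>": "Alice Jones"}
    return text.replace("Alice Jones", "<PERSON_1>"), mapping

def detokenize(text, mapping):
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

def call_llm(prompt):
    # Stub for a real OpenAI/Anthropic call; it only ever sees tokens.
    return f"Sure, I drafted the email for {prompt.split('for ')[-1]}"

safe_prompt, mapping = tokenize("Draft an email for Alice Jones")
reply = call_llm(safe_prompt)        # the model sees <PERSON_1>, never the name
print(detokenize(reply, mapping))    # Sure, I drafted the email for Alice Jones
```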
Third-Party Data Processing
Share data with vendors without exposing PII.
Benefits: Vendors never see real PII, easier compliance
Development Environments
Use production-like data safely in dev.
Benefits: Realistic testing without PII exposure risk
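The key move for dev data is to tokenize a production export and then discard the mapping, which makes the dev copy irreversible. A minimal sketch, where tokenize() is a hard-coded stand-in for the real detection engine:

```python
def tokenize(record):
    # Stand-in for the detection engine: swaps one known email for a token.
    mapping = {"<EMAIL_ADDRESS_1>": "bob@example.com"}
    safe = {
        key: value.replace("bob@example.com", "<EMAIL_ADDRESS_1>")
        for key, value in record.items()
    }
    return safe, mapping

prod_record = {"note": "Refund issued to bob@example.com"}
dev_record, mapping = tokenize(prod_record)
del mapping  # discard the mapping: the dev copy can no longer be reversed
print(dev_record)  # {'note': 'Refund issued to <EMAIL_ADDRESS_1>'}
```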
Audit Logging
Log events without storing sensitive data.
Benefits: Logs are safe to store, can restore if needed
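A common shape for this: write only tokenized messages to the log, and keep the mappings in a separate, access-controlled store keyed by event ID. Everything below (store names, helpers, entity values) is an illustrative sketch, not the SDK’s API:

```python
audit_log = []      # tokenized entries, safe to ship to any log backend
mapping_store = {}  # separate, access-controlled store for restore paths

def log_event(event_id, message, mapping):
    """Record a tokenized message; keep its mapping out of the log itself."""
    audit_log.append({"id": event_id, "message": message})
    mapping_store[event_id] = mapping  # only needed for investigations

def restore_event(event_id):
    """Rebuild the original message for an authorized investigation."""
    entry = next(e for e in audit_log if e["id"] == event_id)
    message = entry["message"]
    for token, original in mapping_store[event_id].items():
        message = message.replace(token, original)
    return message

log_event(
    "evt-1",
    "Login by <PERSON_1> from <IP_ADDRESS_1>",
    {"<PERSON_1>": "Carol Diaz", "<IP_ADDRESS_1>": "203.0.113.7"},
)
print(audit_log[0]["message"])  # Login by <PERSON_1> from <IP_ADDRESS_1>
print(restore_event("evt-1"))   # Login by Carol Diaz from 203.0.113.7
```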
Learn More
Python SDK
Full Python SDK documentation
JavaScript SDK
Complete JavaScript guide
REST API
HTTP API reference for /tokenize
Examples
Practical integration examples