What is Synthesis?
Synthesis is a privacy protection method that replaces real sensitive data with realistic fake data generated by the Faker library. The fake data looks authentic but contains no real PII.
How It Works
- Detection: Blindfold identifies sensitive entities in your text
- Generation: For each entity, realistic fake data is generated based on type
- Replacement: Real data is replaced with synthetic data
- Language Support: Fake data matches the specified language locale
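The detection, generation, and replacement steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not Blindfold's implementation: the regex detectors, the `detect`, `generate`, and `synthesize` names, and the placeholder generators are all assumptions made for this example. Real detection covers far more entity types, and the real fake values come from Faker.

```python
import random
import re

# Hypothetical detectors: one regex per entity type (illustration only).
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect(text):
    """Detection: return sorted (start, end, entity_type) spans."""
    spans = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), entity_type))
    return sorted(spans)


def generate(entity_type, rng):
    """Generation: build a fake value that fits the entity type."""
    if entity_type == "EMAIL_ADDRESS":
        return f"user{rng.randint(1000, 9999)}@example.org"
    if entity_type == "IP_ADDRESS":
        return "10." + ".".join(str(rng.randint(0, 255)) for _ in range(3))
    raise ValueError(f"No generator for {entity_type!r}")


def synthesize(text, rng=None):
    """Replacement: swap each detected span for a generated value."""
    if rng is None:
        rng = random.Random()
    # Work from the end of the text so earlier offsets stay valid.
    # Assumes the detected spans do not overlap.
    for start, end, entity_type in reversed(detect(text)):
        text = text[:start] + generate(entity_type, rng) + text[end:]
    return text
```

Calling `synthesize("Contact alice@corp.example from 192.168.1.1")` returns the same sentence with a made-up address and IP in place of the originals.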
When to Use Synthesis
Synthesis is ideal when you need to:
1. Generate Test Data
Create realistic test data for development and testing environments.
- Realistic test data without PII
- Repeatable test scenarios
- No risk of exposing real user data
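Repeatable scenarios usually depend on seeding the random source, so that the same seed always yields the same fake records. The sketch below shows the idea with Python's standard `random` module. The `make_test_users` helper and the name lists are hypothetical and exist only for this example. When using Faker directly, the comparable call is `Faker.seed(...)`. Whether the synthesis API exposes a seed is not stated here.

```python
import random

# Hypothetical name pools, used only for this illustration.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Taylor"]
LAST_NAMES = ["Novak", "Meyer", "Rossi", "Dubois", "Kowalski"]


def make_test_users(count, seed):
    """Return `count` fake users; the same seed gives the same list."""
    rng = random.Random(seed)  # private generator, global state untouched
    users = []
    for i in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        users.append({
            "id": i + 1,
            "name": f"{first} {last}",
            "email": f"{first}.{last}{i + 1}@example.com".lower(),
        })
    return users
```

A test suite can pin the seed in a fixture for stable assertions, or vary it per run when fresh data matters more than reproducibility.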
2. Demo Environments
Populate demo environments with realistic but fake data.
- Product demos
- Sales presentations
- Training environments
- Screenshots and marketing
3. Realistic Training Data
Create training datasets that look real but contain no actual PII.
4. Data Sharing for Testing
Share realistic data with partners or vendors for integration testing.
When NOT to Use Synthesis
Synthesis is not suitable when:
1. You Need Original Data Back
Synthesis is irreversible. Use Tokenization instead.
2. Users Need to Recognize Their Data
Users won’t recognize synthesized data. Use Masking instead.
3. You Need Consistent Identifiers
Each synthesis call generates different data, so the same input does not map to the same output. Use Hashing instead.
Key Features
- Realistic Data: Generated data looks authentic
- Multi-Language: Supports 8 languages with locale-specific data
- Type-Aware: Generates appropriate data for each entity type
- Powered by Faker: Uses the Faker library for quality fake data
Quick Start
- Python
- JavaScript
- cURL
Supported Languages
Generate locale-specific fake data for different languages:
- en: English (US)
- cs: Czech
- de: German
- fr: French
- es: Spanish
- it: Italian
- pl: Polish
- sk: Slovak
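For readers who want to reproduce locale-specific output with Faker directly, the two-letter codes above need to be turned into Faker locale identifiers. The mapping below is an assumption made for illustration (the exact locales used internally are not documented here), and `faker_locale` and `fake_person` are hypothetical helper names. `Faker(locale)`, `name()`, and `address()` are standard Faker calls.

```python
# Assumed mapping from the language codes above to Faker locale identifiers.
FAKER_LOCALES = {
    "en": "en_US",
    "cs": "cs_CZ",
    "de": "de_DE",
    "fr": "fr_FR",
    "es": "es_ES",
    "it": "it_IT",
    "pl": "pl_PL",
    "sk": "sk_SK",
}


def faker_locale(language):
    """Translate a two-letter language code into a Faker locale."""
    try:
        return FAKER_LOCALES[language.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}") from None


def fake_person(language):
    """Generate a locale-appropriate name and address.

    Requires the third-party Faker package (pip install Faker).
    """
    from faker import Faker  # lazy import: the mapping works without Faker

    fake = Faker(faker_locale(language))
    return {"name": fake.name(), "address": fake.address()}
```

With Faker installed, `fake_person("cs")` yields a Czech-style name and address, while `fake_person("xx")` raises `ValueError` before any data is generated.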
Entity Types and Fake Data
Different entity types generate different kinds of fake data:
| Entity Type | Example Input | Example Output |
|---|---|---|
| PERSON | John Doe | Michael Smith |
| EMAIL_ADDRESS | [email protected] | [email protected] |
| PHONE_NUMBER | +1-555-1234 | +1-555-9876 |
| LOCATION | New York | Boston |
| ORGANIZATION | Microsoft | TechCorp |
| CREDIT_CARD | 4532-7562-9102-3456 | 5678-1234-9012-3456 |
| DATE_TIME | 2024-01-15 | 2023-11-22 |
| IP_ADDRESS | 192.168.1.1 | 10.0.0.5 |
| URL | https://example.com | https://test-site.org |
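One way to picture type-aware generation is a lookup table that maps each entity type to its own generator function. The sketch below covers three of the types from the table using only the Python standard library. The `GENERATORS` registry, the `fake_value` helper, and the value ranges are assumptions for illustration, not the generators Blindfold or Faker actually use.

```python
import datetime
import ipaddress
import random


def fake_ip(rng):
    # Random address inside the private 10.0.0.0/8 block.
    return str(ipaddress.IPv4Address(0x0A000000 + rng.randrange(1 << 24)))


def fake_date(rng):
    # ISO date between 2020-01-01 and 2024-12-31 (1827 days in total).
    start = datetime.date(2020, 1, 1)
    return (start + datetime.timedelta(days=rng.randrange(1827))).isoformat()


def fake_phone(rng):
    # Same shape as the table's example: +1-555-NNNN.
    return f"+1-555-{rng.randrange(10000):04d}"


# Hypothetical registry: entity type -> generator function.
GENERATORS = {
    "IP_ADDRESS": fake_ip,
    "DATE_TIME": fake_date,
    "PHONE_NUMBER": fake_phone,
}


def fake_value(entity_type, rng=None):
    """Dispatch to the generator registered for `entity_type`."""
    if rng is None:
        rng = random.Random()
    try:
        generator = GENERATORS[entity_type]
    except KeyError:
        raise ValueError(f"No generator for {entity_type!r}") from None
    return generator(rng)
```

Keeping the generators in a dictionary means a new entity type can be supported by registering one more function, without touching the dispatch logic.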
Common Patterns
Generate Test Users
Populate Demo Database
Create Training Dataset
Common Use Cases
Automated Testing
Generate test data for automated test suites.
Benefits: Fresh test data each run, no PII in test environments
Demo Environments
Create realistic demo data.
Benefits: Realistic demos without real customer data
Load Testing
Generate data for performance testing.
Benefits: Large-scale test data without PII concerns
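For large volumes, it often helps to produce rows lazily instead of building the whole dataset in memory. The following sketch streams fake rows into a CSV file using only the standard library. The `synthetic_rows` and `write_csv` helpers and the column names are hypothetical, and the values are simple placeholders instead of Faker output.

```python
import csv
import io
import random


def synthetic_rows(count, seed=None):
    """Yield `count` fake rows one at a time so memory use stays flat."""
    rng = random.Random(seed)
    for i in range(count):
        octets = [str(rng.randrange(256)) for _ in range(3)]
        yield {
            "user_id": i + 1,
            "email": f"user{i + 1}@example.com",
            "ip": ".".join(["10"] + octets),
        }


def write_csv(rows, stream):
    """Write rows to an open text stream and return how many were written."""
    writer = csv.DictWriter(stream, fieldnames=["user_id", "email", "ip"])
    writer.writeheader()
    written = 0
    for row in rows:
        writer.writerow(row)
        written += 1
    return written


if __name__ == "__main__":
    buffer = io.StringIO()
    total = write_csv(synthetic_rows(100_000, seed=1), buffer)
    print(f"wrote {total} rows")
```

Because `synthetic_rows` is a generator, the same code can feed a file, a message queue, or an HTTP load driver without changing how the rows are produced.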
Screenshots & Marketing
Create safe data for screenshots and marketing materials.
Benefits: No privacy risks in public materials
Best Practices
1. Use Templates
Create templates for consistent synthetic data.
2. Locale-Specific Data
Use the appropriate language for your audience.
3. Document Synthetic Data Use
Clearly mark synthetic data in your systems.
4. Combine with Other Methods
Use synthesis alongside other privacy methods.
Learn More
- Python SDK: Full Python SDK documentation
- JavaScript SDK: Complete JavaScript guide
- REST API: HTTP API reference for /synthesize
- Examples: Practical integration examples