Synthesis - Blindfold

What is Synthesis?

Synthesis is a privacy protection method that replaces real sensitive data with realistic fake data generated by the Faker library. The fake data looks authentic but contains none of the original PII. Example:

Input:  "John Doe lives in New York and works at Microsoft"
Output: "Michael Smith lives in Boston and works at TechCorp"

How It Works

Detection: Blindfold identifies sensitive entities in your text
Generation: For each entity, realistic fake data is generated according to its entity type
Replacement: Real data is replaced with synthetic data
Language Support: Fake data matches the specified language locale
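The four steps can be illustrated with a toy, standard-library-only sketch. It is not Blindfold's implementation: the regex detectors, the `fake_value` generator, and the small name and domain pools are hypothetical stand-ins for real entity detection and for Faker, and locale handling (step 4) is left out.

```python
import random
import re
from typing import Optional

# Step 1 (detection): toy regex detectors, far cruder than real entity detection
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\+\d{1,3}-\d{3}-\d{4}"),
}

# Hypothetical sample pools standing in for Faker's locale data
FIRST_NAMES = ["michael", "sarah", "emily", "david"]
DOMAINS = ["example.net", "example.org"]


def fake_value(entity_type: str, rng: random.Random) -> str:
    """Step 2 (generation): produce a fake value shaped like the entity type."""
    if entity_type == "EMAIL_ADDRESS":
        return f"{rng.choice(FIRST_NAMES)}@{rng.choice(DOMAINS)}"
    if entity_type == "PHONE_NUMBER":
        return f"+1-555-{rng.randint(1000, 9999)}"
    raise ValueError(f"no generator for {entity_type}")


def toy_synthesize(text: str, seed: Optional[int] = None) -> str:
    """Step 3 (replacement): swap every detected entity for a fresh fake value."""
    rng = random.Random(seed)
    for entity_type, pattern in DETECTORS.items():
        text = pattern.sub(lambda _match: fake_value(entity_type, rng), text)
    return text


print(toy_synthesize("Email: john@example.com, Phone: +1-555-1234"))
```

Because every detected entity receives a freshly drawn value, the output differs on each call unless the random seed is fixed.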

When to Use Synthesis

Synthesis is ideal when you need to:

1. Generate Test Data

Create realistic test data for development and testing environments.

# Generate test user profiles
template = "Name: John Doe, Email: john@example.com, Phone: +1-555-1234"

for _ in range(10):
    result = client.synthesize(template, language="en")
    print(result.text)

# Output (examples):
# "Name: Michael Smith, Email: michael@example.net, Phone: +1-555-9876"
# "Name: Sarah Johnson, Email: sarah@example.org, Phone: +1-555-4567"
# ... 10 profiles in total (each call is independent, so duplicates are possible)

Why this matters:

Realistic test data without PII
Reusable templates for repeatable test scenarios
No risk of exposing real user data
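Since every call returns new values, repeating an identical scenario means storing what was generated. A minimal sketch that caches records in a JSON fixture on first run; `load_or_create_fixtures` is a hypothetical helper, and `synthesize` is any callable that turns a template into text, such as a wrapper around `client.synthesize`.

```python
import json
from pathlib import Path
from typing import Callable, List


def load_or_create_fixtures(
    path: Path,
    template: str,
    count: int,
    synthesize: Callable[[str], str],
) -> List[str]:
    """Return cached synthetic records, generating and saving them on first use."""
    if path.exists():
        # Reuse the saved records so every run sees identical data
        return json.loads(path.read_text(encoding="utf-8"))
    records = [synthesize(template) for _ in range(count)]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records
```

Usage, assuming the `client` and `template` from the example above: `load_or_create_fixtures(Path("fixtures/users.json"), template, 10, lambda t: client.synthesize(t, language="en").text)`. Delete the fixture file whenever the template or count changes, since a cached file is returned as-is.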

2. Demo Environments

Populate demo environments with realistic but fake data.

# Create demo customer data
customer_template = """
Customer: Jane Smith
Email: jane@company.com
Location: New York
Company: TechCorp
"""

demo_customer = client.synthesize(customer_template, language="en")
load_into_demo_db(demo_customer.text)  # load_into_demo_db is your own loader

Use cases:

Product demos
Sales presentations
Training environments
Screenshots and marketing

3. Realistic Training Data

Create training datasets that look real but contain no actual PII.

# Generate training data for ML models
training_examples = []

for _ in range(1000):
    synthetic = client.synthesize(
        "Patient John Doe, age 45, diagnosed with diabetes",
        language="en"
    )
    training_examples.append(synthetic.text)

# Train model on synthetic data

4. Data Sharing for Testing

Share realistic data with partners or vendors for integration testing.

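# Create synthetic data for vendor testing
# (production_data_sample is your own sample of production text)
test_data = client.synthesize(
    production_data_sample,
    language="en"
)

# Detected PII has been replaced; review the output before sharing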
send_to_vendor(test_data.text)
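Synthesis replaces the entities that detection finds, so anything detection misses passes through unchanged. A cheap extra safeguard before data leaves your organization is to check the output against values you already know are sensitive. A minimal sketch; `find_leaks` and the list of known values are assumptions you would supply yourself, and a substring check only catches exact (case-insensitive) survivors.

```python
from typing import Iterable, List


def find_leaks(synthetic_text: str, known_values: Iterable[str]) -> List[str]:
    """Return any original sensitive values that still appear in the output."""
    lowered = synthetic_text.lower()
    return [value for value in known_values if value.lower() in lowered]


sample_output = "Name: Michael Smith, Email: john@example.com"
print(find_leaks(sample_output, ["John Doe", "john@example.com"]))  # ['john@example.com']
```

In the vendor flow above, raising an error whenever `find_leaks(test_data.text, known_values)` is non-empty would block the `send_to_vendor` call.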

When NOT to Use Synthesis

Synthesis is not suitable when:

1. You Need Original Data Back

Synthesis is irreversible. Use Tokenization instead.

# Bad - can't restore
synthetic = client.synthesize("john@example.com")
# No way to get "john@example.com" back

# Good - use tokenization
protected = client.tokenize("john@example.com")
original = client.detokenize(protected.text, protected.mapping)
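The difference comes down to what is kept. Tokenization keeps a mapping from each placeholder back to the original value; synthesis keeps nothing, so there is nothing to reverse. A toy illustration of that idea with the standard library; `toy_tokenize` and `toy_detokenize` are hypothetical and do not reflect Blindfold's token format or mapping structure.

```python
import secrets
from typing import Dict, List, Tuple


def toy_tokenize(text: str, sensitive: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each sensitive value with a random token and keep the mapping."""
    mapping: Dict[str, str] = {}
    for value in sensitive:
        token = f"<TOKEN_{secrets.token_hex(4)}>"
        mapping[token] = value
        text = text.replace(value, token)
    return text, mapping


def toy_detokenize(text: str, mapping: Dict[str, str]) -> str:
    """Undo the substitution using the stored mapping."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


protected, mapping = toy_tokenize("Contact john@example.com", ["john@example.com"])
print(toy_detokenize(protected, mapping))  # Contact john@example.com
```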

2. Users Need to Recognize Their Data

Users won’t recognize synthesized data. Use Masking.

# Bad - user won't recognize their card
synthetic = client.synthesize("Card: 4532-7562-9102-3456")
# Output: "Card: 5678-1234-9012-3456" (completely different)

# Good - show last 4 of real card
masked = client.mask("Card: 4532-7562-9102-3456")
# Output: "Card: ***************3456"
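Masking keeps part of the real value, which synthesis by design does not. A local sketch of the "keep the last 4" behavior shown above; `mask_keep_last` is hypothetical and masks every character, dashes included, to match the example output.

```python
def mask_keep_last(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask every character except the last `visible` ones."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[-visible:]


print(mask_keep_last("4532-7562-9102-3456"))  # ***************3456
```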

3. You Need Consistent Identifiers

Each synthesis generates different data. Use Hashing.

# Bad - different each time
synth1 = client.synthesize("john@example.com")  # michael@example.com
synth2 = client.synthesize("john@example.com")  # sarah@example.org

# Good - same hash every time
hash1 = client.hash("john@example.com")  # ID_a3f8b9...
hash2 = client.hash("john@example.com")  # ID_a3f8b9... (same)
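Consistency comes from determinism: a hash maps equal inputs to equal outputs, whereas synthesis draws new random values on every call. A standard-library sketch of that property using keyed HMAC-SHA256; the `ID_` prefix mirrors the example above, but the normalization, 12-character truncation, and key handling are assumptions and say nothing about how `client.hash` works internally.

```python
import hashlib
import hmac


def stable_id(value: str, key: bytes) -> str:
    """Derive the same pseudonymous ID for the same (normalized) input every time."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256)
    return "ID_" + digest.hexdigest()[:12]


KEY = b"replace-with-a-secret-key"  # keep the real key out of source control
print(stable_id("john@example.com", KEY) == stable_id("John@Example.com ", KEY))  # True
```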

Key Features

Realistic Data

Generated data looks authentic

Multi-Language

Supports 8 languages with locale-specific data

Type-Aware

Generates appropriate data for each entity type

Powered by Faker

Uses the Faker library to generate high-quality fake values

Quick Start

Python

from blindfold import Blindfold

client = Blindfold(api_key="your-api-key")

# Basic synthesis
result = client.synthesize(
    text="John Doe lives in New York and works at Microsoft",
    language="en"
)

print(result.text)
# "Michael Smith lives in Boston and works at TechCorp"
# (example output - will vary)

# Generate multiple variations
template = "Customer: Jane Doe, Email: jane@example.com"

for i in range(3):
    result = client.synthesize(template, language="en")
    print(f"{i+1}. {result.text}")

# Output (examples):
# 1. Customer: Sarah Johnson, Email: sarah@example.org
# 2. Customer: Michael Brown, Email: michael@example.net
# 3. Customer: Emily Davis, Email: emily@example.com

JavaScript

import { Blindfold } from '@blindfold/sdk';

const client = new Blindfold({ apiKey: 'your-api-key' });

// Basic synthesis
const result = await client.synthesize(
  "John Doe lives in New York and works at Microsoft",
  { language: 'en' }
);

console.log(result.text);
// "Michael Smith lives in Boston and works at TechCorp"
// (example output - will vary)

// Generate multiple variations
const template = "Customer: Jane Doe, Email: jane@example.com";

for (let i = 0; i < 3; i++) {
  const result = await client.synthesize(template, { language: 'en' });
  console.log(`${i+1}. ${result.text}`);
}

// Output (examples):
// 1. Customer: Sarah Johnson, Email: sarah@example.org
// 2. Customer: Michael Brown, Email: michael@example.net
// 3. Customer: Emily Davis, Email: emily@example.com

Java

import dev.blindfold.sdk.Blindfold;

Blindfold client = new Blindfold("your-api-key");

// Basic synthesis
var result = client.synthesize(
    "John Doe lives in New York and works at Microsoft",
    "en", null
);

System.out.println(result.getText());
// "Michael Smith lives in Boston and works at TechCorp"
// (example output - will vary)

// Generate multiple variations
String template = "Customer: Jane Doe, Email: jane@example.com";

for (int i = 0; i < 3; i++) {
    var r = client.synthesize(template, "en", null);
    System.out.println((i + 1) + ". " + r.getText());
}

// Output (examples):
// 1. Customer: Sarah Johnson, Email: sarah@example.org
// 2. Customer: Michael Brown, Email: michael@example.net
// 3. Customer: Emily Davis, Email: emily@example.com

cURL

curl -X POST https://api.blindfold.dev/api/public/v1/synthesize \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Doe lives in New York and works at Microsoft",
    "language": "en"
  }'

# Response (example - will vary)
{
  "text": "Michael Smith lives in Boston and works at TechCorp",
  "entities_count": 3,
  "detected_entities": [
    {
      "type": "PERSON",
      "text": "John Doe",
      "score": 0.95
    },
    {
      "type": "LOCATION",
      "text": "New York",
      "score": 0.90
    },
    {
      "type": "ORGANIZATION",
      "text": "Microsoft",
      "score": 0.85
    }
  ]
}
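The response lists what was detected next to the synthesized text. A sketch of consuming it in Python, using the example response above verbatim; the 0.9 review threshold is an arbitrary assumption. Note that `detected_entities[].text` contains the original values, so the raw response is itself sensitive and should not be logged where PII is not allowed.

```python
import json
from collections import Counter

# Response body copied from the cURL example above
response_body = """{
  "text": "Michael Smith lives in Boston and works at TechCorp",
  "entities_count": 3,
  "detected_entities": [
    {"type": "PERSON", "text": "John Doe", "score": 0.95},
    {"type": "LOCATION", "text": "New York", "score": 0.90},
    {"type": "ORGANIZATION", "text": "Microsoft", "score": 0.85}
  ]
}"""

data = json.loads(response_body)

# Count detected entities per type
by_type = Counter(entity["type"] for entity in data["detected_entities"])

# Flag low-confidence detections for manual review (threshold is an assumption)
REVIEW_THRESHOLD = 0.9
needs_review = [e["text"] for e in data["detected_entities"] if e["score"] < REVIEW_THRESHOLD]

print(data["text"])
print(dict(by_type))  # {'PERSON': 1, 'LOCATION': 1, 'ORGANIZATION': 1}
print(needs_review)   # ['Microsoft']
```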

Supported Languages

Pass the `language` parameter to generate locale-specific fake data. The outputs below are illustrative; actual values vary on every call:

result = client.synthesize(
    "John Doe from New York",
    language="en"
)
# "Michael Smith from Boston"

result = client.synthesize(
    "Jan Novák z Prahy",
    language="cs"
)
# "Petr Dvořák z Brna"

result = client.synthesize(
    "Hans Müller aus Berlin",
    language="de"
)
# "Klaus Schmidt aus München"

result = client.synthesize(
    "Marie Dupont de Paris",
    language="fr"
)
# "Sophie Martin de Lyon"

result = client.synthesize(
    "Juan García de Madrid",
    language="es"
)
# "Carlos López de Barcelona"

result = client.synthesize(
    "Marco Rossi da Roma",
    language="it"
)
# "Giuseppe Bianchi da Milano"

result = client.synthesize(
    "Jan Kowalski z Warszawy",
    language="pl"
)
# "Piotr Nowak z Krakowa"

result = client.synthesize(
    "Ján Kováč z Bratislavy",
    language="sk"
)
# "Peter Horváth z Košíc"

Supported Languages:

en - English (US)
cs - Czech
de - German
fr - French
es - Spanish
it - Italian
pl - Polish
sk - Slovak

Entity Types and Fake Data

Different entity types generate different kinds of fake data:

Entity Type	Example Input	Example Output
`PERSON`	John Doe	Michael Smith
`EMAIL_ADDRESS`	john@example.com	michael@example.net
`PHONE_NUMBER`	+1-555-1234	+1-555-9876
`LOCATION`	New York	Boston
`ORGANIZATION`	Microsoft	TechCorp
`CREDIT_CARD`	4532-7562-9102-3456	5678-1234-9012-3456
`DATE_TIME`	2024-01-15	2023-11-22
`IP_ADDRESS`	192.168.1.1	10.0.0.5
`URL`	https://example.com	https://test-site.org
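Type-aware generation means each entity type gets a value of the right shape. For credit cards, a convincing fake usually also passes the Luhn checksum that form validators apply. This page does not say whether Blindfold's generated card numbers are Luhn-valid; the sketch below only illustrates the idea, with a hypothetical `fake_card` that uses a Visa-style leading 4.

```python
import random
from typing import Optional


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of digits."""
    total = 0
    # The digit next to the check digit is doubled, then every second one leftward
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_valid(number: str) -> bool:
    """Check a card number, ignoring dashes."""
    digits = number.replace("-", "")
    return luhn_check_digit(digits[:-1]) == int(digits[-1])


def fake_card(rng: Optional[random.Random] = None) -> str:
    """Generate a 16-digit, dash-grouped card-like number that passes the Luhn check."""
    rng = rng or random.Random()
    payload = "4" + "".join(str(rng.randint(0, 9)) for _ in range(14))
    digits = payload + str(luhn_check_digit(payload))
    return "-".join(digits[i:i + 4] for i in range(0, 16, 4))
```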

Common Patterns

Generate Test Users

def generate_test_users(count: int) -> list:
    """Generate realistic test user profiles"""

    template = """
    Name: John Doe
    Email: john.doe@company.com
    Phone: +1-555-1234
    Location: New York
    """

    users = []
    for _ in range(count):
        result = client.synthesize(template, language="en")
        users.append(result.text)

    return users

# Usage
test_users = generate_test_users(100)
# 100 independently generated profiles (duplicates are possible)
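Each call is random, so two generated profiles can occasionally be identical. If a suite needs distinct records, one option is bounded retries. A sketch; `generate_unique` is hypothetical, and `max_attempts` caps the number of API calls.

```python
from typing import Callable, List


def generate_unique(
    synthesize: Callable[[], str], count: int, max_attempts: int
) -> List[str]:
    """Call `synthesize` until `count` distinct results are collected."""
    if count <= 0:
        return []
    seen = set()
    results: List[str] = []
    for _ in range(max_attempts):
        text = synthesize()
        if text not in seen:
            seen.add(text)
            results.append(text)
            if len(results) == count:
                return results
    raise RuntimeError(f"only {len(results)} unique results after {max_attempts} attempts")
```

For example, with `template` set to the profile template above: `generate_unique(lambda: client.synthesize(template, language="en").text, count=100, max_attempts=300)`.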

Populate Demo Database

def populate_demo_db(template_data: list):
    """Fill demo database with synthetic data"""

    for template in template_data:
        synthetic = client.synthesize(template, language="en")

        # Parse and insert
        demo_db.insert(parse_profile(synthetic.text))

# Usage
templates = load_production_templates()
populate_demo_db(templates)
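`parse_profile`, `demo_db`, and `load_production_templates` stand in for your own code. A minimal sketch of `parse_profile`, assuming multi-line templates with one `Key: value` field per line, like the test-user template above; keys are lower-cased and lines without a colon are skipped.

```python
from typing import Dict


def parse_profile(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines into a dict, skipping blank or malformed lines."""
    profile: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            profile[key.strip().lower()] = value.strip()
    return profile


print(parse_profile("Name: Sarah Johnson\nEmail: sarah@example.org\n"))
# {'name': 'Sarah Johnson', 'email': 'sarah@example.org'}
```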

Create Training Dataset

def create_training_data(examples: list, count: int):
    """Generate training data from examples"""

    training_set = []

    for example in examples:
        for _ in range(count):
            synthetic = client.synthesize(example, language="en")
            training_set.append(synthetic.text)

    return training_set

# Usage
examples = ["Patient John Doe diagnosed with condition X", ...]
training_data = create_training_data(examples, 100)
# 100 synthetic examples per template
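Because each template is expanded many times, exact duplicates can appear, and a duplicate that lands in both the training and validation sets inflates evaluation scores. A sketch that deduplicates, shuffles with a fixed seed, and splits; `train_val_split` and its defaults are assumptions.

```python
import random
from typing import List, Tuple


def train_val_split(
    examples: List[str], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Deduplicate, shuffle reproducibly, and split into train/validation sets."""
    unique = list(dict.fromkeys(examples))  # drop exact duplicates, keep first-seen order
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = int(len(unique) * val_fraction)
    return unique[n_val:], unique[:n_val]
```

For example: `train, val = train_val_split(training_data, val_fraction=0.1)`.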

Common Use Cases

Automated Testing

Generate test data for automated test suites:

def test_user_registration():
    # Generate unique test user
    test_user = client.synthesize(
        "Name: John Doe, Email: john@test.com",
        language="en"
    ).text

    # Use in test
    response = api.register_user(test_user)
    assert response.status_code == 200

Benefits: Fresh test data each run, no PII in test environments
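The test above hands the whole synthesized string to a hypothetical `api.register_user`. Registration endpoints usually expect separate fields, so it can help to parse the output and fail fast if its shape is unexpected. A sketch assuming the `Name: ..., Email: ...` template used above; `parse_test_user` is hypothetical, and splitting on `", "` assumes generated values contain no commas.

```python
import re
from typing import Dict

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def parse_test_user(text: str) -> Dict[str, str]:
    """Split 'Name: ..., Email: ...' into fields and check the email looks valid."""
    fields: Dict[str, str] = {}
    for part in text.split(", "):
        key, sep, value = part.partition(": ")
        if sep:
            fields[key.strip().lower()] = value.strip()
    missing = {"name", "email"} - set(fields)
    if missing:
        raise ValueError(f"synthesized text lost fields: {sorted(missing)}")
    if not EMAIL_RE.fullmatch(fields["email"]):
        raise ValueError(f"not an email address: {fields['email']!r}")
    return fields
```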

Demo Environments

Create realistic demo data:

# Generate demo customers
def setup_demo_environment():
    templates = [
        "Enterprise customer: Company X, contact: john@x.com",
        "Small business: Company Y, contact: jane@y.com"
    ]

    for template in templates:
        synthetic = client.synthesize(template)
        create_demo_account(synthetic.text)

Benefits: Realistic demos without real customer data

Load Testing

Generate data for performance testing:

def load_test_data_generator(count: int):
    """Generate data for load testing"""
    template = "User: john@example.com, Session: abc123"

    test_data = []
    for _ in range(count):
        synthetic = client.synthesize(template)
        test_data.append(synthetic.text)

    return test_data

# Generate 10,000 test records
load_data = load_test_data_generator(10000)

Benefits: Large-scale test data without PII concerns
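Generating 10,000 records one call at a time means 10,000 sequential network round trips. Because the work is I/O-bound, a thread pool can overlap the calls. A sketch; `generate_concurrently` is hypothetical, `workers=8` is an arbitrary default, and it assumes the client can be shared across threads, which this page does not confirm. Check any rate limits on your API key before raising the worker count.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def generate_concurrently(
    synthesize: Callable[[str], str], template: str, count: int, workers: int = 8
) -> List[str]:
    """Run `count` synthesis calls across a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order; an exception raised by
        # any call is re-raised here when its result is collected
        return list(pool.map(lambda _: synthesize(template), range(count)))
```

For example: `generate_concurrently(lambda t: client.synthesize(t).text, "User: john@example.com, Session: abc123", 10_000)`.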

Screenshots & Marketing

Create safe data for screenshots and marketing materials:

def prepare_screenshot_data():
    """Generate data for product screenshots"""
    user_data = client.synthesize(
        "User: Jane Doe, Email: jane@company.com",
        language="en"
    )

    # Use in screenshot - safe for public release
    return user_data.text

Benefits: No privacy risks in public materials

Best Practices

1. Use Templates

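Keep reusable templates so synthetic records share a consistent structure. Fill them with realistic sample values rather than placeholders such as {name}: synthesis only replaces the entities it detects, and placeholders are unlikely to be detected.

# Define templates with realistic sample values
TEMPLATES = {
    'user': "Name: John Doe, Email: john@example.com, Phone: +1-555-1234",
    'company': "Company: Microsoft, Location: New York"
}

# Generate from templates
def generate_user():
    return client.synthesize(TEMPLATES['user'], language="en")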

2. Locale-Specific Data

Use appropriate language for your audience:

# Map each audience region to a supported language code
REGION_LANGUAGE = {"US": "en", "DE": "de", "FR": "fr", "CZ": "cs"}

# Fall back to English for regions without a mapping
language = REGION_LANGUAGE.get(region, "en")
demo_data = client.synthesize(template, language=language)

3. Document Synthetic Data Use

Clearly mark synthetic data in your systems:

from datetime import datetime, timezone

synthetic_user = {
    'profile': result.text,  # The full synthesized text
    'is_synthetic': True,  # Mark as synthetic
    'generated_at': datetime.now(timezone.utc)
}
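The flag pays off downstream, for example when analytics jobs or exports must ignore demo and test rows. A sketch of such a filter keyed on the `is_synthetic` field above; treating records without the flag as real is an assumption.

```python
from typing import Dict, Iterable, List


def real_records_only(records: Iterable[Dict]) -> List[Dict]:
    """Drop records flagged as synthetic; records without the flag count as real."""
    return [record for record in records if not record.get("is_synthetic", False)]
```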

4. Combine with Other Methods

Use synthesis alongside other privacy methods:

# Synthesis for testing
test_data = client.synthesize(template)

# Tokenization for production
prod_data = client.tokenize(real_user_input)
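The "When NOT to Use Synthesis" rules above can be condensed into a small decision helper. A sketch; `choose_method` is hypothetical, the returned strings match the client method names used on this page, and the order in which conflicting requirements are resolved is an assumption. Redaction is not covered.

```python
def choose_method(
    need_original_back: bool = False,
    user_must_recognize: bool = False,
    need_consistent_ids: bool = False,
) -> str:
    """Pick a privacy method following the 'When NOT to Use Synthesis' rules."""
    if need_original_back:
        return "tokenize"  # reversible: original can be restored
    if user_must_recognize:
        return "mask"  # keeps part of the real value visible
    if need_consistent_ids:
        return "hash"  # same input, same output
    return "synthesize"  # realistic, irreversible fake data
```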

Learn More

Python SDK

Full Python SDK documentation

JavaScript SDK

Complete JavaScript guide

Java SDK

Sync and async Java client

REST API

HTTP API reference for /synthesize

Examples

Practical integration examples

Compare with Other Methods

Tokenization

Reversible replacement (restore later)

Masking

Partial visibility for users

Redaction

Complete permanent removal

Hashing

Consistent identifiers for analytics