# Synthesis

> Replace real data with realistic fake data

## What is Synthesis?

Synthesis is a privacy protection method that replaces real sensitive data with realistic fake data generated by the Faker library. The fake data looks authentic but contains no real PII.

**Example:**

```
Input:  "John Doe lives in New York and works at Microsoft"
Output: "Michael Smith lives in Boston and works at TechCorp"
```

## How It Works

1. **Detection**: Blindfold identifies sensitive entities in your text
2. **Generation**: For each entity, realistic fake data is generated based on type
3. **Replacement**: Real data is replaced with synthetic data
4. **Language Support**: Fake data matches the locale of the language you specify
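
The steps above can be sketched locally as a toy pipeline. This is not Blindfold's implementation: regular expressions stand in for its entity detector, and Python's `random` module stands in for Faker. The names `PATTERNS`, `GENERATORS`, and `toy_synthesize` are hypothetical and exist only for this illustration.

```python
import random
import re

# Step 1 (detection): toy regex detectors, a stand-in for Blindfold's entity detection.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\+1-555-\d{4}"),
}

FIRST_NAMES = ["michael", "sarah", "emily", "david"]
DOMAINS = ["example.com", "example.net", "example.org"]


def fake_email(rng: random.Random) -> str:
    """Build a fake address from a small fixed vocabulary."""
    return f"{rng.choice(FIRST_NAMES)}@{rng.choice(DOMAINS)}"


def fake_phone(rng: random.Random) -> str:
    """Build a fake number in the same +1-555-XXXX shape as the input."""
    return f"+1-555-{rng.randint(0, 9999):04d}"


# Step 2 (generation): one generator per entity type, a stand-in for Faker.
GENERATORS = {"EMAIL_ADDRESS": fake_email, "PHONE_NUMBER": fake_phone}


def toy_synthesize(text: str, rng: random.Random) -> str:
    # Step 3 (replacement): swap each detected span for a generated value.
    for entity_type, pattern in PATTERNS.items():
        generate = GENERATORS[entity_type]
        text = pattern.sub(lambda match, g=generate: g(rng), text)
    return text


print(toy_synthesize("Email: john@example.com, Phone: +1-555-1234", random.Random()))
```

The type-to-generator lookup is the part that mirrors step 2: each entity type is replaced by a value of the same kind, so the surrounding text still reads naturally.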

## When to Use Synthesis

Synthesis is ideal when you need to:

### 1. Generate Test Data

Create realistic test data for development and testing environments.

```python theme={null}
# Generate test user profiles
template = "Name: John Doe, Email: john@example.com, Phone: +1-555-1234"

for i in range(10):
    result = client.synthesize(template, language="en")
    print(result.text)

# Output (examples):
# "Name: Michael Smith, Email: michael@example.net, Phone: +1-555-9876"
# "Name: Sarah Johnson, Email: sarah@example.org, Phone: +1-555-4567"
# ... 10 unique profiles
```

**Why this matters:**

* Realistic test data without PII
* Repeatable test scenarios
* No risk of exposing real user data
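
Because each synthesis call returns different values, repeatable scenarios depend on storing the generated output and reusing it. A minimal sketch, assuming a hypothetical `load_or_create_fixtures` helper and any callable that wraps a synthesis call and returns its text:

```python
import json
from pathlib import Path
from typing import Callable


def load_or_create_fixtures(
    path: Path,
    template: str,
    count: int,
    synthesize_text: Callable[[str], str],
) -> list[str]:
    """Return cached synthetic records, generating them only on the first run."""
    if path.exists():
        # Later runs reuse the stored records, so tests see identical data.
        return json.loads(path.read_text(encoding="utf-8"))

    records = [synthesize_text(template) for _ in range(count)]
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records
```

In a real suite, `synthesize_text` could be `lambda t: client.synthesize(t, language="en").text`, and the fixture file can be committed alongside the tests.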

### 2. Demo Environments

Populate demo environments with realistic but fake data.

```python theme={null}
# Create demo customer data
customer_template = """
Customer: Jane Smith
Email: jane@company.com
Location: New York
Company: TechCorp
"""

demo_customer = client.synthesize(customer_template, language="en")
load_into_demo_db(demo_customer.text)
```

**Use cases:**

* Product demos
* Sales presentations
* Training environments
* Screenshots and marketing

### 3. Realistic Training Data

Create training datasets that look real but contain no actual PII.

```python theme={null}
# Generate training data for ML models
training_examples = []

for _ in range(1000):
    synthetic = client.synthesize(
        "Patient John Doe, age 45, diagnosed with diabetes",
        language="en"
    )
    training_examples.append(synthetic.text)

# Train model on synthetic data
```

### 4. Data Sharing for Testing

Share realistic data with partners or vendors for integration testing.

```python theme={null}
# Create synthetic data for vendor testing
test_data = client.synthesize(
    production_data_sample,
    language="en"
)

# Safe to share - no real PII
send_to_vendor(test_data.text)
```

## When NOT to Use Synthesis

Synthesis is **not suitable** when:

### 1. You Need Original Data Back

Synthesis is irreversible. Use **Tokenization** instead.

```python theme={null}
# Bad - can't restore
synthetic = client.synthesize("john@example.com")
# No way to get "john@example.com" back

# Good - use tokenization
protected = client.tokenize("john@example.com")
original = client.detokenize(protected.text, protected.mapping)
```

### 2. Users Need to Recognize Their Data

Users won't recognize synthesized data. Use **Masking**.

```python theme={null}
# Bad - user won't recognize their card
synthetic = client.synthesize("Card: 4532-7562-9102-3456")
# Output: "Card: 5678-1234-9012-7890" (completely different)

# Good - show last 4 of real card
masked = client.mask("Card: 4532-7562-9102-3456")
# Output: "Card: ***************3456"
```
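
To see why masking keeps data recognizable, here is a local illustration of the keep-last-four idea. It is not Blindfold's masking implementation, and `mask_keep_last` is a hypothetical helper written for this example.

```python
def mask_keep_last(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask every character except the last `visible` ones."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


print("Card: " + mask_keep_last("4532-7562-9102-3456"))
# Card: ***************3456
```

The visible suffix is copied from the real value, which is exactly what synthesis cannot offer: a synthesized card number shares nothing with the original.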

### 3. You Need Consistent Identifiers

Each synthesis call generates different data, so the same input does not map to a stable value. Use **Hashing**.

```python theme={null}
# Bad - different each time
synth1 = client.synthesize("john@example.com")  # michael@example.com
synth2 = client.synthesize("john@example.com")  # sarah@example.org

# Good - same hash every time
hash1 = client.hash("john@example.com")  # ID_a3f8b9...
hash2 = client.hash("john@example.com")  # ID_a3f8b9... (same)
```
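
The consistency property comes from hashing being deterministic: the same input always produces the same digest. The sketch below illustrates that with Python's `hashlib`. It is not Blindfold's hashing algorithm, and the `ID_` prefix and 12-character truncation are assumptions chosen to resemble the example output above.

```python
import hashlib


def stable_id(value: str, prefix: str = "ID_", length: int = 12) -> str:
    """Derive a deterministic identifier from a value with SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return prefix + digest[:length]


print(stable_id("john@example.com") == stable_id("john@example.com"))  # True
print(stable_id("john@example.com") == stable_id("jane@example.com"))  # False
```

Stable identifiers like this let you join or count records across datasets, which independently generated synthetic values cannot support.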

## Key Features

<CardGroup cols={2}>
  <Card title="Realistic Data" icon="wand-magic-sparkles">
    Generated data looks authentic
  </Card>

  <Card title="Multi-Language" icon="globe">
    Supports 8 languages with locale-specific data
  </Card>

  <Card title="Type-Aware" icon="brain">
    Generates appropriate data for each entity type
  </Card>

  <Card title="Powered by Faker" icon="robot">
    Uses the Faker library for high-quality fake data
  </Card>
</CardGroup>

## Quick Start

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from blindfold import Blindfold

    client = Blindfold(api_key="your-api-key")

    # Basic synthesis
    result = client.synthesize(
        text="John Doe lives in New York and works at Microsoft",
        language="en"
    )

    print(result.text)
    # "Michael Smith lives in Boston and works at TechCorp"
    # (example output - will vary)

    # Generate multiple variations
    template = "Customer: Jane Doe, Email: jane@example.com"

    for i in range(3):
        result = client.synthesize(template, language="en")
        print(f"{i+1}. {result.text}")

    # Output (examples):
    # 1. Customer: Sarah Johnson, Email: sarah@example.org
    # 2. Customer: Michael Brown, Email: michael@example.net
    # 3. Customer: Emily Davis, Email: emily@example.com
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    import { Blindfold } from '@blindfold/sdk';

    const client = new Blindfold({ apiKey: 'your-api-key' });

    // Basic synthesis
    const result = await client.synthesize(
      "John Doe lives in New York and works at Microsoft",
      { language: 'en' }
    );

    console.log(result.text);
    // "Michael Smith lives in Boston and works at TechCorp"
    // (example output - will vary)

    // Generate multiple variations
    const template = "Customer: Jane Doe, Email: jane@example.com";

    for (let i = 0; i < 3; i++) {
      const result = await client.synthesize(template, { language: 'en' });
      console.log(`${i+1}. ${result.text}`);
    }

    // Output (examples):
    // 1. Customer: Sarah Johnson, Email: sarah@example.org
    // 2. Customer: Michael Brown, Email: michael@example.net
    // 3. Customer: Emily Davis, Email: emily@example.com
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={null}
    import dev.blindfold.sdk.Blindfold;

    Blindfold client = new Blindfold("your-api-key");

    // Basic synthesis
    var result = client.synthesize(
        "John Doe lives in New York and works at Microsoft",
        "en", null
    );

    System.out.println(result.getText());
    // "Michael Smith lives in Boston and works at TechCorp"
    // (example output - will vary)

    // Generate multiple variations
    String template = "Customer: Jane Doe, Email: jane@example.com";

    for (int i = 0; i < 3; i++) {
        var r = client.synthesize(template, "en", null);
        System.out.println((i + 1) + ". " + r.getText());
    }

    // Output (examples):
    // 1. Customer: Sarah Johnson, Email: sarah@example.org
    // 2. Customer: Michael Brown, Email: michael@example.net
    // 3. Customer: Emily Davis, Email: emily@example.com
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl -X POST https://api.blindfold.dev/api/public/v1/synthesize \
      -H "X-API-Key: your-api-key" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "John Doe lives in New York and works at Microsoft",
        "language": "en"
      }'

    # Response (example - will vary)
    {
      "text": "Michael Smith lives in Boston and works at TechCorp",
      "entities_count": 3,
      "detected_entities": [
        {
          "type": "PERSON",
          "text": "John Doe",
          "score": 0.95
        },
        {
          "type": "LOCATION",
          "text": "New York",
          "score": 0.90
        },
        {
          "type": "ORGANIZATION",
          "text": "Microsoft",
          "score": 0.85
        }
      ]
    }
    ```
  </Tab>
</Tabs>
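
If you prefer not to install an SDK, the same request can be assembled with Python's standard library. This sketch only builds the request and does not send it. The endpoint, header name, and body fields are taken from the cURL example above, while `build_synthesize_request` is a hypothetical helper name.

```python
import json
import urllib.request

SYNTHESIZE_URL = "https://api.blindfold.dev/api/public/v1/synthesize"


def build_synthesize_request(text: str, language: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request matching the cURL example."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_synthesize_request("John Doe lives in New York", "en", "your-api-key")
# To send it, pass `request` to urllib.request.urlopen and parse the JSON body.
```

Separating request construction from sending keeps the helper easy to unit test without network access.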

## Supported Languages

Generate locale-specific fake data for different languages:

<Tabs>
  <Tab title="English">
    ```python theme={null}
    result = client.synthesize(
        "John Doe from New York",
        language="en"
    )
    # "Michael Smith from Boston"
    ```
  </Tab>

  <Tab title="Czech">
    ```python theme={null}
    result = client.synthesize(
        "Jan Novák z Prahy",
        language="cs"
    )
    # "Petr Dvořák z Brna"
    ```
  </Tab>

  <Tab title="German">
    ```python theme={null}
    result = client.synthesize(
        "Hans Müller aus Berlin",
        language="de"
    )
    # "Klaus Schmidt aus München"
    ```
  </Tab>

  <Tab title="French">
    ```python theme={null}
    result = client.synthesize(
        "Marie Dupont de Paris",
        language="fr"
    )
    # "Sophie Martin de Lyon"
    ```
  </Tab>

  <Tab title="Spanish">
    ```python theme={null}
    result = client.synthesize(
        "Juan García de Madrid",
        language="es"
    )
    # "Carlos López de Barcelona"
    ```
  </Tab>

  <Tab title="Italian">
    ```python theme={null}
    result = client.synthesize(
        "Marco Rossi da Roma",
        language="it"
    )
    # "Giuseppe Bianchi da Milano"
    ```
  </Tab>

  <Tab title="Polish">
    ```python theme={null}
    result = client.synthesize(
        "Jan Kowalski z Warszawy",
        language="pl"
    )
    # "Piotr Nowak z Krakowa"
    ```
  </Tab>

  <Tab title="Slovak">
    ```python theme={null}
    result = client.synthesize(
        "Ján Kováč z Bratislavy",
        language="sk"
    )
    # "Peter Horváth z Košíc"
    ```
  </Tab>
</Tabs>

**Supported Languages:**

* `en` - English (US)
* `cs` - Czech
* `de` - German
* `fr` - French
* `es` - Spanish
* `it` - Italian
* `pl` - Polish
* `sk` - Slovak
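
A small guard can catch unsupported language codes before a request is made. A minimal sketch, assuming the eight codes listed above are the complete set; `require_supported_language` is a hypothetical helper, not part of the SDK.

```python
# Codes and names copied from the list above.
SUPPORTED_LANGUAGES = {
    "en": "English (US)",
    "cs": "Czech",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "it": "Italian",
    "pl": "Polish",
    "sk": "Slovak",
}


def require_supported_language(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized


print(require_supported_language(" DE "))  # de
```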

## Entity Types and Fake Data

Each entity type is replaced with fake data of the same kind:

| Entity Type     | Example Input       | Example Output        |
| --------------- | ------------------- | --------------------- |
| `PERSON`        | John Doe            | Michael Smith         |
| `EMAIL_ADDRESS` | john@example.com    | michael@example.net   |
| `PHONE_NUMBER`  | +1-555-1234         | +1-555-9876           |
| `LOCATION`      | New York            | Boston                |
| `ORGANIZATION`  | Microsoft           | TechCorp              |
| `CREDIT_CARD`   | 4532-7562-9102-3456 | 5678-1234-9012-3456   |
| `DATE_TIME`     | 2024-01-15          | 2023-11-22            |
| `IP_ADDRESS`    | 192.168.1.1         | 10.0.0.5              |
| `URL`           | https://example.com | https://test-site.org |
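
When working with the raw REST response, each item in `detected_entities` carries one of the type names from the table. The sketch below summarizes a response by entity type and flags low-scoring detections. The response literal is copied from the cURL example in Quick Start; `count_by_type` and `low_confidence` are hypothetical helpers.

```python
from collections import Counter

# Response shape copied from the cURL example on this page.
response = {
    "text": "Michael Smith lives in Boston and works at TechCorp",
    "entities_count": 3,
    "detected_entities": [
        {"type": "PERSON", "text": "John Doe", "score": 0.95},
        {"type": "LOCATION", "text": "New York", "score": 0.90},
        {"type": "ORGANIZATION", "text": "Microsoft", "score": 0.85},
    ],
}


def count_by_type(entities: list[dict]) -> Counter:
    """Count how many entities of each type were detected."""
    return Counter(entity["type"] for entity in entities)


def low_confidence(entities: list[dict], threshold: float) -> list[dict]:
    """Return entities whose detection score is below `threshold`."""
    return [entity for entity in entities if entity["score"] < threshold]


print(count_by_type(response["detected_entities"]))
print(low_confidence(response["detected_entities"], threshold=0.9))
```

Reviewing low-scoring detections is one way to spot text where an entity may have been misclassified before you rely on the synthesized output.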

## Common Patterns

### Generate Test Users

```python theme={null}
def generate_test_users(count: int) -> list:
    """Generate realistic test user profiles"""

    template = """
    Name: John Doe
    Email: john.doe@company.com
    Phone: +1-555-1234
    Location: New York
    """

    users = []
    for _ in range(count):
        result = client.synthesize(template, language="en")
        users.append(result.text)

    return users

# Usage
test_users = generate_test_users(100)
# 100 unique, realistic user profiles
```
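
Nothing in the function above enforces uniqueness, so two calls could in principle return the same text. If your tests depend on distinct records, a de-duplicating wrapper is one option. This is a sketch: `collect_unique` is a hypothetical helper, and `synthesize_text` is any zero-argument callable that returns one synthesized string.

```python
from typing import Callable, Optional


def collect_unique(
    synthesize_text: Callable[[], str],
    count: int,
    max_attempts: Optional[int] = None,
) -> list[str]:
    """Call `synthesize_text` until `count` distinct results are collected."""
    limit = max_attempts if max_attempts is not None else count * 3
    seen: set[str] = set()
    unique: list[str] = []
    for _ in range(limit):
        if len(unique) == count:
            break
        candidate = synthesize_text()
        if candidate not in seen:
            seen.add(candidate)
            unique.append(candidate)
    if len(unique) < count:
        raise RuntimeError(f"Only {len(unique)} unique results after {limit} attempts")
    return unique
```

With the client from Quick Start, a call might look like `collect_unique(lambda: client.synthesize(template, language="en").text, 100)`. The attempt limit keeps a template with few possible outputs from looping forever.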

### Populate Demo Database

```python theme={null}
def populate_demo_db(template_data: list):
    """Fill demo database with synthetic data"""

    for template in template_data:
        synthetic = client.synthesize(template, language="en")

        # Parse and insert
        demo_db.insert(parse_profile(synthetic.text))

# Usage
templates = load_production_templates()
populate_demo_db(templates)
```
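
The example above leaves `parse_profile` undefined. One possible implementation, assuming the synthesized text keeps the `Key: value` line layout of the template, is sketched below. The function body is an illustration, not part of the SDK.

```python
def parse_profile(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines into a dict with lower-case keys."""
    profile: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, separator, value = line.partition(":")
        if separator and key.strip():
            profile[key.strip().lower()] = value.strip()
    return profile


sample = """
Customer: Sarah Johnson
Email: sarah@example.org
Location: Boston
"""
print(parse_profile(sample))
# {'customer': 'Sarah Johnson', 'email': 'sarah@example.org', 'location': 'Boston'}
```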

### Create Training Dataset

```python theme={null}
def create_training_data(examples: list, count: int):
    """Generate training data from examples"""

    training_set = []

    for example in examples:
        for _ in range(count):
            synthetic = client.synthesize(example, language="en")
            training_set.append(synthetic.text)

    return training_set

# Usage
examples = ["Patient John Doe diagnosed with condition X", ...]
training_data = create_training_data(examples, 100)
# 100 synthetic examples per template
```

## Common Use Cases

<AccordionGroup>
  <Accordion title="Automated Testing" icon="vial">
    Generate test data for automated test suites:

    ```python theme={null}
    def test_user_registration():
        # Generate unique test user
        test_user = client.synthesize(
            "Name: John Doe, Email: john@test.com",
            language="en"
        ).text

        # Use in test
        response = api.register_user(test_user)
        assert response.status_code == 200
    ```

    **Benefits**: Fresh test data each run, no PII in test environments
  </Accordion>

  <Accordion title="Demo Environments" icon="display">
    Create realistic demo data:

    ```python theme={null}
    # Generate demo customers
    def setup_demo_environment():
        templates = [
            "Enterprise customer: Company X, contact: john@x.com",
            "Small business: Company Y, contact: jane@y.com"
        ]

        for template in templates:
            synthetic = client.synthesize(template)
            create_demo_account(synthetic.text)
    ```

    **Benefits**: Realistic demos without real customer data
  </Accordion>

  <Accordion title="Load Testing" icon="gauge">
    Generate data for performance testing:

    ```python theme={null}
    def load_test_data_generator(count: int):
        """Generate data for load testing"""
        template = "User: john@example.com, Session: abc123"

        test_data = []
        for _ in range(count):
            synthetic = client.synthesize(template)
            test_data.append(synthetic.text)

        return test_data

    # Generate 10,000 test records
    load_data = load_test_data_generator(10000)
    ```

    **Benefits**: Large-scale test data without PII concerns
  </Accordion>

  <Accordion title="Screenshots & Marketing" icon="camera">
    Create safe data for screenshots and marketing materials:

    ```python theme={null}
    def prepare_screenshot_data():
        """Generate data for product screenshots"""
        user_data = client.synthesize(
            "User: Jane Doe, Email: jane@company.com",
            language="en"
        )

        # Use in screenshot - safe for public release
        return user_data.text
    ```

    **Benefits**: No privacy risks in public materials
  </Accordion>
</AccordionGroup>
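
For the load-testing case above, thousands of sequential calls can be slow. One option is to issue them from a thread pool. This is a sketch under assumptions that this page does not confirm: that the client can be shared safely across threads, and that your rate limits allow the chosen concurrency. `synthesize_concurrently` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def synthesize_concurrently(
    synthesize_text: Callable[[str], str],
    template: str,
    count: int,
    max_workers: int = 8,
) -> list[str]:
    """Run `count` synthesis calls on a thread pool and return the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first worker error.
        return list(pool.map(synthesize_text, [template] * count))
```

Here `synthesize_text` could be `lambda t: client.synthesize(t).text`. Threads suit this workload because each call spends most of its time waiting on the network.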

## Best Practices

### 1. Use Templates

Create templates for consistent synthetic data:

```python theme={null}
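# Define templates with named placeholders
TEMPLATES = {
    'user': "Name: {name}, Email: {email}, Phone: {phone}",
    'company': "Company: {company}, Location: {location}"
}

# Fill placeholders with sample values so there are entities to detect
SAMPLE_USER = {'name': "John Doe", 'email': "john@example.com", 'phone': "+1-555-1234"}

# Generate from templates
def generate_user():
    seed_text = TEMPLATES['user'].format(**SAMPLE_USER)
    return client.synthesize(seed_text, language="en")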
```

### 2. Locale-Specific Data

Use appropriate language for your audience:

```python theme={null}
# Pick the locale that matches the demo audience
if region == "EU":
    # German is used here as an example; any supported language code works
    demo_data = client.synthesize(template, language="de")
elif region == "US":
    # Generate US English data
    demo_data = client.synthesize(template, language="en")
```

### 3. Document Synthetic Data Use

Clearly mark synthetic data in your systems:

```python theme={null}
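from datetime import datetime, timezone

# `result` is the return value of an earlier client.synthesize(...) call
synthetic_user = {
    'text': result.text,  # Full synthesized text, not just a name
    'is_synthetic': True,  # Mark as synthetic
    'generated_at': datetime.now(timezone.utc)
}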
```

### 4. Combine with Other Methods

Use synthesis alongside other privacy methods:

```python theme={null}
# Synthesis for testing
test_data = client.synthesize(template)

# Tokenization for production
prod_data = client.tokenize(real_user_input)
```

## Learn More

<CardGroup cols={2}>
  <Card title="Python SDK" icon="python" href="/sdks/python-sdk">
    Full Python SDK documentation
  </Card>

  <Card title="JavaScript SDK" icon="js" href="/sdks/javascript-sdk">
    Complete JavaScript guide
  </Card>

  <Card title="Java SDK" icon="java" href="/sdks/java-sdk">
    Sync and async Java client
  </Card>

  <Card title="REST API" icon="terminal" href="/api-reference/rest-api">
    HTTP API reference for /synthesize
  </Card>

  <Card title="Examples" icon="code" href="/examples">
    Practical integration examples
  </Card>
</CardGroup>

## Compare with Other Methods

<CardGroup cols={2}>
  <Card title="Tokenization" icon="shuffle" href="/methods/tokenization">
    Reversible replacement (restore later)
  </Card>

  <Card title="Masking" icon="eye-slash" href="/methods/masking">
    Partial visibility for users
  </Card>

  <Card title="Redaction" icon="eraser" href="/methods/redaction">
    Complete permanent removal
  </Card>

  <Card title="Hashing" icon="hashtag" href="/methods/hashing">
    Consistent identifiers for analytics
  </Card>
</CardGroup>
