
What is Redaction?

Redaction is a permanent privacy protection method that completely removes sensitive data from text. The detected sensitive information is deleted and cannot be restored. Example:
Input:  "My name is John Doe and SSN is 123-45-6789"
Output: "My name is  and SSN is "

How It Works

  1. Detection: Blindfold identifies sensitive entities in your text
  2. Complete Removal: Each detected entity is completely removed from the text
  3. Permanent: Original values are discarded and cannot be recovered
  4. Output: The remaining text is returned as-is around the removed spans, so gaps such as double spaces may remain
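The removal step can be sketched locally to show what "complete removal" does to the surrounding text. This is not Blindfold's implementation: it assumes detection yields each entity's type plus start/end character offsets, and the `Detection` class and `remove_spans` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection result: entity type plus [start, end) character offsets."""
    type: str
    start: int
    end: int


def remove_spans(text: str, detections: list[Detection]) -> str:
    """Delete every detected span; surrounding text (including spaces) is left as-is."""
    pieces = []
    cursor = 0
    # Process detections left to right so offsets always refer to the original text
    for d in sorted(detections, key=lambda d: d.start):
        if d.start > cursor:
            pieces.append(text[cursor:d.start])  # keep text before this detection
        cursor = max(cursor, d.end)  # max() makes overlapping detections safe
    pieces.append(text[cursor:])  # keep whatever follows the last detection
    return "".join(pieces)


text = "My name is John Doe and SSN is 123-45-6789"
detections = [
    Detection("PERSON", text.index("John Doe"), text.index("John Doe") + len("John Doe")),
    Detection("US_SSN", text.index("123-45-6789"), len(text)),
]
print(repr(remove_spans(text, detections)))
# 'My name is  and SSN is '
```

The function returns only the new string and keeps no record of the removed values, so there is nothing to reverse. That is the property that separates redaction from tokenization.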

When to Use Redaction

Redaction is ideal when you need to:

1. Permanent Data Anonymization

Remove PII from logs, support tickets, or archives that will be stored long-term.
# Redact support ticket before archiving
ticket = "Customer John Doe (john@example.com) reported an issue with order #12345"
redacted = client.redact(ticket)

# Store safely
archive_db.save(redacted.text)
# "Customer  () reported an issue with order #12345"
Why this matters:
  • Suitable for compliant long-term storage
  • A breach of the archive cannot expose PII that was removed before storage
  • Supports “right to be forgotten” requirements
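A detector can miss a value (for example, an unusually spelled name), so before archiving it can be worth checking the redacted text against sensitive values you already hold in structured form, such as the customer's name and email from the ticket's own fields. A minimal sketch; `find_leaks` and `archive_if_clean` are hypothetical helpers, and `save` stands in for a call such as `archive_db.save`.

```python
from typing import Callable, Iterable


def find_leaks(redacted_text: str, known_values: Iterable[str]) -> list[str]:
    """Return the known sensitive values that still appear in the redacted text."""
    return [v for v in known_values if v and v in redacted_text]


def archive_if_clean(redacted_text: str, known_values: Iterable[str],
                     save: Callable[[str], None]) -> None:
    """Save the text only if none of the known values survived redaction."""
    leaks = find_leaks(redacted_text, known_values)
    if leaks:
        # Report only the count, so the error message does not itself leak PII
        raise ValueError(f"{len(leaks)} known sensitive value(s) still present; not archived")
    save(redacted_text)


archived = []
archive_if_clean(
    "Customer  () reported an issue with order #12345",
    ["John Doe", "john@example.com"],  # values taken from the ticket's structured fields
    archived.append,
)
```

This check only catches values you already know about; it complements detection rather than replacing it.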

2. Third-Party Analytics

Share data with analytics platforms without exposing sensitive information.
# Redact before sending to analytics
event = "User john.doe@company.com completed purchase"
redacted = client.redact(event)

# Send to analytics
analytics.track(redacted.text)
# "User  completed purchase"
Use cases:
  • Google Analytics
  • Mixpanel, Amplitude
  • Custom analytics platforms
  • Business intelligence tools

3. Public Disclosure

Prepare data for public release or legal disclosure.
# Redact before publishing
document = """
Incident involving John Smith (SSN: 123-45-6789)
Contact: john@example.com, Phone: +1-555-1234
"""

redacted = client.redact(document)
# All PII removed, safe for public release

4. Log Sanitization

Remove sensitive data from application logs.
# Redact logs before storage
log_entry = "User login: john@example.com from IP 192.168.1.100"
redacted = client.redact(log_entry)

logger.info(redacted.text)
# "User login:  from IP "

5. GDPR Compliance

Implement “right to be forgotten” by permanently removing user data.
# User requests data deletion
user_records = fetch_user_records(user_id)

# Redact instead of delete (keeps records for analysis)
for record in user_records:
    # Assumes each record keeps its free text in record.text (hypothetical field)
    redacted = client.redact(record.text)
    update_record(record.id, redacted.text)

When NOT to Use Redaction

Redaction is not suitable when:

1. You Need to Restore Data Later

Redaction is permanent. Use Tokenization instead.
# Bad - can't restore
redacted = client.redact("Contact john@example.com")
# No way to get "john@example.com" back

# Good - use tokenization
protected = client.tokenize("Contact john@example.com")
original = client.detokenize(protected.text, protected.mapping)
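The difference comes down to what is kept. In this local illustration (not Blindfold's token format or algorithm), tokenization stores a token-to-value mapping that detokenization uses to restore the text, while redaction deletes the value and keeps nothing to restore from. The `<TOKEN_n>` format and all three helper names are hypothetical.

```python
def tokenize_local(text: str, values: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each sensitive value with a placeholder and remember the mapping."""
    mapping = {}
    for i, value in enumerate(values, start=1):
        token = f"<TOKEN_{i}>"
        mapping[token] = value
        text = text.replace(value, token)
    return text, mapping


def detokenize_local(text: str, mapping: dict[str, str]) -> str:
    """Restore original values using the saved mapping."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


def redact_local(text: str, values: list[str]) -> str:
    """Delete each sensitive value; no mapping is kept, so this cannot be undone."""
    for value in values:
        text = text.replace(value, "")
    return text


protected, mapping = tokenize_local("Contact john@example.com", ["john@example.com"])
print(detokenize_local(protected, mapping))  # Contact john@example.com
```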

2. Users Need to Identify the Data

If users need to recognize their own data, use Masking.
# Bad - user can't identify their card
redacted = client.redact("Card: 4532-7562-9102-3456")
# Output: "Card: "

# Good - show last 4 digits
masked = client.mask("Card: 4532-7562-9102-3456")
# Output: "Card: ***************3456"
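Masking keeps a recognizable fragment instead of deleting the whole value. Below is a local sketch of last-four masking; `mask_keep_last` is a hypothetical helper that masks every character, separators included. Blindfold's own masking options may behave differently.

```python
def mask_keep_last(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace every character except the last `visible` ones with mask_char."""
    if visible <= 0 or len(value) <= visible:
        # Too short to show a partial value safely: mask it entirely
        return mask_char * len(value)
    hidden = len(value) - visible
    return mask_char * hidden + value[-visible:]


print(mask_keep_last("4532-7562-9102-3456"))
```

Fully masking values that are no longer than the visible suffix is a deliberate choice: showing "the last four" of a four-character value would reveal all of it.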

3. You Need Consistent Identifiers

For analytics with user tracking, use Hashing.
# Bad - can't track same user across events
redacted1 = client.redact("User: john@example.com")  # "User: "
redacted2 = client.redact("User: jane@example.com")  # "User: "
# Both look the same, can't distinguish users

# Good - same user gets same hash
hash1 = client.hash("User: john@example.com")  # ID_a3f8b9...
hash2 = client.hash("User: john@example.com")  # ID_a3f8b9... (same)
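Hashing works for tracking because it is deterministic: the same input always yields the same identifier, whereas redaction turns every value into the same empty string. Below is a local stand-in using keyed HMAC-SHA256 from Python's standard library. It is not necessarily Blindfold's hashing scheme, and the `ID_` prefix, the 12-character truncation, and the `pseudonymize` name are assumptions chosen to mirror the output above.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed hash: same value + same key -> same identifier."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"ID_{digest[:12]}"


key = b"load-this-from-a-secret-store"  # placeholder; never hard-code a real key
print(pseudonymize("john@example.com", key) == pseudonymize("john@example.com", key))  # True
print(pseudonymize("john@example.com", key) == pseudonymize("jane@example.com", key))  # False
```

The sketch uses a keyed hash rather than plain SHA-256 because emails and phone numbers are guessable. Without the key, someone holding the IDs cannot simply hash candidate values and compare.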

Key Features

Permanent Removal

Data is completely removed and cannot be recovered

Complete Deletion

Sensitive text is deleted, not replaced

GDPR Compliant

Helps meet GDPR data minimization requirements

50+ Entity Types

Removes all detected PII types

Quick Start

from blindfold import Blindfold

client = Blindfold(api_key="your-api-key")

# Basic redaction
result = client.redact(
    "Contact John Doe at john@example.com or call +1-555-1234"
)

print(result.text)
# "Contact  at  or call "

print(f"Redacted {result.entities_count} entities")
# "Redacted 3 entities"

# Check what was redacted
for entity in result.detected_entities:
    print(f"- {entity.type}: {entity.text} (removed)")
# - PERSON: John Doe (removed)
# - EMAIL_ADDRESS: john@example.com (removed)
# - PHONE_NUMBER: +1-555-1234 (removed)

Configuration Options

Filter Specific Entity Types

Only redact specific types of sensitive data:
# Only redact SSNs and credit cards
result = client.redact(
    "John Doe (SSN: 123-45-6789) paid with card 4532-7562-9102-3456",
    entities=["US_SSN", "CREDIT_CARD"]
)
# Output: "John Doe (SSN: ) paid with card "
# Name is NOT redacted
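Conceptually, the `entities` filter narrows which detections are acted on before anything is removed. Below is a local sketch of that selection step. It assumes each detection exposes `type` and `text`, as in the Quick Start loop. The `Entity` class and `redact_selected` are hypothetical, not Blindfold's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    type: str
    text: str


def redact_selected(text: str, detected: list[Entity],
                    entities: Optional[list[str]] = None) -> str:
    """Remove only detections whose type is listed in `entities` (None = all types)."""
    allowed = None if entities is None else set(entities)
    for e in detected:
        if allowed is None or e.type in allowed:
            # str.replace removes every occurrence of this value, not just one
            text = text.replace(e.text, "")
    return text


sample = "John Doe (SSN: 123-45-6789) paid with card 4532-7562-9102-3456"
found = [
    Entity("PERSON", "John Doe"),
    Entity("US_SSN", "123-45-6789"),
    Entity("CREDIT_CARD", "4532-7562-9102-3456"),
]
print(repr(redact_selected(sample, found, entities=["US_SSN", "CREDIT_CARD"])))
# 'John Doe (SSN: ) paid with card '
```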

Adjust Confidence Threshold

Control detection sensitivity:
# Only high-confidence redactions
result = client.redact(
    text="Maybe email: test@test",
    score_threshold=0.8  # High confidence only
)
# Detections with a confidence score below 0.8 are left in the text
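A score threshold acts as a cut-off on each detection's confidence. Below is a local sketch of that rule. It assumes each detection carries a `score` between 0 and 1, and the `Scored` class and `above_threshold` helper are hypothetical. This sketch keeps a score exactly equal to the threshold; whether Blindfold does the same is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    type: str
    text: str
    score: float  # detector confidence, assumed to be in [0, 1]


def above_threshold(detections: list[Scored], score_threshold: float) -> list[Scored]:
    """Keep detections with score >= score_threshold; the rest stay in the text."""
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1")
    return [d for d in detections if d.score >= score_threshold]
```

Raising the threshold trades recall for precision. Fewer false positives are removed, but more genuine PII with a low score is left in place.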

Common Patterns

Log Sanitization

Automatically redact logs before storage:
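import logging

logger = logging.getLogger(__name__)

def safe_log(message: str, level: str = "info"):
    """Log messages with automatic PII redaction"""
    redacted = client.redact(message)

    if level == "error":
        logger.error(redacted.text)
    elif level == "warning":
        logger.warning(redacted.text)
    else:
        # Fall back to INFO so unrecognized levels are never silently dropped
        logger.info(redacted.text)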

# Usage
safe_log("User john@example.com failed to login from 192.168.1.100")
# Logs: "User  failed to login from "
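A wrapper such as `safe_log` only helps where call sites remember to use it. An alternative is to attach redaction to Python's standard `logging` machinery with a `logging.Filter`, so every record passing through a handler is rewritten. The sketch below uses the hypothetical name `RedactingFilter`, and its `redact` callable would typically be `lambda s: client.redact(s).text`. It makes one redaction call per log record, which may add latency on hot paths.

```python
import logging
from typing import Callable


class RedactingFilter(logging.Filter):
    """Rewrite each log record's message with a redaction function."""

    def __init__(self, redact: Callable[[str], str]):
        super().__init__()
        self._redact = redact

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges %-style args first, so PII passed as arguments is covered.
        # Exception tracebacks (record.exc_info) are NOT rewritten by this sketch.
        record.msg = self._redact(record.getMessage())
        record.args = None  # args are already merged into msg
        return True  # rewrite the record, never drop it


# Usage sketch (client is the Blindfold client from the Quick Start):
# handler = logging.StreamHandler()
# handler.addFilter(RedactingFilter(lambda s: client.redact(s).text))
# logging.getLogger("app").addHandler(handler)
```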

Support Ticket Archival

Redact tickets before long-term storage:
def archive_ticket(ticket_data: dict):
    """Archive support ticket with redacted PII"""

    # Redact sensitive fields
    ticket_data['description'] = client.redact(
        ticket_data['description']
    ).text

    ticket_data['customer_notes'] = client.redact(
        ticket_data['customer_notes']
    ).text

    # Store safely
    archive_db.insert(ticket_data)

# Usage
ticket = {
    'id': 12345,
    'description': 'Customer John Doe (john@example.com) needs help',
    'customer_notes': 'My SSN is 123-45-6789'
}

archive_ticket(ticket)
# All PII removed before storage

Analytics Event Tracking

Send events to analytics without PII:
def track_event(event_name: str, properties: dict):
    """Track analytics event with redacted PII"""

    # Redact all string properties
    safe_properties = {}
    for key, value in properties.items():
        if isinstance(value, str):
            safe_properties[key] = client.redact(value).text
        else:
            safe_properties[key] = value

    # Send to analytics
    analytics.track(event_name, safe_properties)

# Usage
track_event("user_signup", {
    "email": "john@example.com",
    "source": "landing_page",
    "age": 25
})
# Analytics receives: email="", source="landing_page", age=25
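`track_event` above only redacts top-level strings, so a string inside a nested dict or list would be sent unredacted. Below is a recursive variant as a sketch. `redact_nested` is a hypothetical helper, and in practice `redact` would be `lambda s: client.redact(s).text`. Each string triggers a separate call, so large payloads mean many requests.

```python
from typing import Any, Callable


def redact_nested(value: Any, redact: Callable[[str], str]) -> Any:
    """Return a copy of `value` with every string, at any depth, passed through `redact`."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        # Keys are kept as-is; only values are redacted
        return {k: redact_nested(v, redact) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_nested(v, redact) for v in value]
    if isinstance(value, tuple):
        return tuple(redact_nested(v, redact) for v in value)
    return value  # numbers, booleans, None, etc. pass through unchanged


# Usage sketch inside track_event (client is the Blindfold client from the Quick Start):
# safe_properties = redact_nested(properties, lambda s: client.redact(s).text)
```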

Common Use Cases

Audit Logs

Maintain audit logs without storing PII:
# Log user actions without PII
def log_user_action(user_email, action):
    redacted = client.redact(f"{user_email} performed {action}")
    compliance_log.write(redacted.text)

log_user_action("john@example.com", "password_reset")
# Logs: " performed password_reset"
Benefits: Audit trail maintained, no PII stored, supports GDPR compliance

Customer Feedback

Collect feedback without storing customer PII:
from datetime import datetime

# Redact customer feedback before storage
def save_feedback(feedback_text, rating):
    redacted = client.redact(feedback_text)

    feedback_db.insert({
        'text': redacted.text,
        'rating': rating,
        'date': datetime.now()
    })

save_feedback(
    "Great service! Contact me at john@example.com",
    5
)
# Stores: "Great service! Contact me at "
Benefits: Feedback preserved, PII removed, safe for analysis

Error Reporting

Share error reports without exposing user data:
# Redact error reports before sending to bug tracker
def report_error(error_message, user_context):
    redacted_message = client.redact(error_message)
    redacted_context = client.redact(user_context)

    bug_tracker.create_issue({
        'title': redacted_message.text,
        'description': redacted_context.text
    })

report_error(
    "Database error for user john@example.com",
    "User IP: 192.168.1.100, Session: abc123"
)
# Detected PII is removed before the issue is filed
Benefits: Developers get context, user privacy protected

Public Datasets

Create shareable datasets from sensitive data:
# Prepare dataset for public release
def create_public_dataset(private_records):
    public_records = []

    for record in private_records:
        redacted = client.redact(record)
        public_records.append(redacted.text)

    return public_records

# Original: ["John Doe, john@example.com, +1-555-1234", ...]
# Public: [", , ", ...]
Benefits: Records stay usable for research without exposing the detected PII

Best Practices

1. Redact Early

Redact sensitive data as early as possible in your pipeline:
# Good - redact immediately
user_input = request.get_json()['message']
safe_message = client.redact(user_input).text
process_message(safe_message)

# Bad - redact late (PII may leak in logs, errors, etc.)
user_input = request.get_json()['message']
result = process_message(user_input)  # PII exposed during processing
redacted = client.redact(result)
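One way to make early redaction hard to forget is to apply it at the function boundary. The sketch below uses a decorator built with the standard `functools` module. `redact_inputs` is a hypothetical name, and `redact` would typically wrap `client.redact(...).text`. Only `str` arguments are rewritten; other types pass through unchanged.

```python
import functools
from typing import Any, Callable


def redact_inputs(redact: Callable[[str], str]):
    """Decorator factory: redact every str argument before the wrapped function runs."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            safe_args = tuple(redact(a) if isinstance(a, str) else a for a in args)
            safe_kwargs = {k: redact(v) if isinstance(v, str) else v
                           for k, v in kwargs.items()}
            return func(*safe_args, **safe_kwargs)
        return wrapper
    return decorator


# Usage sketch (client is the Blindfold client from the Quick Start):
# @redact_inputs(lambda s: client.redact(s).text)
# def process_message(message: str) -> str:
#     ...
```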

2. Log What Was Redacted

Keep audit trails of redaction events:
from datetime import datetime

result = client.redact(text)

# Log redaction metadata
audit_log.info({
    'action': 'redaction',
    'entities_redacted': result.entities_count,
    'entity_types': [e.type for e in result.detected_entities],
    'timestamp': datetime.now()
})
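The audit entry above records counts and types but not the removed values themselves. That matters, because logging `entity.text` would copy the PII into the audit log. Below is a self-contained sketch that builds such an entry with `collections.Counter`. The `Entity` class is a hypothetical stand-in for the SDK's detected-entity objects, and the UTC ISO-8601 timestamp is a choice of this sketch rather than a requirement.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Entity:
    type: str
    text: str  # the removed value; deliberately never copied into the audit entry


def build_audit_entry(entities: list[Entity]) -> dict:
    """Summarize a redaction by entity type without recording any removed values."""
    per_type = Counter(e.type for e in entities)
    return {
        "action": "redaction",
        "entities_redacted": len(entities),
        "entity_types": dict(sorted(per_type.items())),  # e.g. {"EMAIL_ADDRESS": 1}
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```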

3. Review Redaction Policies

Regularly review what gets redacted:
# Monitor redaction statistics
def analyze_redactions(timeframe):
    stats = {
        'total_redactions': 0,
        'entity_types': {}
    }

    for event in get_redaction_events(timeframe):
        stats['total_redactions'] += event.entities_count
        for entity in event.detected_entities:
            stats['entity_types'][entity.type] = \
                stats['entity_types'].get(entity.type, 0) + 1

    return stats

4. Combine with Other Methods

Use redaction alongside other privacy methods:
# Redact for long-term storage, tokenize for processing
def process_and_store(data):
    # Tokenize for processing
    protected = client.tokenize(data)
    result = process_with_ai(protected.text)

    # Redact for storage
    redacted = client.redact(result)
    database.save(redacted.text)

Security Considerations

Important redaction considerations:
  • Permanent: Redacted data cannot be recovered
  • Complete removal: Text is deleted outright, leaving gaps
  • Readability: Gaps left by removed text can make sentences harder to read
  • Not reversible: Unlike encryption or tokenization, there is no key or mapping that can undo redaction
  • Review before production: Only detected entities are removed, so test redaction on sample data first
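If the gaps left by removal hurt readability, an optional clean-up pass can run on the redacted text afterwards. Below is a sketch using Python's `re` module. `tidy_redacted` is a hypothetical helper, not a Blindfold option, and its rules are heuristics; for example, it would also strip legitimate empty parentheses such as `f()`. Tidying also hides where something was removed, which may matter if readers need to know the text was redacted.

```python
import re


def tidy_redacted(text: str) -> str:
    """Heuristic clean-up of gaps left by removed entities."""
    text = re.sub(r"\(\s*\)", "", text)               # drop empty parentheses, e.g. "()"
    text = re.sub(r"[ \t]{2,}", " ", text)            # collapse runs of spaces/tabs (keeps newlines)
    text = re.sub(r"[ \t]+([,.;:!?])", r"\1", text)   # no space before punctuation
    return text.strip()


print(repr(tidy_redacted("Customer  () reported an issue with order #12345")))
# 'Customer reported an issue with order #12345'
```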

Learn More

Python SDK

Full Python SDK documentation

JavaScript SDK

Complete JavaScript guide

Java SDK

Sync and async Java client

REST API

HTTP API reference for /redact

Examples

Practical integration examples

Compare with Other Methods

Tokenization

Reversible replacement (restore later)

Masking

Partial visibility for users

Hashing

Consistent identifiers for tracking

Synthesis

Replace with fake realistic data