
What is Redaction?

Redaction is a permanent privacy protection method that completely removes sensitive data from text. The detected sensitive information is deleted and cannot be restored. Example:
Input:  "My name is John Doe and SSN is 123-45-6789"
Output: "My name is  and SSN is "

How It Works

  1. Detection: Blindfold identifies sensitive entities in your text
  2. Complete Removal: Each detected entity is completely removed from the text
  3. Permanent: Original values are discarded and cannot be recovered
  4. Clean Output: The rest of the text is returned unchanged, with only the sensitive values removed
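Conceptually, the removal step works on the character offsets of each detection. The sketch below is a local illustration, not the Blindfold implementation. It assumes a hypothetical `Entity` record with `start`/`end` offsets and deletes each span, which reproduces the Input/Output example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """Hypothetical detection: entity type plus character offsets (end exclusive)."""
    entity_type: str
    start: int
    end: int


def remove_spans(text: str, entities: List[Entity]) -> str:
    """Delete each detected span and join the text that remains."""
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        # Keep the text between the previous span and this one; for an
        # overlapping span, start < cursor and the slice is simply empty
        pieces.append(text[cursor:ent.start])
        cursor = max(cursor, ent.end)
    pieces.append(text[cursor:])  # text after the last span
    return "".join(pieces)


text = "My name is John Doe and SSN is 123-45-6789"
name = text.index("John Doe")
ssn = text.index("123-45-6789")
entities = [
    Entity("PERSON", name, name + len("John Doe")),
    Entity("US_SSN", ssn, ssn + len("123-45-6789")),
]
print(repr(remove_spans(text, entities)))
# 'My name is  and SSN is '
```

Nothing is substituted into the gaps, which is why the output contains double spaces where values used to be.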

When to Use Redaction

Redaction is ideal when you need to:

1. Permanent Data Anonymization

Remove PII from logs, support tickets, or archives that will be stored long-term.
# Redact support ticket before archiving
ticket = "Customer John Doe (john.doe@example.com) reported an issue with order #12345"
redacted = client.redact(ticket)

# Store safely
archive_db.save(redacted.text)
# "Customer  () reported an issue with order #12345"
Why this matters:
  • Supports compliant long-term storage
  • No PII left to expose if the archive is breached
  • Supports “right to be forgotten” requirements

2. Third-Party Analytics

Share data with analytics platforms without exposing sensitive information.
# Redact before sending to analytics
event = "User jane@example.com completed purchase"
redacted = client.redact(event)

# Send to analytics
analytics.track(redacted.text)
# "User  completed purchase"
Use cases:
  • Google Analytics
  • Mixpanel, Amplitude
  • Custom analytics platforms
  • Business intelligence tools

3. Public Disclosure

Prepare data for public release or legal disclosure.
# Redact before publishing
document = """
Incident involving John Smith (SSN: 123-45-6789)
Contact: john.smith@example.com, Phone: +1-555-1234
"""

redacted = client.redact(document)
# All PII removed, safe for public release

4. Log Sanitization

Remove sensitive data from application logs.
# Redact logs before storage
log_entry = "User login: jane@example.com from IP 192.168.1.100"
redacted = client.redact(log_entry)

logger.info(redacted.text)
# "User login:  from IP "

5. GDPR Compliance

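Support “right to be forgotten” requests by permanently removing a user's PII from the records you keep.
# User requests data deletion
# fetch_user_records, update_record, and the record.text field are hypothetical helpers
user_records = fetch_user_records(user_id)

# Redact instead of delete (keeps records for analysis)
for record in user_records:
    redacted = client.redact(record.text)  # pass the record's text, not the record object
    update_record(record.id, redacted.text)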

When NOT to Use Redaction

Redaction is not suitable when:

1. You Need to Restore Data Later

Redaction is permanent. Use Tokenization instead.
# Bad - can't restore
redacted = client.redact("Contact jane@example.com")
# No way to get "jane@example.com" back

# Good - use tokenization
protected = client.tokenize("Contact jane@example.com")
original = client.detokenize(protected.text, protected.mapping)
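The difference comes down to what is kept. This local sketch is not the Blindfold API. `toy_redact`, `toy_tokenize`, and `toy_detokenize` are hypothetical helpers that handle email addresses only. The sketch shows that tokenization keeps a mapping from each placeholder back to its original value, while redaction keeps nothing.

```python
import re
from typing import Dict, Tuple

# Toy email pattern, for illustration only
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def toy_redact(text: str) -> str:
    """Delete emails outright: nothing is stored that could undo this."""
    return EMAIL.sub("", text)


def toy_tokenize(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap each email for a numbered placeholder and remember the original."""
    mapping: Dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = match.group(0)
        return token

    return EMAIL.sub(_swap, text), mapping


def toy_detokenize(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back using the saved mapping."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


message = "Contact jane@example.com"
print(repr(toy_redact(message)))  # 'Contact '
protected_text, mapping = toy_tokenize(message)
print(protected_text)  # Contact <EMAIL_1>
print(toy_detokenize(protected_text, mapping))  # Contact jane@example.com
```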

2. Users Need to Identify the Data

If users need to recognize their own data, use Masking.
# Bad - user can't identify their card
redacted = client.redact("Card: 4532-7562-9102-3456")
# Output: "Card: "

# Good - show last 4 digits
masked = client.mask("Card: 4532-7562-9102-3456")
# Output: "Card: ***************3456"
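The masking behavior shown above keeps the last four characters and hides the rest. It can be sketched locally as follows. This illustrates the idea only and is not the Blindfold `mask` implementation; `mask_keep_last` is a hypothetical helper.

```python
def mask_keep_last(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `visible` characters with `mask_char`."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[-visible:]


print("Card: " + mask_keep_last("4532-7562-9102-3456"))
# Card: ***************3456
```

The separators are masked along with the digits, so the masked string has the same length as the original.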

3. You Need Consistent Identifiers

For analytics with user tracking, use Hashing.
# Bad - can't track same user across events
redacted1 = client.redact("User: alice@example.com")  # "User: "
redacted2 = client.redact("User: bob@example.com")  # "User: "
# Both look the same, can't distinguish users

# Good - same user gets same hash
hash1 = client.hash("User: alice@example.com")  # ID_a3f8b9...
hash2 = client.hash("User: alice@example.com")  # ID_a3f8b9... (same)
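Hashing keeps identifiers linkable because a deterministic function maps equal inputs to equal outputs. Below is a generic keyed-hash (HMAC-SHA256) sketch built on Python's standard library. It is not necessarily the algorithm behind `client.hash`. The `ID_` prefix and the 12-character truncation are assumptions chosen to mirror the output shown above.

```python
import hashlib
import hmac

# Assumption: a secret key kept server-side, so outsiders cannot hash
# guessed emails and match them against your IDs
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map equal inputs to equal IDs (HMAC-SHA256, truncated to 12 hex chars)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "ID_" + digest[:12]


a1 = pseudonymize("alice@example.com")
a2 = pseudonymize("alice@example.com")
b = pseudonymize("bob@example.com")
print(a1 == a2, a1 == b)
# True False
```

A keyed hash is used instead of a bare SHA-256. Email addresses are easy to guess, so an unkeyed hash could be reversed by hashing candidate addresses and comparing the results.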

Key Features

Permanent Removal

Data is completely removed and cannot be recovered

Complete Deletion

Sensitive text is deleted, not replaced

GDPR Compliant

Supports data minimization requirements

50+ Entity Types

Removes any detected entity across all supported types

Quick Start

from blindfold import Blindfold

client = Blindfold(api_key="your-api-key")

# Basic redaction
result = client.redact(
    "Contact John Doe at john.doe@example.com or call +1-555-1234"
)

print(result.text)
# "Contact  at  or call "

print(f"Redacted {result.entities_count} entities")
# "Redacted 3 entities"

# Check what was redacted
for entity in result.detected_entities:
    print(f"- {entity.entity_type}: {entity.text} (removed)")
# - PERSON: John Doe (removed)
# - EMAIL_ADDRESS: john.doe@example.com (removed)
# - PHONE_NUMBER: +1-555-1234 (removed)

Configuration Options

Filter Specific Entity Types

Only redact specific types of sensitive data:
# Only redact SSNs and credit cards
result = client.redact(
    "John Doe (SSN: 123-45-6789) paid with card 4532-7562-9102-3456",
    entities=["US_SSN", "CREDIT_CARD"]
)
# Output: "John Doe (SSN: ) paid with card "
# Name is NOT redacted

Adjust Confidence Threshold

Control detection sensitivity:
# Only high-confidence redactions
result = client.redact(
    text="Maybe email: test@test",
    score_threshold=0.8  # High confidence only
)
# Low-confidence detections are skipped
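Both options narrow the set of detections before anything is removed. The sketch below models that filtering locally with a hypothetical `Detection` record. The scores are invented for illustration, and it is an assumption that the service keeps detections whose score equals the threshold (`>=`).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Detection:
    """Hypothetical detection record: type, matched text, and confidence score."""
    entity_type: str
    text: str
    score: float


def select_for_redaction(
    detections: Iterable[Detection],
    entities: Optional[List[str]] = None,
    score_threshold: float = 0.0,
) -> List[Detection]:
    """Keep detections that pass the type filter and the confidence threshold."""
    allowed = set(entities) if entities is not None else None
    return [
        d
        for d in detections
        if d.score >= score_threshold
        and (allowed is None or d.entity_type in allowed)
    ]


# Invented scores, for illustration only
detections = [
    Detection("PERSON", "John Doe", 0.85),
    Detection("US_SSN", "123-45-6789", 0.95),
    Detection("EMAIL_ADDRESS", "test@test", 0.40),
]

print([d.entity_type for d in select_for_redaction(detections, entities=["US_SSN", "CREDIT_CARD"])])
# ['US_SSN']
print([d.entity_type for d in select_for_redaction(detections, score_threshold=0.8)])
# ['PERSON', 'US_SSN']
```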

Common Patterns

Log Sanitization

Automatically redact logs before storage:
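import logging

logger = logging.getLogger(__name__)

LOG_LEVELS = {"debug", "info", "warning", "error", "critical"}

def safe_log(message: str, level: str = "info"):
    """Log messages with automatic PII redaction"""
    redacted = client.redact(message)

    # Dispatch to logger.info, logger.error, etc. by name; unknown levels
    # fall back to info instead of silently dropping the message
    name = level.lower()
    log_method = getattr(logger, name) if name in LOG_LEVELS else logger.info
    log_method(redacted.text)

# Usage
safe_log("User jane@example.com failed to login from 192.168.1.100")
# Logs: "User  failed to login from "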
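With Python's standard `logging` module, a `logging.Filter` attached to a handler can redact every record without changing each call site. This includes messages built from `%s` arguments. The sketch uses a toy regex redactor (`toy_redact`, covering only emails and IPv4 addresses) as a stand-in for `client.redact()`; swap in the real call in your own code.

```python
import logging
import re

# Toy patterns for illustration only; a real deployment would call client.redact()
PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # email addresses
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4 addresses
]


def toy_redact(text: str) -> str:
    """Delete every match of the toy patterns."""
    for pattern in PATTERNS:
        text = pattern.sub("", text)
    return text


class RedactingFilter(logging.Filter):
    """Rewrite each log record so only the redacted message reaches the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge %-style args into the message first, so arguments are redacted too
        record.msg = toy_redact(record.getMessage())
        record.args = ()
        return True  # keep the record; only its text changes


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("User %s failed to login from %s", "jane@example.com", "192.168.1.100")
# stderr: User  failed to login from
```

Attaching the filter to the handler rather than the logger means it also applies to records that propagate from child loggers.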

Support Ticket Archival

Redact tickets before long-term storage:
def archive_ticket(ticket_data: dict):
    """Archive support ticket with redacted PII"""

    # Redact sensitive fields
    ticket_data['description'] = client.redact(
        ticket_data['description']
    ).text

    ticket_data['customer_notes'] = client.redact(
        ticket_data['customer_notes']
    ).text

    # Store safely
    archive_db.insert(ticket_data)

# Usage
ticket = {
    'id': 12345,
    'description': 'Customer John Doe (john.doe@example.com) needs help',
    'customer_notes': 'My SSN is 123-45-6789'
}

archive_ticket(ticket)
# All PII removed before storage

Analytics Event Tracking

Send events to analytics without PII:
def track_event(event_name: str, properties: dict):
    """Track analytics event with redacted PII"""

    # Redact all string properties
    safe_properties = {}
    for key, value in properties.items():
        if isinstance(value, str):
            safe_properties[key] = client.redact(value).text
        else:
            safe_properties[key] = value

    # Send to analytics
    analytics.track(event_name, safe_properties)

# Usage
track_event("user_signup", {
    "email": "jane@example.com",
    "source": "landing_page",
    "age": 25
})
# Analytics receives: email="", source="landing_page", age=25
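The loop above redacts only top-level strings, so PII inside nested dictionaries or lists passes through unchanged. A recursive walk handles those as well. This sketch uses a toy email-only redactor (`toy_redact`) in place of `client.redact(...).text`.

```python
import re
from typing import Any, Callable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def toy_redact(text: str) -> str:
    """Stand-in for client.redact(text).text: deletes email addresses only."""
    return EMAIL.sub("", text)


def redact_nested(value: Any, redact: Callable[[str], str] = toy_redact) -> Any:
    """Return a copy of value with every string value, at any depth, redacted."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        # Keys are left as-is; only values are redacted
        return {key: redact_nested(item, redact) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_nested(item, redact) for item in value]
    if isinstance(value, tuple):
        return tuple(redact_nested(item, redact) for item in value)
    return value  # numbers, booleans, None, ... pass through unchanged


properties = {
    "email": "jane@example.com",
    "source": "landing_page",
    "age": 25,
    "invited": ["bob@example.com", "newsletter"],
}
print(redact_nested(properties))
# {'email': '', 'source': 'landing_page', 'age': 25, 'invited': ['', 'newsletter']}
```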

Common Use Cases

Maintain audit logs without storing PII:
# Log user actions without PII
def log_user_action(user_email, action):
    redacted = client.redact(f"{user_email} performed {action}")
    compliance_log.write(redacted.text)

log_user_action("jane@example.com", "password_reset")
# Logs: " performed password_reset"
Benefits: Audit trail maintained, no PII stored, supports GDPR compliance

Collect feedback without storing customer PII:
# Redact customer feedback before storage
from datetime import datetime

def save_feedback(feedback_text, rating):
    redacted = client.redact(feedback_text)

    feedback_db.insert({
        'text': redacted.text,
        'rating': rating,
        'date': datetime.now()
    })

save_feedback(
    "Great service! Contact me at jane@example.com",
    5
)
# Stores: "Great service! Contact me at "
Benefits: Feedback preserved, PII removed, safe for analysis

Share error reports without exposing user data:
# Redact error reports before sending to bug tracker
def report_error(error_message, user_context):
    redacted_message = client.redact(error_message)
    redacted_context = client.redact(user_context)

    bug_tracker.create_issue({
        'title': redacted_message.text,
        'description': redacted_context.text
    })

report_error(
    "Database error for user jane@example.com",
    "User IP: 192.168.1.100, Session: abc123"
)
# Bug report contains no real PII
Benefits: Developers get context, user privacy protected

Create shareable datasets from sensitive data:
# Prepare dataset for public release
def create_public_dataset(private_records):
    public_records = []

    for record in private_records:
        redacted = client.redact(record)
        public_records.append(redacted.text)

    return public_records

# Original: ["John Doe, john.doe@example.com, +1-555-1234", ...]
# Public: [", , ", ...]
Benefits: Data useful for research, no privacy violations

Best Practices

1. Redact Early

Redact sensitive data as early as possible in your pipeline:
# Good - redact immediately
user_input = request.get_json()['message']
safe_message = client.redact(user_input).text
process_message(safe_message)

# Bad - redact late (PII may leak in logs, errors, etc.)
user_input = request.get_json()['message']
result = process_message(user_input)  # PII exposed during processing
redacted = client.redact(result)

2. Log What Was Redacted

Keep audit trails of redaction events:
from datetime import datetime

result = client.redact(text)

# Log redaction metadata
audit_log.info({
    'action': 'redaction',
    'entities_redacted': result.entities_count,
    'entity_types': [e.entity_type for e in result.detected_entities],
    'timestamp': datetime.now()
})

3. Review Redaction Policies

Regularly review what gets redacted:
# Monitor redaction statistics
def analyze_redactions(timeframe):
    stats = {
        'total_redactions': 0,
        'entity_types': {}
    }

    for event in get_redaction_events(timeframe):
        stats['total_redactions'] += event.entities_count
        for entity in event.detected_entities:
            stats['entity_types'][entity.entity_type] = \
                stats['entity_types'].get(entity.entity_type, 0) + 1

    return stats
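The same statistics can be gathered with `collections.Counter`. In this sketch, `RedactionEvent` and `Entity` are hypothetical stand-ins that mimic the `entities_count` and `detected_entities` fields of a redaction result. The `get_redaction_events` helper from the example above is not modeled; events are passed in directly.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Entity:
    entity_type: str


@dataclass
class RedactionEvent:
    entities_count: int
    detected_entities: List[Entity] = field(default_factory=list)


def analyze_redactions(events: Iterable[RedactionEvent]) -> dict:
    """Total entities redacted plus a per-type breakdown."""
    events = list(events)  # allow generators: the events are iterated twice below
    type_counts = Counter(
        entity.entity_type for event in events for entity in event.detected_entities
    )
    return {
        "total_redactions": sum(event.entities_count for event in events),
        "entity_types": dict(type_counts),
    }


events = [
    RedactionEvent(2, [Entity("PERSON"), Entity("EMAIL_ADDRESS")]),
    RedactionEvent(1, [Entity("EMAIL_ADDRESS")]),
]
print(analyze_redactions(events))
# {'total_redactions': 3, 'entity_types': {'PERSON': 1, 'EMAIL_ADDRESS': 2}}
```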

4. Combine with Other Methods

Use redaction alongside other privacy methods:
# Redact for long-term storage, tokenize for processing
def process_and_store(data):
    # Tokenize for processing
    protected = client.tokenize(data)
    result = process_with_ai(protected.text)

    # Redact for storage
    redacted = client.redact(result)
    database.save(redacted.text)

Security Considerations

Important redaction considerations:
  • Permanent: Redacted data cannot be recovered
  • Complete removal: Text is completely deleted, leaving gaps
  • Readability: Removed text leaves gaps that can make sentences harder to read
  • Not reversible: Unlike encryption, redaction cannot be undone
  • Review before production: Test redaction on sample data first

Learn More

Compare with Other Methods