Hashing is a privacy protection method that replaces sensitive data with deterministic hash values. The same input always produces the same hash, which makes it well suited to analytics and user tracking without storing the actual PII.

Example:
Create consistent user identifiers without sharing PII between systems.

    # System A: Hash user email
    user_id = client.hash("john@example.com", hash_prefix="uid_").text

    # System B: Same hash for same user
    # Both systems can track the same user without sharing the email

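To make the determinism concrete, here is a minimal sketch that uses Python's standard `hashlib` instead of the SDK. The helper `pseudonymize` and its `prefix` and `length` parameters are hypothetical stand-ins for illustration; they are not a description of how `client.hash` is implemented.

```python
import hashlib


def pseudonymize(value: str, prefix: str = "uid_", length: int = 16) -> str:
    """Return a deterministic identifier derived from `value` (illustrative helper)."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return prefix + digest[:length]


# Two systems hashing the same email independently arrive at the same identifier.
system_a_id = pseudonymize("john@example.com")
system_b_id = pseudonymize("john@example.com")
print(system_a_id == system_b_id)  # True
```

Because the identifier depends only on the input, neither system has to send the email to the other to agree on it.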
Match records across databases without exposing the matching key.

    # Database A
    customer_id = client.hash("john@example.com", hash_prefix="cust_").text

    # Database B (can match using hash, not email)
    if hash_exists_in_db(customer_id):
        # Match found, no PII shared
        link_records(customer_id)

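The matching step can be sketched end to end with two in-memory dictionaries standing in for the databases. Everything here is an assumption made for illustration: `hash_key` is a hypothetical helper built on `hashlib`, and the record fields are invented sample data.

```python
import hashlib


def hash_key(email: str) -> str:
    """Hypothetical helper: derive a matching key from an email."""
    return "cust_" + hashlib.sha256(email.encode("utf-8")).hexdigest()[:16]


# Each side stores only the hashed key, never the email itself.
database_a = {hash_key("john@example.com"): {"plan": "pro"}}
database_b = {
    hash_key("john@example.com"): {"orders": 3},
    hash_key("jane@example.com"): {"orders": 1},
}

# Join the two sides on the hashed key.
matched = {
    key: {**database_a[key], **database_b[key]}
    for key in database_a.keys() & database_b.keys()
}
print(len(matched))  # 1
```

Only the record present on both sides is linked, and the join never touches a raw email address.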
If you need to recover the original value later, use Tokenization.

    # Bad - can't restore
    hashed = client.hash("john@example.com")
    # No way to get "john@example.com" back

    # Good - use tokenization
    protected = client.tokenize("john@example.com")
    original = client.detokenize(protected.text, protected.mapping)

If users need to identify their own information, use Masking.

    # Bad - user can't recognize this
    hashed = client.hash("Card: 4532-7562-9102-3456")
    # Output: "Card: ID_x7f9a3c4b2e8d5f1"

    # Good - show last 4 digits
    masked = client.mask("Card: 4532-7562-9102-3456")
    # Output: "Card: ***************3456"

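Short or low-entropy values can be brute-forced from their hash. Add a salt or use a different method.

    # Risky - simple values can be brute-forced
    client.hash("1")    # Easy to reverse
    client.hash("yes")  # Easy to reverse

    # Better - add salt or use different method
    client.tokenize("1")  # Random tokens
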
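The following sketch shows why simple values are risky and how a secret key changes the picture. It uses the standard `hashlib` and `hmac` modules, not the SDK. The helpers `plain_hash` and `keyed_hash` and the key value are hypothetical, and HMAC-SHA256 is used here as one possible form of the "add salt" advice, not as the library's mechanism.

```python
import hashlib
import hmac


def plain_hash(value: str) -> str:
    """Unsalted SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA256 hex digest; useless to an attacker without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# An attacker who sees an unsalted hash can precompute hashes of likely values.
candidates = ["yes", "no", "0", "1"]
lookup = {plain_hash(c): c for c in candidates}

leaked = plain_hash("yes")
recovered = lookup.get(leaked)
print(recovered)  # yes

# With a secret key, the same precomputed table no longer matches.
secret_key = b"example-key-keep-out-of-source-control"  # placeholder value
protected = keyed_hash("yes", secret_key)
print(lookup.get(protected))  # None
```

The keyed variant stays deterministic for anyone who holds the key, so it can still be used for matching inside a trusted boundary.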
To hash only specific entity types and leave the rest of the text unchanged, pass `entities`:

    result = client.hash(
        "User john@example.com from IP 192.168.1.1",
        entities=["EMAIL_ADDRESS"],  # Only hash emails
        hash_prefix="user_"
    )
    # Output: "User user_a3f8b9c2... from IP 192.168.1.1"

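As a rough illustration of selective hashing, the sketch below replaces only email addresses in a string and leaves everything else alone. It is a stand-in built on `re` and `hashlib`: the `hash_emails` helper, its parameters, and the deliberately simple email pattern are assumptions, and the SDK's entity detection is likely more sophisticated.

```python
import hashlib
import re

# Deliberately simple pattern for illustration; real email detection is more involved.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def hash_emails(text: str, prefix: str = "user_", length: int = 12) -> str:
    """Replace each email in `text` with a prefixed, truncated SHA-256 digest."""

    def replace(match: re.Match) -> str:
        digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()
        return prefix + digest[:length]

    return EMAIL_PATTERN.sub(replace, text)


result = hash_emails("User john@example.com from IP 192.168.1.1")
print(result)  # the email is replaced; the IP address is left untouched
```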
Create a universal user ID that works across platforms:

    def create_universal_id(email: str) -> str:
        """Create universal user ID that works across platforms"""
        result = client.hash(
            email,
            hash_type="sha256",
            hash_prefix="uid_",
            hash_length=20
        )
        return result.text

    # Platform A
    uid_a = create_universal_id("john@example.com")
    platform_a_db.save(uid_a, user_data)

    # Platform B
    uid_b = create_universal_id("john@example.com")
    # uid_a == uid_b, can match records without sharing email

Benefits: User-level analytics without storing raw PII, supports GDPR pseudonymization
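When choosing a value for `hash_length`, it can help to estimate how likely two different users are to end up with the same truncated hash. The sketch below applies the standard birthday-bound approximation. It assumes the length is counted in hexadecimal characters (4 bits each), which the source does not state, and `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(num_items: int, hex_chars: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * 16**hex_chars))."""
    space = 16 ** hex_chars
    return -math.expm1(-num_items * (num_items - 1) / (2 * space))


# Compare a short and a longer truncation for one million users.
for hex_chars in (8, 20):
    p = collision_probability(1_000_000, hex_chars)
    print(f"{hex_chars} hex chars: {p:.3g}")
```

Under that assumption, 8 characters make a collision among a million users practically certain, while 20 characters keep the probability negligible.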
A/B Testing
Assign users to test groups consistently:

    def get_ab_test_variant(user_email):
        """Consistently assign user to A/B test variant"""
        hashed = client.hash(user_email, hash_length=8)
        # Use hash to determine variant
        # (assumes hashed.text is a bare hex string with no prefix)
        hash_int = int(hashed.text[:8], 16)
        variant = 'A' if hash_int % 2 == 0 else 'B'
        return variant

    # Same user always gets same variant
    variant1 = get_ab_test_variant("john@example.com")  # e.g. 'A'
    variant2 = get_ab_test_variant("john@example.com")  # same as variant1

Benefits: Consistent variants, no PII stored, reproducible
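The same bucketing idea can be tried without the SDK. In this sketch, `assign_variant` is a hypothetical helper built on `hashlib`; mixing an experiment name into the hashed string is an added design choice, not something the source describes, so that a user's bucket in one experiment does not determine their bucket in another.

```python
import hashlib


def assign_variant(experiment: str, user_key: str, variants=("A", "B")) -> str:
    """Map a user to a variant using the first 8 hex digits of a SHA-256 digest."""
    digest = hashlib.sha256(f"{experiment}:{user_key}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % len(variants)
    return variants[bucket]


first = assign_variant("checkout-redesign", "john@example.com")
second = assign_variant("checkout-redesign", "john@example.com")
print(first == second)  # True
```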
Data Warehouse Integration
Share data between teams without exposing PII:

    # Marketing hashes customer emails
    def prepare_for_warehouse(customer_data):
        for customer in customer_data:
            customer['id'] = client.hash(
                customer['email'],
                hash_prefix="c_"
            ).text
            del customer['email']  # Remove PII
        return customer_data

    # Analytics team can match using hash
    # No access to actual emails

Benefits: Data sharing without PII exposure, compliance maintained
Duplicate Detection
Find duplicates without comparing raw data:

    def check_duplicate(email):
        """Check if user already exists using hash"""
        hashed = client.hash(email, hash_prefix="user_")
        if database.exists(hashed.text):
            return True, "User already registered"
        else:
            database.insert(hashed.text)
            return False, "New user"

    # Check without storing actual email
    is_duplicate, message = check_duplicate("john@example.com")

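One practical detail for duplicate detection is that a hash matches only byte-identical input, so "John@Example.com" and "john@example.com" produce different hashes unless the value is normalized first. The sketch below shows one way to handle that with `hashlib`. The helpers `normalized_hash` and `register` are hypothetical, a `set` stands in for the database, and lowercasing the whole address is an assumption that suits most mail providers but not all.

```python
import hashlib


def normalized_hash(email: str) -> str:
    """Lowercase and trim before hashing so trivially different spellings collide."""
    canonical = email.strip().lower()
    return "user_" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


seen = set()  # stands in for the database


def register(email: str) -> bool:
    """Return True if the email is new, False if it was already seen."""
    key = normalized_hash(email)
    if key in seen:
        return False
    seen.add(key)
    return True


print(register("john@example.com"))    # True
print(register(" John@Example.com "))  # False
```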
Prefer a strong hash algorithm for sensitive data:

    # Good - strong algorithm
    client.hash(text, hash_type="sha256")

    # Acceptable for non-sensitive data
    client.hash(text, hash_type="md5")
    # Not recommended for sensitive data
    # (MD5 has known vulnerabilities)