Hashing is a privacy protection method that replaces sensitive data with deterministic hash values. The same input always produces the same hash, making it perfect for analytics and user tracking without storing actual PII.Example:
Create consistent user identifiers without sharing PII between systems.
Copy
# System A: Hash user emailuser_id = client.hash("[email protected]", hash_prefix="uid_").text# System B: Same hash for same user# Both systems can track the same user without sharing the email
Match records across databases without exposing the matching key.
Copy
# Database Acustomer_id = client.hash("[email protected]", hash_prefix="cust_").text# Database B (can match using hash, not email)if hash_exists_in_db(customer_id): # Match found, no PII shared link_records(customer_id)
# Bad - can't restorehashed = client.hash("[email protected]")# No way to get "[email protected]" back# Good - use tokenizationprotected = client.tokenize("[email protected]")original = client.detokenize(protected.text, protected.mapping)
If users need to identify their own information, use Masking.
Copy
# Bad - user can't recognize thishashed = client.hash("Card: 4532-7562-9102-3456")# Output: "Card: ID_x7f9a3c4b2e8d5f1"# Good - show last 4 digitsmasked = client.mask("Card: 4532-7562-9102-3456")# Output: "Card: ***************3456"
# Risky - simple values can be brute-forcedclient.hash("1") # Easy to reverseclient.hash("yes") # Easy to reverse# Better - add salt or use different methodclient.tokenize("1") # Random tokens
result = client.hash( "User [email protected] from IP 192.168.1.1", entities=["EMAIL_ADDRESS"], # Only hash emails hash_prefix="user_")# Output: "User user_a3f8b9c2... from IP 192.168.1.1"
def create_universal_id(email: str) -> str: """Create universal user ID that works across platforms""" result = client.hash( email, hash_type="sha256", hash_prefix="uid_", hash_length=20 ) return result.text# Platform Auid_a = create_universal_id("[email protected]")platform_a_db.save(uid_a, user_data)# Platform Buid_b = create_universal_id("[email protected]")# uid_a == uid_b, can match records without sharing email
Benefits: User-level analytics without PII, GDPR compliant
A/B Testing
Assign users to test groups consistently:
Copy
def get_ab_test_variant(user_email): """Consistently assign user to A/B test variant""" hashed = client.hash(user_email, hash_length=8) # Use hash to determine variant hash_int = int(hashed.text[:8], 16) variant = 'A' if hash_int % 2 == 0 else 'B' return variant# Same user always gets same variantvariant1 = get_ab_test_variant("[email protected]") # 'A'variant2 = get_ab_test_variant("[email protected]") # 'A' (same)
Benefits: Consistent variants, no PII stored, reproducible
Data Warehouse Integration
Share data between teams without exposing PII:
Copy
# Marketing hashes customer emailsdef prepare_for_warehouse(customer_data): for customer in customer_data: customer['id'] = client.hash( customer['email'], hash_prefix="c_" ).text del customer['email'] # Remove PII return customer_data# Analytics team can match using hash# No access to actual emails
Benefits: Data sharing without PII exposure, compliance maintained
Duplicate Detection
Find duplicates without comparing raw data:
Copy
def check_duplicate(email): """Check if user already exists using hash""" hashed = client.hash(email, hash_prefix="user_") if database.exists(hashed.text): return True, "User already registered" else: database.insert(hashed.text) return False, "New user"# Check without storing actual emailis_duplicate, message = check_duplicate("[email protected]")
# Good - strong algorithmclient.hash(text, hash_type="sha256")# Acceptable for non-sensitive dataclient.hash(text, hash_type="md5")# Not recommended for sensitive data# (MD5 has known vulnerabilities)