PII Masking Patterns
Protect sensitive data in LLM observability pipelines with automated PII detection and redaction.
Overview
- Masking PII before logging prompts and responses
- Integrating with Langfuse tracing via mask callbacks
- Using Microsoft Presidio for enterprise-grade detection
- Implementing LLM Guard for input/output sanitization
- Pre-logging redaction with structlog/loguru
Quick Reference
Langfuse Mask Callback (Recommended)
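```python
import re

from langfuse import Langfuse


def mask_pii(data, **kwargs):
    """Mask PII before sending to Langfuse."""
    if isinstance(data, str):
        # Credit cards: 13-19 digits, optionally separated by spaces or dashes
        data = re.sub(r'\b(?:\d[ -]*?){13,19}\b', '[REDACTED_CC]', data)
        # Emails
        data = re.sub(r'\b[\w.-]+@[\w.-]+\.\w+\b', '[REDACTED_EMAIL]', data)
        # Phone numbers
        data = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[REDACTED_PHONE]', data)
        # SSN
        data = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[REDACTED_SSN]', data)
    elif isinstance(data, dict):
        # Trace inputs and outputs are often nested; mask every value
        return {key: mask_pii(value) for key, value in data.items()}
    elif isinstance(data, list):
        return [mask_pii(item) for item in data]
    return data


# Initialize with masking
langfuse = Langfuse(mask=mask_pii)
```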
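The credit card pattern above matches any run of 13 to 19 digits, so it can also redact order numbers or millisecond timestamps. One possible refinement, sketched here, is to redact only candidates that pass the Luhn checksum. The helper names `luhn_valid` and `mask_credit_cards` are illustrative assumptions and are not part of Langfuse.

```python
import re

# Same candidate pattern as the mask callback: 13-19 digits with optional spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_credit_cards(text: str) -> str:
    """Redact only digit runs that pass the Luhn check (hypothetical helper)."""
    def _replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "[REDACTED_CC]" if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(_replace, text)
```

The trade-off is a small amount of extra work per candidate in exchange for fewer false positives. A digit run that happens to pass the checksum is still redacted.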
Microsoft Presidio Pipeline
```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()


def anonymize_text(text: str, language: str = "en") -> str:
    """Detect and anonymize PII using Presidio."""
    results = analyzer.analyze(text=text, language=language)
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
    return anonymized.text
```
LLM Guard Sanitization
```python
from llm_guard.input_scanners import Anonymize
from llm_guard.output_scanners import Sensitive
from llm_guard.vault import Vault

vault = Vault()  # Stores original values for deanonymization

# `prompt` is the raw user prompt; `response` is the model output for it.

# Input sanitization
input_scanner = Anonymize(vault, preamble="", language="en")
sanitized_prompt, is_valid, risk_score = input_scanner.scan(prompt)

# Output sanitization (entity names follow Presidio, e.g. EMAIL_ADDRESS)
output_scanner = Sensitive(entity_types=["PERSON", "EMAIL_ADDRESS"], redact=True)
sanitized_output, is_valid, risk_score = output_scanner.scan(prompt, response)
```
Key Decisions
| Decision | Recommendation |
|---|---|
| Detection engine | Presidio (enterprise), regex (simple), LLM Guard (LLM pipelines) |
| Masking strategy | Replace with type tokens such as [REDACTED_EMAIL] for debuggability |
| Performance | Use async/batch processing for high-throughput workloads |
| Langfuse integration | Use mask= callback at client initialization |
| Reversibility | Use LLM Guard Vault for deanonymization when needed |
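As one way to read the "async/batch processing" recommendation, the sketch below masks texts in batches on worker threads and preserves input order. It assumes Python 3.9+ (for `asyncio.to_thread`). The names `mask_batch`, `chunked`, and `mask_all` are hypothetical, and the single email regex stands in for a heavier detector. For pure-regex masking, threads mainly keep the event loop responsive rather than add throughput.

```python
import asyncio
import re
from typing import Iterable, List

EMAIL = re.compile(r"\b[\w.-]+@[\w.-]+\.\w+\b")


def mask_batch(texts: List[str]) -> List[str]:
    """Mask one batch synchronously (stand-in for a heavier detector)."""
    return [EMAIL.sub("[REDACTED_EMAIL]", t) for t in texts]


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def mask_all(texts: List[str], batch_size: int = 100) -> List[str]:
    """Mask texts in batches on worker threads, preserving input order."""
    tasks = [asyncio.to_thread(mask_batch, batch) for batch in chunked(texts, batch_size)]
    results = await asyncio.gather(*tasks)  # gather returns results in task order
    return [masked for batch in results for masked in batch]
```

A caller inside an async handler would `await mask_all(texts)`, while a script can use `asyncio.run(mask_all(texts))`.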
Anti-Patterns
```python
# ❌ NEVER log raw PII
logger.info(f"User email: {user.email}")  # PII leakage!

# ❌ NEVER send unmasked data to observability
langfuse.trace(input=raw_prompt)  # May contain PII!

# ✅ ALWAYS mask before logging
logger.info(f"User email: {mask_email(user.email)}")

# ✅ ALWAYS use mask callback
langfuse = Langfuse(mask=mask_pii)
```
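The snippet above calls `mask_email` without defining it. A minimal sketch of such a helper follows, assuming partial masking is acceptable for your logs: it keeps the first character and the domain for debuggability. This is an illustration, not the skill's reference implementation. Where the domain itself is identifying, returning the full `[REDACTED_EMAIL]` token is the safer choice.

```python
def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, sep, domain = email.partition("@")
    if not sep or not local or not domain:
        # Not a recognizable address: redact entirely rather than risk leaking it.
        return "[REDACTED_EMAIL]"
    return f"{local[0]}***@{domain}"
```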
Detailed Documentation
| Resource | Description |
|---|---|
| references/presidio-integration.md | Microsoft Presidio setup, custom recognizers, batch processing |
| references/langfuse-mask-callback.md | Langfuse SDK mask implementation patterns |
| references/llm-guard-sanitization.md | LLM Guard Anonymize/Deanonymize with Vault |
| references/logging-redaction.md | structlog/loguru pre-logging patterns |
| checklists/pii-masking-setup-checklist.md | Implementation checklist |
Related Skills
- langfuse-observability - Tracing with PII masking integration
- defense-in-depth - Security layer including data protection
- advanced-guardrails - LLM safety guardrails
- input-validation - Input sanitization patterns
Capability Details
langfuse-masking
Keywords: langfuse mask, trace masking, observability pii, mask callback
Solves:
- Mask PII in Langfuse traces
- Protect sensitive data in LLM observability
- GDPR compliance for LLM logging
presidio-detection
Keywords: presidio, pii detection, microsoft presidio, named entity, ner
Solves:
- Detect PII using NLP models
- Custom entity recognizers
- Enterprise-grade PII detection
llm-guard-anonymization
Keywords: llm guard, anonymize, deanonymize, vault, sanitize
Solves:
- Sanitize LLM inputs and outputs
- Reversible anonymization with Vault
- Input/output scanner pipeline
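To illustrate the idea behind reversible anonymization, here is a small stdlib-only sketch. `PlaceholderVault` is a hypothetical class written for this example. It is not LLM Guard's `Vault` and does not mirror its API. It only shows the concept of swapping values for placeholders and restoring them later.

```python
import re
from typing import Dict

EMAIL = re.compile(r"\b[\w.-]+@[\w.-]+\.\w+\b")


class PlaceholderVault:
    """Hypothetical in-memory store mapping placeholders to original values."""

    def __init__(self) -> None:
        self._originals: Dict[str, str] = {}     # placeholder -> original value
        self._placeholders: Dict[str, str] = {}  # original value -> placeholder

    def anonymize(self, text: str) -> str:
        """Replace each email with a numbered placeholder and remember it."""
        def _replace(match: re.Match) -> str:
            value = match.group()
            # Reuse the placeholder when the same value appears again.
            if value not in self._placeholders:
                token = f"[EMAIL_{len(self._placeholders) + 1}]"
                self._placeholders[value] = token
                self._originals[token] = value
            return self._placeholders[value]

        return EMAIL.sub(_replace, text)

    def deanonymize(self, text: str) -> str:
        """Restore original values for any known placeholders in the text."""
        for token, value in self._originals.items():
            text = text.replace(token, value)
        return text
```

In a pipeline, the prompt would be anonymized before it reaches the model and the response deanonymized before it reaches the user. Because the vault holds raw PII, it needs the same protection as the original data.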
regex-masking
Keywords: regex, pattern matching, email mask, phone mask, ssn mask
Solves:
- Simple pattern-based PII masking
- Lightweight masking without ML
- Custom pattern detection
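Custom pattern detection can be sketched as an ordered registry of token and regex pairs. The registry, the `register_pattern` and `mask_text` helpers, and the `EMP-` identifier format are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Ordered (token, pattern) pairs; patterns are applied in list order.
PATTERNS: List[Tuple[str, re.Pattern]] = [
    ("[REDACTED_EMAIL]", re.compile(r"\b[\w.-]+@[\w.-]+\.\w+\b")),
    ("[REDACTED_SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[REDACTED_PHONE]", re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b")),
]


def register_pattern(token: str, regex: str) -> None:
    """Append a custom pattern to the registry."""
    PATTERNS.append((token, re.compile(regex)))


def mask_text(text: str) -> str:
    """Apply every registered pattern in order."""
    for token, pattern in PATTERNS:
        text = pattern.sub(token, text)
    return text


# Hypothetical internal identifier format, e.g. "EMP-004217".
register_pattern("[REDACTED_EMPLOYEE_ID]", r"\bEMP-\d{6}\b")
```

Keeping patterns in one ordered list makes it easy to put more specific patterns ahead of broader ones and to unit-test each addition.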
logging-redaction
Keywords: structlog, loguru, logging, redact, pre-logging
Solves:
- Redact PII before logging
- Structured logging with masking
- Log processor patterns
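A log processor pattern can be sketched as a plain function with the structlog processor signature `(logger, method_name, event_dict)`. The function name `redact_pii`, the `_mask` helper, and the two regexes are assumptions for this example. The block avoids importing structlog so it runs on its own, and the wiring is shown only as a comment.

```python
import re
from typing import Any, MutableMapping

EMAIL = re.compile(r"\b[\w.-]+@[\w.-]+\.\w+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def _mask(value: Any) -> Any:
    """Mask strings; recurse into dicts and lists; leave other types alone."""
    if isinstance(value, str):
        value = EMAIL.sub("[REDACTED_EMAIL]", value)
        return SSN.sub("[REDACTED_SSN]", value)
    if isinstance(value, dict):
        return {k: _mask(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_mask(v) for v in value]
    return value


def redact_pii(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """structlog-style processor: mask every value in the event dict."""
    for key in list(event_dict):
        event_dict[key] = _mask(event_dict[key])
    return event_dict


# Assumed wiring (requires structlog): place the processor before the renderer, e.g.
# structlog.configure(processors=[redact_pii, structlog.processors.JSONRenderer()])
```

Running redaction as a processor means PII is removed before any renderer or handler sees the event, regardless of which call site produced it.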
Repository: yonatangross/orchestkit
GitHub Stars: 94
First Seen: Feb 2, 2026