skills/yonatangross/orchestkit/pii-masking-patterns

PII Masking Patterns

Protect sensitive data in LLM observability pipelines with automated PII detection and redaction.

Overview

  • Masking PII before logging prompts and responses
  • Integrating with Langfuse tracing via mask callbacks
  • Using Microsoft Presidio for enterprise-grade detection
  • Implementing LLM Guard for input/output sanitization
  • Pre-logging redaction with structlog/loguru

Quick Reference

Langfuse Mask Callback (Recommended)

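import re
from typing import Any

from langfuse import Langfuse

# (compiled pattern, replacement token), applied in order
PII_PATTERNS = [
    (re.compile(r'\b(?:\d[ -]*?){13,19}\b'), '[REDACTED_CC]'),        # Credit cards
    (re.compile(r'\b[\w.-]+@[\w.-]+\.\w+\b'), '[REDACTED_EMAIL]'),    # Emails
    (re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'), '[REDACTED_PHONE]'),  # Phone numbers
    (re.compile(r'\b\d{3}-\d{2}-\d{4}\b'), '[REDACTED_SSN]'),          # SSN
]

def mask_pii(data: Any, **kwargs) -> Any:
    """Mask PII before sending to Langfuse.

    Trace inputs/outputs are often dicts or chat message lists, so recurse
    into containers instead of masking only plain strings.
    """
    if isinstance(data, str):
        for pattern, token in PII_PATTERNS:
            data = pattern.sub(token, data)
        return data
    if isinstance(data, dict):
        return {k: mask_pii(v) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return [mask_pii(v) for v in data]
    return data  # numbers, None, etc. pass through unchanged

# Initialize with masking
langfuse = Langfuse(mask=mask_pii)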
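The credit-card pattern above matches any run of 13-19 digits, so order numbers or timestamps get redacted too. A minimal sketch, assuming you want fewer false positives, keeps the same candidate regex but only redacts matches that pass the Luhn checksum. `luhn_valid` and `mask_credit_cards` are hypothetical helper names, not part of Langfuse.

```python
import re

# Same candidate pattern as the mask callback: 13-19 digits, optionally separated by spaces or dashes
CC_CANDIDATE = re.compile(r'\b(?:\d[ -]*?){13,19}\b')


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second one
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_credit_cards(text: str) -> str:
    """Redact only card-like numbers whose digits pass the Luhn check."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r'[ -]', '', match.group())
        return '[REDACTED_CC]' if luhn_valid(digits) else match.group()
    return CC_CANDIDATE.sub(replace, text)


print(mask_credit_cards("Card 4111 1111 1111 1111, order 1234567890123"))
# → Card [REDACTED_CC], order 1234567890123
```

Roughly one in ten random digit strings still passes Luhn, so treat it as a false-positive filter, not proof that a number is a real card.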

Microsoft Presidio Pipeline

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

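# The default NLP engine uses spaCy; install the model first:
#   python -m spacy download en_core_web_lg
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()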

def anonymize_text(text: str, language: str = "en") -> str:
    """Detect and anonymize PII using Presidio."""
    results = analyzer.analyze(text=text, language=language)
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
    return anonymized.text
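Conceptually, the analyzer returns character spans (start, end, entity type, confidence score) and the anonymizer rewrites the text at those offsets; with default settings Presidio replaces each entity with a token such as `<PERSON>`. The stdlib sketch below illustrates that second step under simplifying assumptions: overlapping spans are resolved by keeping the higher score, and replacements run right to left so earlier offsets stay valid. `Span` and `apply_spans` are hypothetical names, not Presidio classes; in production, use `AnonymizerEngine` as shown above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A detected entity: character offsets into the text plus type and confidence."""
    start: int
    end: int
    entity_type: str
    score: float


def apply_spans(text: str, spans: list[Span]) -> str:
    """Replace each kept span with an <ENTITY_TYPE> token."""
    # Greedily keep the highest-scoring span among overlapping candidates
    kept: list[Span] = []
    for span in sorted(spans, key=lambda s: s.score, reverse=True):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    # Replace right to left so edits do not shift offsets still to be applied
    for span in sorted(kept, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"<{span.entity_type}>" + text[span.end:]
    return text


text = "Contact Jane Doe at jane@example.com"
spans = [
    Span(8, 16, "PERSON", 0.85),
    Span(20, 36, "EMAIL_ADDRESS", 1.0),
    Span(20, 24, "PERSON", 0.4),  # weaker guess overlapping the email, dropped
]
print(apply_spans(text, spans))
# → Contact <PERSON> at <EMAIL_ADDRESS>
```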

LLM Guard Sanitization

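from llm_guard.input_scanners import Anonymize
from llm_guard.output_scanners import Sensitive
from llm_guard.vault import Vault

vault = Vault()  # Stores original values for deanonymization

# `prompt` (raw user input) and `response` (raw LLM output) come from your application
# Input sanitization: detected PII is replaced with placeholders, originals go to the vault
input_scanner = Anonymize(vault, preamble="", language="en")
sanitized_prompt, prompt_is_valid, prompt_risk = input_scanner.scan(prompt)

# Output sanitization: entity names follow Presidio conventions (EMAIL_ADDRESS, not EMAIL)
output_scanner = Sensitive(entity_types=["PERSON", "EMAIL_ADDRESS"], redact=True)
sanitized_output, output_is_valid, output_risk = output_scanner.scan(sanitized_prompt, response)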
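The Vault is what makes LLM Guard's anonymization reversible: placeholders go to the model, and the stored originals can be restored in its answer (LLM Guard provides a `Deanonymize` output scanner for this). The stdlib sketch below only illustrates the round-trip idea for emails; `PlaceholderVault`, its methods, and the `[REDACTED_EMAIL_n]` placeholder format are hypothetical and do not mirror LLM Guard's internals.

```python
import re

EMAIL = re.compile(r'\b[\w.-]+@[\w.-]+\.\w+\b')


class PlaceholderVault:
    """Hypothetical vault mapping numbered placeholders back to original values."""

    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def anonymize(self, text: str) -> str:
        """Swap each email for a numbered placeholder and remember the original."""
        def replace(match: re.Match) -> str:
            value = match.group()
            # Reuse the existing placeholder if this exact value was seen before
            for placeholder, original in self._originals.items():
                if original == value:
                    return placeholder
            placeholder = f"[REDACTED_EMAIL_{len(self._originals) + 1}]"
            self._originals[placeholder] = value
            return placeholder
        return EMAIL.sub(replace, text)

    def deanonymize(self, text: str) -> str:
        """Restore original values for any placeholders present in the text."""
        for placeholder, original in self._originals.items():
            text = text.replace(placeholder, original)
        return text


demo_vault = PlaceholderVault()
safe_prompt = demo_vault.anonymize("Email bob@example.com and cc bob@example.com")
print(safe_prompt)  # → Email [REDACTED_EMAIL_1] and cc [REDACTED_EMAIL_1]
print(demo_vault.deanonymize("Drafted a message to [REDACTED_EMAIL_1]"))
# → Drafted a message to bob@example.com
```

Scope a vault to one request or conversation: it holds the raw PII, so it must never be logged or traced itself.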

Key Decisions

| Decision | Recommendation |
| --- | --- |
| Detection engine | Presidio (enterprise), regex (simple), LLM Guard (LLM pipelines) |
| Masking strategy | Replace with typed tokens such as [REDACTED_EMAIL] so traces stay debuggable |
| Performance | Use async/batch processing in high-throughput pipelines |
| Langfuse integration | Pass a mask= callback at client initialization |
| Reversibility | Use the LLM Guard Vault when values must be deanonymized |

Anti-Patterns

# ❌ NEVER log raw PII
logger.info(f"User email: {user.email}")  # PII leakage!

# ❌ NEVER send unmasked data to observability
langfuse = Langfuse()  # client created without a mask= callback
langfuse.trace(input=raw_prompt)  # raw_prompt may contain PII!

# ✅ ALWAYS mask before logging (mask_pii as defined in the Langfuse example)
logger.info(f"User email: {mask_pii(user.email)}")

# ✅ ALWAYS use mask callback
langfuse = Langfuse(mask=mask_pii)
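Masking at every call site is easy to forget, so redaction can also be enforced centrally. The Overview mentions structlog/loguru; as a dependency-free swap-in, the sketch below uses the standard library's `logging.Filter` to redact each record's fully rendered message before the handler emits it. `RedactingFilter` and the single email pattern are illustrative assumptions; a real setup would reuse the fuller pattern set or Presidio.

```python
import io
import logging
import re

EMAIL = re.compile(r'\b[\w.-]+@[\w.-]+\.\w+\b')


class RedactingFilter(logging.Filter):
    """Hypothetical filter that masks emails in the rendered log message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render msg % args first so PII passed as a format argument is caught too
        record.msg = EMAIL.sub('[REDACTED_EMAIL]', record.getMessage())
        record.args = None  # args are already merged into msg
        return True  # never drop the record, only rewrite it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())  # handler-level, so propagated records are filtered too
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("User email: %s", "alice@example.com")
print(stream.getvalue().strip())  # → User email: [REDACTED_EMAIL]
```

The filter sits on the handler rather than the logger because logger-level filters are only consulted for records logged directly to that logger, not for records propagated from child loggers.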

Detailed Documentation

| Resource | Description |
| --- | --- |
| references/presidio-integration.md | Microsoft Presidio setup, custom recognizers, batch processing |
| references/langfuse-mask-callback.md | Langfuse SDK mask implementation patterns |
| references/llm-guard-sanitization.md | LLM Guard Anonymize/Deanonymize with Vault |
| references/logging-redaction.md | structlog/loguru pre-logging patterns |
| checklists/pii-masking-setup-checklist.md | Implementation checklist |

Related Skills

  • langfuse-observability - Tracing with PII masking integration
  • defense-in-depth - Security layer including data protection
  • advanced-guardrails - LLM safety guardrails
  • input-validation - Input sanitization patterns

Capability Details

langfuse-masking

Keywords: langfuse mask, trace masking, observability pii, mask callback
Solves:

  • Mask PII in Langfuse traces
  • Protect sensitive data in LLM observability
  • GDPR compliance for LLM logging

presidio-detection

Keywords: presidio, pii detection, microsoft presidio, named entity, ner
Solves:

  • Detect PII using NLP models
  • Custom entity recognizers
  • Enterprise-grade PII detection

llm-guard-anonymization

Keywords: llm guard, anonymize, deanonymize, vault, sanitize
Solves:

  • Sanitize LLM inputs and outputs
  • Reversible anonymization with Vault
  • Input/output scanner pipeline

regex-masking

Keywords: regex, pattern matching, email mask, phone mask, ssn mask
Solves:

  • Simple pattern-based PII masking
  • Lightweight masking without ML
  • Custom pattern detection

logging-redaction

Keywords: structlog, loguru, logging, redact, pre-logging
Solves:

  • Redact PII before logging
  • Structured logging with masking
  • Log processor patterns