data-masker
Data Masker Protocol
This skill prevents sensitive production data (PII, PHI, financial records) from leaking into lower environments (staging, development, testing). It analyzes schemas and generates idempotent masking scripts.
Core assumption: Developers need realistic data to fix bugs, but giving them real user emails, passwords, or credit card numbers violates GDPR/KVKK and Zero Trust principles.
1. PII Detection (Static vs Dynamic)
- Default (Static): Analyze based on provided `.sql`, schema files, or DDL text.
- Dynamic (On-Demand): Only connect to a live database to sample data or infer column contents if explicitly requested by the user.
- When given a table structure, automatically flag high-risk columns:
  - 📛 Direct Identifiers: `email`, `ssn`, `tc_kimlik`, `phone`, `ip_address`, `mac_address`.
  - 💳 Financial: `credit_card`, `iban`, `balance`, `salary`.
  - 🩺 Health/Personal: `birth_date`, `blood_type`, `address`, `location_lat_lon`.
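In dynamic mode, or on a staging clone where the DDL has already been loaded, the name-based flagging above could be approximated with a catalog query. This is a minimal sketch that assumes PostgreSQL and a `public` schema. The patterns only mirror the column names listed above, so columns with unconventional names will be missed.

```sql
-- Sketch: flag high-risk columns by name (assumes PostgreSQL, schema 'public').
WITH flagged AS (
    SELECT table_name,
           column_name,
           data_type,
           CASE
               -- Checked first so ip_address / mac_address are not caught by 'address' below.
               WHEN column_name::text ~* '(email|ssn|tc_kimlik|phone|ip_address|mac_address)'
                   THEN 'direct_identifier'
               WHEN column_name::text ~* '(credit_card|iban|balance|salary)'
                   THEN 'financial'
               WHEN column_name::text ~* '(birth_date|blood_type|address|location_lat_lon)'
                   THEN 'health_personal'
           END AS risk_category
    FROM information_schema.columns
    WHERE table_schema = 'public'
)
SELECT *
FROM flagged
WHERE risk_category IS NOT NULL
ORDER BY table_name, column_name;
```

Name matching is only a first pass. A column called `notes` can still hold PII, which is where explicitly requested data sampling helps.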
2. Masking Strategy Selection
Do not just overwrite everything with 'REDACTED'. Choose the right mathematical mutation to keep the data realistic for QA testing:
Strategy 1: Deterministic Substitution (Fake Data)
- Best for: Names, Emails.
- Why: To make the UI look normal. `john.doe@example.com` becomes `x8f9.mask@test.local`.
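One way to sketch deterministic substitution in PostgreSQL is to derive the fake value from the primary key plus a secret salt, so reruns produce the same result and the script stays idempotent. The `users` table, its `id` column, and the salt string are assumptions for illustration.

```sql
-- Sketch: the same row always maps to the same fake email.
-- 'staging-salt-change-me' is a hypothetical secret; keep the real one out of version control.
UPDATE users
SET email = substr(md5(id::text || 'staging-salt-change-me'), 1, 12) || '.mask@test.local';
```

Because the value depends on `id` rather than on the current email, running the statement twice leaves the data unchanged. Truncating the digest to 12 hex characters keeps the address short; if `email` carries a unique constraint on a very large table, using the full digest reduces the chance of a collision.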
Strategy 2: Partial Redaction
- Best for: Credit Cards, Phone Numbers.
- Why: `+1 (555) 123-4567` becomes `+1 (555) ***-**67`. Devs can still test formatting validations.
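The phone example above can be reproduced with plain string functions. This sketch assumes every value follows the `+1 (555) 123-4567` layout (a 9-character prefix to keep, then the subscriber number); other formats would need their own offsets.

```sql
-- Sketch: keep the first 9 characters and the last 2, star out the digits in between.
SELECT p AS original,
       left(p, 9)
       || translate(substr(p, 10, greatest(length(p) - 11, 0)),
                    '0123456789', '**********')
       || right(p, 2) AS masked
FROM (VALUES ('+1 (555) 123-4567')) AS sample(p);
-- masked: +1 (555) ***-**67
```

`translate` replaces only digits, so separators such as `-` survive and format validators still see a plausible shape.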
Strategy 3: Variance / Shuffling (Jittering)
- Best for: Dates, Salaries.
- Why: `salary: 105,000` -> add +/- 20% random variance -> `91,200`. Keeps the statistical distribution roughly intact without revealing the exact amount.
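A minimal jittering sketch, assuming a hypothetical `employees` table with a numeric `salary` and a `birth_date` of type `date`. The +/- 30 day window for dates is an illustrative choice, not something the protocol prescribes.

```sql
-- Sketch: random() is evaluated per row, so every row gets its own offset.
UPDATE employees
SET salary     = round(salary * (0.8 + random() * 0.4)),            -- factor in [0.8, 1.2)
    birth_date = birth_date + (floor(random() * 61)::int - 30);     -- shift by -30..+30 days
```

Unlike the other strategies, this one is not idempotent: each rerun compounds the variance, so it is best applied once to a fresh clone.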
Strategy 4: Hashing / Nullification
- Best for: Passwords, API Tokens.
- Why: Replace all passwords with the hash of a known development password (e.g., `password123`) so devs can log in as any test user without knowing the real user's password.
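If the application stores bcrypt hashes, the known development hash could be generated inside the database with the `pgcrypto` extension. This is a sketch under that assumption; the `api_token` column is hypothetical, and `pgcrypto` salts typically carry a `$2a$` prefix, so check that the application's bcrypt library accepts it.

```sql
-- Sketch: requires the pgcrypto extension; column names are assumptions.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Compute the bcrypt hash once, then reuse it for every row.
WITH dev AS (
    SELECT crypt('password123', gen_salt('bf', 10)) AS hash
)
UPDATE users
SET password_hash = dev.hash,
    api_token     = NULL      -- nullify secrets nobody needs in staging
FROM dev;
```

Hashing once in the CTE avoids paying the bcrypt cost per row. Each run yields a different salt, but every resulting hash verifies against the same development password.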
3. Output Generation
Provide an executable SQL script that can be run on a cloned staging database.
Required Outputs (Must write BOTH to `docs/database-report/`):
1. **Human-Readable Markdown (`docs/database-report/data-masking-report.md`)**
### 🛡️ PII Discovery
- **Risk Level: HIGH** (Found emails, phones, and hashed passwords).
### 🛠️ Masking Execution Script (PostgreSQL)
```sql
-- Disable triggers temporarily to speed up the masking.
-- Note: TRIGGER ALL includes internal FK triggers and requires superuser.
ALTER TABLE users DISABLE TRIGGER ALL;

-- Masking `users` table
UPDATE users SET
    -- Strategy: Deterministic Substitution (derived from id, so reruns give the same values)
    email = 'masked_' || id || '@sandbox.local',
    first_name = 'User_' || substring(md5(id::text) from 1 for 6),
    last_name = 'Test',
    -- Strategy: Partial Redaction
    phone_number = concat(left(phone_number, 3), '***', right(phone_number, 2)),
    -- Strategy: Known Dev Value (placeholder; substitute a real hash of the dev password)
    password_hash = '$2b$10$dev_password_hash_xyz';

-- Re-enable triggers
ALTER TABLE users ENABLE TRIGGER ALL;
```
2. **Machine-Readable JSON (`docs/database-report/data-masking-output.json`)**
```json
{
  "skill": "data-masker",
  "pii_found": ["email", "phone_number", "password_hash"],
  "masking_strategies_applied": {
    "email": "Deterministic Substitution",
    "phone_number": "Partial Redaction",
    "password_hash": "Known Dev Value"
  },
  "sql_script_generated": "UPDATE users SET email = ..."
}
```
Guardrails
- Performance: A bulk `UPDATE` on 10 million rows generates a large volume of WAL and leaves a dead tuple behind for every row. If the table is massive, suggest the `CREATE TABLE AS SELECT` (CTAS) strategy instead of `UPDATE`.
- Referential Integrity: If `email` is used as a Foreign Key (an anti-pattern, but it happens), masking it will break relationships. Detect FKs before masking.
- Irreversibility: Ensure the masking SQL uses one-way functions. Seeds used for randomization should not be fixed or guessable.
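The first two guardrails can be sketched in PostgreSQL as follows. The FK check uses the standard `information_schema` views; the CTAS step reuses the column names from the example script, and `users_masked` / `users_unmasked` are hypothetical table names.

```sql
-- 1. Referential integrity: list foreign keys that reference users.email.
SELECT tc.table_name   AS referencing_table,
       kcu.column_name AS referencing_column,
       tc.constraint_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.table_schema    = tc.table_schema
JOIN information_schema.constraint_column_usage ccu
  ON ccu.constraint_name = tc.constraint_name
 AND ccu.table_schema    = tc.table_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
  AND ccu.table_name  = 'users'
  AND ccu.column_name = 'email';

-- 2. Performance: build a masked copy instead of updating rows in place.
BEGIN;
CREATE TABLE users_masked AS
SELECT id,
       'masked_' || id || '@sandbox.local'                          AS email,
       'User_' || substring(md5(id::text) from 1 for 6)             AS first_name,
       'Test'                                                       AS last_name,
       concat(left(phone_number, 3), '***', right(phone_number, 2)) AS phone_number,
       '$2b$10$dev_password_hash_xyz'                               AS password_hash  -- placeholder
FROM users;
ALTER TABLE users RENAME TO users_unmasked;
ALTER TABLE users_masked RENAME TO users;
COMMIT;
-- CTAS copies no indexes, constraints, defaults, or triggers: recreate them,
-- then drop users_unmasked so the real data does not linger on staging.
```

If the first query returns rows, the referencing columns need the same masking expression as `users.email`, otherwise the relationship breaks. Existing foreign keys also keep pointing at the renamed `users_unmasked` table, so they have to be re-pointed before it can be dropped.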