modeio-redact
Run anonymization checks for text and files
Use this skill to anonymize, de-anonymize, or detect PII in text and files.
Scope
- Included:
  - anonymize (scripts/anonymize.py)
  - deanonymize (scripts/deanonymize.py)
  - local detector diagnostics (scripts/detect_local.py)
  - map lifecycle and file workflow helpers
  - file-output assurance pipeline (coverage verification, residual scan)
- Not included:
  - prompt gateway / runtime request proxying (use modeio-middleware)
  - git pre-commit staged diff scanning
Dependencies
Core dependencies (always required): none beyond the Python standard library.
Optional dependencies (required for specific features):
| Package | Required for | Install |
|---|---|---|
| requests | Non-lite levels (remote API calls) | pip install requests |
| python-docx | .docx file read/write | pip install python-docx |
| PyMuPDF (fitz) | .pdf file read/redact | pip install PyMuPDF |
Missing optional packages raise a clear error at runtime when the feature is invoked.
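As a rough pre-flight check, the optional packages can be probed before a feature is invoked. This is a minimal sketch and not part of the skill's scripts: the helper name missing_optional_modules is hypothetical, and the import names docx (python-docx) and fitz (PyMuPDF) are the modules those packages install.

```python
import importlib.util

# Import names for the optional packages in the table:
# python-docx is imported as "docx", PyMuPDF as "fitz".
OPTIONAL_MODULES = {
    "requests": "pip install requests",
    "docx": "pip install python-docx",
    "fitz": "pip install PyMuPDF",
}


def missing_optional_modules(modules=OPTIONAL_MODULES):
    """Return {module_name: install_hint} for modules that cannot be found."""
    missing = {}
    for name, hint in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing[name] = hint
    return missing


if __name__ == "__main__":
    for name, hint in missing_optional_modules().items():
        print(f"{name} is not installed; run: {hint}")
```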
For repo-local setup from the repo root:
python scripts/bootstrap_env.py
python scripts/doctor_env.py
Execution policy
- Default to scripts/anonymize.py --json for structured output.
- Use scripts/deanonymize.py for local restore (no network call).
- Use --level lite for offline/no-network anonymization.
- Use scripts/detect_local.py only when detailed local diagnostics are requested.
Level selection
| Scenario | Level | Reason |
|---|---|---|
| Offline or no network available | lite | Local regex only, no API call |
| General anonymization (default) | dynamic | Remote API path for broad coverage |
| Compliance-sensitive workflows | strict | Includes compliance analysis |
| Cross-region transfer workflows | crossborder | Requires jurisdiction codes |
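The table can be read as a small decision function. The sketch below is illustrative only: choose_level and build_command are hypothetical helpers, and the precedence (offline first, then cross-border, then compliance) is an assumption, since the table does not say which scenario wins when several apply. The flags used are the ones documented for scripts/anonymize.py.

```python
def choose_level(offline=False, compliance=False, crossborder=False):
    """Map the scenarios in the table to a --level value (hypothetical helper)."""
    if offline:
        return "lite"         # local regex only, no API call
    if crossborder:
        return "crossborder"  # also needs --sender-code / --recipient-code
    if compliance:
        return "strict"
    return "dynamic"          # documented default


def build_command(text, level, sender_code=None, recipient_code=None):
    """Build an argv list for scripts/anonymize.py from documented flags."""
    argv = ["python", "scripts/anonymize.py",
            "--input", text, "--level", level, "--json"]
    if level == "crossborder":
        if not (sender_code and recipient_code):
            raise ValueError("crossborder requires sender and recipient codes")
        argv += ["--sender-code", sender_code,
                 "--recipient-code", recipient_code]
    return argv
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting issues with literal input text.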
Scripts
scripts/anonymize.py
- -i, --input: required, literal content or supported file path
- --level: lite, dynamic, strict, crossborder (default: dynamic)
- --sender-code: required for crossborder (example: CN SHA)
- --recipient-code: required for crossborder (example: US NYC)
- --json: output structured JSON envelope
- --output: explicit output file path
- --in-place: overwrite input file in place (file-path input only)
Notes:
- Existing supported file paths are auto-read as file input.
- lite is local-only; non-lite levels call the backend anonymize API.
- Non-lite API calls retry up to 2 times with exponential backoff (1s base) on 502/503/504 and network errors.
- .pdf anonymization supports all levels for text-layer PDFs; non-lite requires API mapping entries for fail-closed projection.
- .pdf applies true PDF redaction (remove text layer content + black fill) and uses sidecar-only map linkage.
- .pdf de-anonymization is not supported.
- Default file output path is <name>.redacted.<ext> with collision-safe suffixing.
- Sidecar map ref file is written for file workflows as <output-stem>.map.json (example: incident.redacted.map.json).
- For file outputs, an assurance pipeline runs automatically: .docx/.pdf use verified policy (fail on coverage mismatch or residual findings); all other file types use best_effort with coverage enforcement.
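The retry behaviour described in the notes can be sketched roughly as follows. This is not the script's actual code: call_with_retry and the send callable are hypothetical, and the delay schedule (1s, then 2s) assumes the backoff doubles from the 1s base.

```python
import time

RETRYABLE_STATUS = {502, 503, 504}


def call_with_retry(send, max_retries=2, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on retryable statuses or network errors.

    send is a hypothetical zero-argument callable returning (status, body).
    Network failures are modelled as OSError, which covers ConnectionError
    and TimeoutError.
    """
    attempt = 0
    while True:
        error = None
        try:
            status, body = send()
            if status not in RETRYABLE_STATUS:
                return status, body
        except OSError as exc:
            error = exc
        if attempt == max_retries:
            # Retries exhausted: surface the last failure to the caller.
            if error is not None:
                raise error
            return status, body
        sleep(base_delay * (2 ** attempt))  # assumed schedule: 1s, then 2s
        attempt += 1
```

Passing sleep as a parameter keeps the sketch testable without real waiting.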
python scripts/anonymize.py --input "Email: alice@example.com" --level dynamic --json
python scripts/anonymize.py --input "Phone 13812345678" --level lite --json
python scripts/anonymize.py --input ./incident.docx --level lite --json
python scripts/anonymize.py --input ./incident.pdf --level lite --json
python scripts/anonymize.py --input ./incident.pdf --level dynamic --json
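For file inputs such as the ones above, the default output and sidecar names follow the patterns <name>.redacted.<ext> and <output-stem>.map.json. The sketch below illustrates that naming with hypothetical helpers; the numeric ".N" collision suffix is an assumption, since the exact collision-safe scheme is not specified here.

```python
from pathlib import Path


def default_output_path(input_path, exists=Path.exists):
    """Return <name>.redacted.<ext>, adding a numeric suffix on collision.

    The ".N" counter is an assumed suffix scheme for illustration only.
    """
    src = Path(input_path)
    candidate = src.with_name(f"{src.stem}.redacted{src.suffix}")
    counter = 1
    while exists(candidate):
        candidate = src.with_name(f"{src.stem}.redacted.{counter}{src.suffix}")
        counter += 1
    return candidate


def sidecar_map_path(output_path):
    """Return <output-stem>.map.json next to the output file."""
    out = Path(output_path)
    return out.with_name(f"{out.stem}.map.json")
```

With these helpers, incident.docx maps to incident.redacted.docx, whose sidecar is incident.redacted.map.json, matching the documented example.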
scripts/deanonymize.py
- -i, --input: required, anonymized text or supported file path
- --map: optional map ID or map file path
- --output: explicit output file path
- --in-place: overwrite file input in place
- --json: output structured JSON envelope
Map resolution order when --map is omitted:
- Embedded marker in .txt/.md/.markdown
- Sidecar map file <input>.map.json
- Latest local map fallback (literal text input only)
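The lookup order can be expressed as a simple precedence chain. The function below is a hypothetical sketch, not the script's implementation; its return values reuse the linkageSource labels from the output contract, and the boolean inputs stand in for checks the real tool performs itself.

```python
def resolve_map_source(explicit_map=None, embedded_map_id=None,
                       sidecar_exists=False, input_is_literal_text=False):
    """Return a linkageSource label following the documented lookup order."""
    if explicit_map:
        return "explicit-map"      # --map was given, so no lookup is needed
    if embedded_map_id:
        return "embedded-mapid"    # marker found in .txt/.md/.markdown
    if sidecar_exists:
        return "sidecar"           # <input>.map.json next to the file
    if input_is_literal_text:
        return "latest-fallback"   # only allowed for literal text input
    return None                    # nothing resolved
```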
Map marker embedding styles per file type:
| File type | Marker style | Example |
|---|---|---|
| .txt | Hash-comment on first line | # modeio-redact-map-id: <id> |
| .md, .markdown | HTML comment on first line | <!-- modeio-redact-map-id: <id> --> |
| All others | No embedded marker | (uses sidecar .map.json only) |
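To make the marker styles concrete, here is a small sketch that writes and reads them. The helper names are hypothetical, and treating the map ID as a run of non-whitespace characters is an assumption about its format.

```python
import re

# Marker templates and patterns taken from the table; the ID is assumed
# to contain no whitespace.
MARKERS = {
    ".txt": ("# modeio-redact-map-id: {id}",
             re.compile(r"^# modeio-redact-map-id: (\S+)\s*$")),
    ".md": ("<!-- modeio-redact-map-id: {id} -->",
            re.compile(r"^<!-- modeio-redact-map-id: (\S+) -->\s*$")),
}
MARKERS[".markdown"] = MARKERS[".md"]


def format_marker(map_id, suffix):
    """Return the first-line marker for a file suffix, or None if unsupported."""
    entry = MARKERS.get(suffix.lower())
    return entry[0].format(id=map_id) if entry else None


def read_marker(first_line, suffix):
    """Extract the map ID from a file's first line, or None if absent."""
    entry = MARKERS.get(suffix.lower())
    if entry is None:
        return None
    match = entry[1].match(first_line)
    return match.group(1) if match else None
```

File types outside the table return None from both helpers, which corresponds to the sidecar-only case.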
python scripts/deanonymize.py --input "Email: [EMAIL_1]" --json
python scripts/deanonymize.py --input ./notes.redacted.txt --json
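Conceptually, local restore swaps placeholders such as [EMAIL_1] back to their original values using a stored map. The sketch below illustrates the idea only: the mapping shape (placeholder to original value), the placeholder pattern, and the restore helper are all assumptions, not the tool's actual map format or code.

```python
import re

# Assumed placeholder shape, based on the [EMAIL_1] example: [TYPE_N].
PLACEHOLDER = re.compile(r"\[([A-Z_]+)_(\d+)\]")


def restore(text, mapping):
    """Replace known placeholders and count replacements by type."""
    counts = {}

    def swap(match):
        token = match.group(0)
        if token not in mapping:
            return token  # leave unknown placeholders untouched
        kind = match.group(1)
        counts[kind] = counts.get(kind, 0) + 1
        return mapping[token]

    restored = PLACEHOLDER.sub(swap, text)
    summary = {
        "totalReplacements": sum(counts.values()),
        "replacementsByType": counts,
    }
    return restored, summary
```

The summary dictionary mirrors the shape of the replacementSummary field in the JSON output.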
scripts/detect_local.py
- -i, --input: required input text
- --profile: strict, balanced, precision (default: balanced)
- --allowlist-file: optional JSON allowlist rules
- --blocklist-file: optional JSON blocklist rules
- --thresholds-file: optional JSON threshold overrides
- --explain: print heuristic diagnostics in non-JSON mode
- --json: output full detector payload
Detects 13 entity types: phone, email, idCard, creditCard, bankCard, address, name, password, apiKey, ipAddress, ssn, passport, dateOfBirth.
Profile thresholds: strict lowers the base thresholds by 0.12 (more detections kept), balanced uses the defaults, and precision raises them by 0.10 (fewer, higher-confidence detections).
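The profile arithmetic can be illustrated with a short sketch. The offsets come from the description above; the base threshold values in the usage example, the clamping to the range 0 to 1, and the rounding to two decimals are assumptions made for the illustration.

```python
# Offsets from the profile description: strict lowers, precision raises.
PROFILE_OFFSETS = {"strict": -0.12, "balanced": 0.0, "precision": 0.10}


def apply_profile(base_thresholds, profile="balanced"):
    """Shift every base threshold by the profile offset.

    Values are clamped to [0, 1] and rounded to two decimals (both are
    assumptions made for this sketch).
    """
    offset = PROFILE_OFFSETS[profile]
    return {
        entity: round(min(1.0, max(0.0, value + offset)), 2)
        for entity, value in base_thresholds.items()
    }


if __name__ == "__main__":
    base = {"email": 0.50, "name": 0.70}  # hypothetical base thresholds
    for name in PROFILE_OFFSETS:
        print(name, apply_profile(base, name))
```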
python scripts/detect_local.py --input "Phone 13812345678 Email test@example.com" --json
python scripts/detect_local.py --input "Name: Alice Wang" --profile precision --json
Testing
python -m unittest discover tests -p "test_*.py"
python -m unittest discover tests -p "test_smoke_matrix_extensive.py"
MODEIO_REDACT_SKIP_API_SMOKE=1 python -m unittest discover tests -p "test_smoke_matrix_extensive.py"
Output contract
anonymize.py --json
Top-level envelope: success, tool, mode, level, data.
data fields:
- anonymizedContent: redacted text
- hasPII: boolean
- inputType: text or file
- inputPath: resolved source path (file input only)
- mapRef: { mapId, mapPath, entryCount, sidecarPath? }
- outputPath: written file path (present when an output file is written)
- warnings: [{ code, message }] (present when applicable)
Output-file fields (present when an output file is written):
- applyReport: { expectedCount, foundCount, appliedCount, missingCount, missedSpans, warnings }
- verificationReport: { passed, skipped, residualCount, residuals, warnings }
- assurancePolicy: { level, failOnCoverageMismatch, failOnResidualFindings }
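A caller might consume this envelope along the following lines. This is a hedged sketch: summarize_anonymize_result is a hypothetical helper, it relies only on the field names listed above, and the rule that a file counts as verified when passed is true and residualCount is zero is an assumption.

```python
import json


def summarize_anonymize_result(raw_json):
    """Pull the commonly needed fields out of an anonymize.py --json envelope."""
    envelope = json.loads(raw_json)
    if not envelope.get("success"):
        raise RuntimeError("anonymize.py reported failure")
    data = envelope["data"]
    summary = {
        "level": envelope.get("level"),
        "hasPII": data.get("hasPII"),
        "mapId": (data.get("mapRef") or {}).get("mapId"),
        "outputPath": data.get("outputPath"),  # None for literal text input
        "verified": None,                      # stays None without a report
    }
    report = data.get("verificationReport")
    if report is not None:
        summary["verified"] = (
            bool(report.get("passed")) and report.get("residualCount", 0) == 0
        )
    return summary
```

The raw_json argument would typically be the standard output captured from a subprocess call to scripts/anonymize.py with --json.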
deanonymize.py --json
Top-level envelope: success, tool, mode, data.
data fields:
- deanonymizedContent: restored text
- replacementSummary: { totalReplacements, replacementsByType }
- mapRef: { mapId, mapPath, entryCount }
- linkageSource: explicit-map, embedded-mapid, sidecar, or latest-fallback
- warnings: [{ code, message }] (for example, input_hash_mismatch)
- outputPath: written file path (present when an output file is written)
detect_local.py --json
Full output fields:
- originalText: the unmodified input
- sanitizedText: text with PII replaced by placeholders
- items: array of detected entities with type, value, maskedValue, detectionScore, scoreReasons, and positional fields (startIndex, endIndex)
- riskScore: 0–100 aggregate risk score
- riskLevel: low, medium, high
- profile: active profile name
- thresholds: threshold values used per entity type
- scoringMethod: scoring algorithm identifier
- detectorVersion: detector version string
- stats: { candidateCount, keptCount }
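As an example of working with the detector payload, the sketch below keeps only higher-scoring items and reports them without raw values. The helper name and the 0.8 cut-off are hypothetical, and it assumes detectionScore is a number on a 0 to 1 scale.

```python
def high_confidence_items(payload, min_score=0.8):
    """Return type, masked value, and span for items at or above min_score."""
    selected = []
    for item in payload.get("items", []):
        if item["detectionScore"] >= min_score:
            selected.append({
                "type": item["type"],
                # Use maskedValue so the raw PII is not echoed back.
                "maskedValue": item["maskedValue"],
                "span": (item["startIndex"], item["endIndex"]),
            })
    return selected
```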
When not to use
- Runtime LLM request/response gateway hooks (modeio-middleware)
- Command safety analysis (modeio-guardrail)
Resources
- scripts/anonymize.py: CLI entry point for anonymization
- scripts/deanonymize.py: CLI entry point for de-anonymization
- scripts/detect_local.py: CLI entry point for local PII detection
- ARCHITECTURE.md: package layout and boundary rules
- ANONYMIZE_API_URL env var: optional endpoint override (default: https://safety-cf.modeio.ai/api/cf/anonymize)
- MODEIO_REDACT_MAP_DIR env var: optional local map directory override (default: ~/.modeio/redact/maps/)
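The two environment overrides can be resolved as in the sketch below. resolve_settings is a hypothetical helper rather than the package's own code, and treating an empty variable as unset is an assumption; the variable names and defaults are the ones listed above.

```python
import os
from pathlib import Path

DEFAULT_API_URL = "https://safety-cf.modeio.ai/api/cf/anonymize"
DEFAULT_MAP_DIR = "~/.modeio/redact/maps/"


def resolve_settings(environ=os.environ):
    """Return (api_url, map_dir), honouring the optional env overrides."""
    # An empty value falls back to the default (assumed behaviour).
    api_url = environ.get("ANONYMIZE_API_URL") or DEFAULT_API_URL
    map_dir = Path(environ.get("MODEIO_REDACT_MAP_DIR") or DEFAULT_MAP_DIR)
    return api_url, map_dir.expanduser()  # expand "~" to the home directory
```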