Data Leakage Prevention
Use this skill for submission-time security checks and general file compliance reviews. Combine semantic review with deterministic tooling, and make the audit boundary explicit before any scan starts.
Core Rules
- Declare the audit boundary first. State the scope type and the resolved files before scanning.
- Collect environment context early. Check whether the target is git-backed, whether `.gitleaks.toml` or similar policy files exist, whether `.pre-commit-config.yaml` exists, and whether local text-extraction tools are available for binary documents.
- Respect repository rules before running detectors. If `.gitleaks.toml` or similar config exists, honor its ignore rules and use its custom rules as review constraints when possible.
- Choose scan depth from the change summary. Do not default to full semantic review for generated, third-party, or oversized changes.
- Report precise findings and keep likely false positives separate.
Boundary And Context
Supported scope types:
- Git: `staged`, `changed`, `commit <hash>`, `pr <id>`
- Filesystem: `entire repo`, `directory`, `specific file`
When the scope is git-based, review git identity unless the user explicitly says not to:
- inspect `git config user.name`
- inspect `git config user.email`
- review authors inside the audit range with `git --no-pager log --format="%an <%ae>"`
- flag names or emails that appear to expose personal identity inappropriately
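The identity checks can be run as one pass. A sketch follows; the demo repo exists only to make it runnable, and on a real scope you would run the same three commands inside the target repository, limiting `git log` to the audit range rather than the whole history:

```shell
# Demo repo so the identity-review commands are runnable end to end.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Alice Example"
git config user.email "alice@personal-mail.example"
echo x > f.txt && git add f.txt && git commit -qm "init"

git config user.name                               # -> Alice Example
git config user.email                              # -> alice@personal-mail.example
git --no-pager log --format="%an <%ae>" | sort -u  # distinct authors in range
```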
Load references only when needed:
- Scope Discovery for concrete git or filesystem commands
- Git Identity Review for exact identity-check commands and interpretation help
- Secret Types only when severity is unclear after semantic review
Scan Mode Selection
Use deep review for most user-authored code, configuration, infrastructure files, documentation, and other manageable text where context changes severity.
Use fast review for:
- third-party code
- vendored or submodule content
- generated files
- very large diffs
- very long files where broad semantic reading is wasteful
Escalate from fast review to targeted semantic review when automation finds something material.
Scan Procedure
Deep Review
- Read the files or the relevant diff hunks.
- Perform semantic fuzz review for secrets, PII, unsafe metadata, and context that changes severity.
- Run `pii_scan.py` for Presidio-based PII detection.
- Run `secret_scan.py` for detect-secrets-based secret detection.
- If `.pre-commit-config.yaml` exists and already contains relevant scanning hooks, run the matching pre-commit hooks instead of inventing a parallel workflow.
- Reconcile semantic findings with tool findings and classify likely false positives explicitly.
Run the bundled Python scripts with `uv run` so their PEP 723 dependencies are installed automatically. Do not invoke them with plain `python` unless dependency management has already been handled separately.
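The dependency mechanism is easiest to see with a minimal PEP 723 inline-metadata header. The sketch below is illustrative only; the metadata shown is not the bundled scripts' actual requirements:

```shell
# Write a minimal script carrying a PEP 723 inline-metadata block.
# `uv run` reads this header and provisions the listed dependencies in an
# isolated environment; plain `python` ignores the header entirely.
cat > /tmp/pep723_demo.py <<'EOF'
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
print("pep723 demo ok")
EOF

# Invocation (shown, not run here): uv run /tmp/pep723_demo.py
```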
Fast Review
- Skip broad semantic reading.
- Run `pii_scan.py` and `secret_scan.py`.
- If `.pre-commit-config.yaml` exists and already contains relevant scanning hooks, run the matching hooks.
- Inspect only flagged locations or obviously high-risk files semantically.
Use `uv run` for both bundled scripts in fast review as well.
Binary And Non-Plaintext Files
If the scope contains files such as PDF, PPT, DOCX, XLSX, or other binary formats:
- Try to convert them to text first.
- Scan the extracted text when conversion succeeds.
- Record the original binary file in the report.
- If no suitable local tool exists, mark the file as skipped and state the reason.
Do not claim coverage for binary files that were not actually converted or scanned.
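Checking tool availability up front makes the skip decision honest. A sketch; the tool names below are common examples of local text extractors, not requirements of this skill:

```shell
# Probe for local text-extraction tools before promising binary coverage.
# Any format whose extractor is missing must be reported as skipped.
for tool in pdftotext pandoc libreoffice; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "available: $tool"
  else
    echo "skipped:   files needing $tool (tool not installed)"
  fi
done
```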
Severity Guidance
Use semantic judgment first. The categories below are guidance, not a closed taxonomy.
- Critical: live credentials, private keys, production secrets, cloud tokens, signing material, or anything that can plausibly grant direct access or privileged control
- High: real personal data, real customer data, internal secrets, or combinations of identifiers that create material exposure
- Medium: partial or contextual sensitive data, non-production secrets with unclear validity, or findings that need more confirmation
- Low: weak signals, low-impact metadata, sample-like data with some risk, or findings likely to be test fixtures
Reporting Requirements
The final report must include:
- whether secrets or PII were found
- the declared audit boundary
- the chosen scan mode and why
- the resolved file list and file types
- confirmed findings
- suspected false positives
- skipped files and reasons
- git identity review results when the scope is git-based
List concrete findings in this format:
`./path/to/file:line:column | Severity | PII|Secret | Source | Status | Summary`
Use these source labels when possible:
- Presidio
- Detect-secrets
- Fuzzy review
- `custom-rule`, when a repository policy file contributes a direct hit
If a detector only provides a line number, use column 1.
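A finding line in this scheme can be produced with plain `printf`; every field value below is illustrative:

```shell
# Emit one finding in the required report format. The path, line, and
# summary are made up for illustration; a detector that reports only a
# line number gets column 1, as shown.
printf '%s | %s | %s | %s | %s | %s\n' \
  "./config/prod.env:12:1" "Critical" "Secret" "Detect-secrets" \
  "Confirmed" "live-looking AWS access key in committed env file"
```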