data-leakage-prevention

Data Leakage Prevention

Use this skill for submission-time security checks and general file compliance reviews. Combine semantic review with deterministic tooling, and make the audit boundary explicit before any scan starts.

Core Rules

Declare the audit boundary first. State the scope type and the resolved files before scanning.
Collect environment context early. Check whether the target is git-backed, whether .gitleaks.toml or similar policy files exist, whether .pre-commit-config.yaml exists, and whether local text-extraction tools are available for binary documents.
Respect repository rules before running detectors. If .gitleaks.toml or similar config exists, honor its ignore rules and use its custom rules as review constraints when possible.
Choose scan depth from the change summary. Do not default to full semantic review for generated, third-party, or oversized changes.
Report precise findings and keep likely false positives separate.
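
The context-collection rule above can be sketched as a small helper. This is a minimal sketch, not the skill's bundled tooling: the `EnvContext` type, the `collect_context` function, and the `EXTRACTION_TOOLS` list are hypothetical names chosen for illustration, and checking for a `.git` entry only works when `root` is the repository root.

```python
import shutil
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical list of local text extractors; adjust to what is installed.
EXTRACTION_TOOLS = ("pdftotext", "pandoc")


@dataclass
class EnvContext:
    is_git: bool
    policy_files: list = field(default_factory=list)
    has_pre_commit: bool = False
    extractors: list = field(default_factory=list)


def collect_context(root: Path) -> EnvContext:
    """Gather environment facts before any detector runs."""
    policy = [name for name in (".gitleaks.toml",) if (root / name).is_file()]
    return EnvContext(
        # .git may be a directory or, in worktrees and submodules, a file.
        is_git=(root / ".git").exists(),
        policy_files=policy,
        has_pre_commit=(root / ".pre-commit-config.yaml").is_file(),
        # shutil.which returns None when the tool is not on PATH.
        extractors=[tool for tool in EXTRACTION_TOOLS if shutil.which(tool)],
    )
```

Calling `collect_context(Path("."))` once at the start keeps the later steps (policy handling, hook reuse, binary extraction) from re-probing the environment.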

Boundary And Context

Supported scope types:

Git: staged, changed, commit <hash>, pr <id>
Filesystem: entire repo, directory, specific file
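
As an illustration, the git scope types could be resolved to file lists roughly as follows. The function name `git_file_list_command` is hypothetical, and the mapping is an assumption: in particular, "changed" is read here as "tracked files that differ from HEAD", and `pr <id>` is left unresolved because it depends on the hosting platform.

```python
def git_file_list_command(scope: str) -> list:
    """Return a git argv that lists the files for a git scope string."""
    kind, _, arg = scope.strip().partition(" ")
    if kind == "staged":
        return ["git", "diff", "--cached", "--name-only"]
    if kind == "changed":
        # Assumption: "changed" means tracked files that differ from HEAD.
        return ["git", "diff", "--name-only", "HEAD"]
    if kind == "commit" and arg:
        return ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", arg]
    # pr <id> needs host-specific resolution, so it is not guessed here.
    raise ValueError(f"unsupported or incomplete git scope: {scope!r}")
```

The returned list can be passed to `subprocess.run`, and its output is the resolved file list that the audit boundary declaration should state.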

When the scope is git-based, review git identity unless the user explicitly says not to:

inspect git config user.name
inspect git config user.email
review authors inside the audit range with git --no-pager log --format="%an <%ae>"
flag names or emails that appear to expose personal identity inappropriately
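
A rough sketch of the identity review, built on the log format shown above. The helper names and the `PERSONAL_DOMAINS` set are hypothetical; a domain match is only a weak heuristic for "exposes personal identity" and still needs human judgment.

```python
import subprocess

# Hypothetical heuristic: mail domains that usually indicate a personal address.
PERSONAL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}


def parse_authors(log_output: str) -> set:
    """Parse lines of the form 'Name <email>' into (name, email) pairs."""
    authors = set()
    for line in log_output.splitlines():
        name, sep, rest = line.rpartition(" <")
        if sep and rest.endswith(">"):
            authors.add((name, rest[:-1]))
    return authors


def flag_identities(authors: set) -> list:
    """Return human-readable flags for addresses on personal mail domains."""
    flagged = []
    for name, email in sorted(authors):
        domain = email.rpartition("@")[2].lower()
        if domain in PERSONAL_DOMAINS:
            flagged.append(f"{name} <{email}>: personal mail domain")
    return flagged


def authors_in_range(rev_range: str) -> set:
    """Collect distinct authors inside the audit range."""
    result = subprocess.run(
        ["git", "--no-pager", "log", "--format=%an <%ae>", rev_range],
        capture_output=True, text=True, check=True,
    )
    return parse_authors(result.stdout)
```

For example, `flag_identities(authors_in_range("main..HEAD"))` would list candidate identities for the report, where `main..HEAD` stands in for whatever range the declared boundary resolves to.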

Load references only when needed:

Scope Discovery for concrete git or filesystem commands
Git Identity Review for exact identity-check commands and interpretation help
Secret Types only when severity is unclear after semantic review

Scan Mode Selection

Use deep review for most user-authored code, configuration, infrastructure files, documentation, and other manageable text where context changes severity.

Use fast review for:

third-party code
vendored or submodule content
generated files
very large diffs
very long files where broad semantic reading is wasteful

Escalate from fast review to targeted semantic review when automation finds something material.
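
The selection and escalation rules might be encoded along these lines. The directory names, file suffixes, and the `MAX_DEEP_LINES` threshold are hypothetical placeholders, not values defined by this skill.

```python
from pathlib import PurePosixPath

# Hypothetical markers for third-party, vendored, or generated content.
FAST_DIR_PARTS = {"vendor", "node_modules", "third_party", "dist", "build"}
FAST_SUFFIXES = (".min.js", ".lock")
MAX_DEEP_LINES = 2000  # hypothetical cut-off for "very long" files


def choose_scan_mode(path: str, line_count: int) -> str:
    """Return 'fast' or 'deep' for one file in the resolved scope."""
    if set(PurePosixPath(path).parts) & FAST_DIR_PARTS:
        return "fast"
    if path.endswith(FAST_SUFFIXES):
        return "fast"
    if line_count > MAX_DEEP_LINES:
        return "fast"
    return "deep"


def escalate(mode: str, tool_hits: int) -> str:
    """Move a fast-reviewed file to targeted semantic review on any hit."""
    return "targeted" if mode == "fast" and tool_hits > 0 else mode
```

Recording the returned mode per file also gives the report its "chosen scan mode and why" entry.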

Scan Procedure

Deep Review

Read the files or the relevant diff hunks.
Perform semantic fuzzy review for secrets, PII, unsafe metadata, and context that changes severity.
Run pii_scan.py for Presidio-based PII detection.
Run secret_scan.py for detect-secrets-based secret detection.
If .pre-commit-config.yaml exists and already contains relevant scanning hooks, run the matching pre-commit hooks instead of inventing a parallel workflow.
Reconcile semantic findings with tool findings and classify likely false positives explicitly.

Run the bundled Python scripts with uv run so their PEP 723 dependencies are installed automatically. Do not invoke them with plain python unless dependency management has already been handled separately.
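
A minimal sketch of invoking a bundled script through `uv run` from Python. The command-line interface of `pii_scan.py` and `secret_scan.py` is not documented here, so passing target paths as positional arguments is an assumption, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def uv_run_command(script: Path, targets: list) -> list:
    """Build the argv for running a PEP 723 script with uv."""
    # Assumption: the script accepts target paths as positional arguments.
    return ["uv", "run", str(script), *targets]


def run_scanner(script: Path, targets: list) -> subprocess.CompletedProcess:
    """Run the scanner and capture its output for later reconciliation."""
    return subprocess.run(
        uv_run_command(script, targets),
        capture_output=True, text=True,
    )
```

Building the argv separately from running it makes the exact command easy to show in the report.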

Fast Review

Skip broad semantic reading.
Run pii_scan.py and secret_scan.py.
If .pre-commit-config.yaml exists and already contains relevant scanning hooks, run the matching hooks.
Inspect only flagged locations or obviously high-risk files semantically.

Use uv run for both bundled scripts in fast review as well.
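
The "run the matching hooks" step in both procedures could be approximated as below. This sketch scans the config text with a regular expression instead of parsing YAML, and the `RELEVANT_HOOK_IDS` set is an assumed, non-exhaustive list of scanning hooks.

```python
import re

# Assumed, non-exhaustive set of hook ids that perform secret scanning.
RELEVANT_HOOK_IDS = {"gitleaks", "detect-secrets", "detect-private-key"}
HOOK_ID_RE = re.compile(r"^\s*-?\s*id:\s*([\w.-]+)", re.MULTILINE)


def relevant_hooks(config_text: str) -> list:
    """Return scanning hook ids found in .pre-commit-config.yaml text."""
    return [h for h in HOOK_ID_RE.findall(config_text) if h in RELEVANT_HOOK_IDS]


def pre_commit_command(hook_id: str, files: list) -> list:
    """Build the argv that runs one hook against the resolved files only."""
    return ["pre-commit", "run", hook_id, "--files", *files]
```

Restricting the run with `--files` keeps the hook inside the declared audit boundary instead of scanning the whole repository.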

Binary And Non-Plaintext Files

If the scope contains files such as PDF, PPT, DOCX, XLSX, or other binary formats:

Try to convert them to text first.
Scan the extracted text when conversion succeeds.
Record the original binary file in the report.
If no suitable local tool exists, mark the file as skipped and state the reason.

Do not claim coverage for binary files that were not actually converted or scanned.
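
One way to keep the coverage claim honest is to decide, per binary file, whether it will be converted or skipped, and to record the reason. The `CONVERTERS` mapping and the `plan_extraction` name are hypothetical, and the tools listed are only examples of local converters.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from file suffix to a local converter CLI.
CONVERTERS = {".pdf": "pdftotext", ".docx": "pandoc"}


def plan_extraction(path: str, which=shutil.which) -> dict:
    """Decide whether a binary file can be converted or must be skipped."""
    suffix = Path(path).suffix.lower()
    tool = CONVERTERS.get(suffix)
    if tool is None:
        return {"file": path, "status": "skipped",
                "reason": f"no converter configured for {suffix}"}
    if which(tool) is None:
        return {"file": path, "status": "skipped",
                "reason": f"{tool} not installed"}
    # The original binary path stays in the record for the report.
    return {"file": path, "status": "convert", "tool": tool}
```

Entries with status `skipped` go directly into the report's skipped-files section together with their reason.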

Severity Guidance

Use semantic judgment first. The categories below are guidance, not a closed taxonomy.

Critical: live credentials, private keys, production secrets, cloud tokens, signing material, or anything that can plausibly grant direct access or privileged control
High: real personal data, real customer data, internal secrets, or combinations of identifiers that create material exposure
Medium: partial or contextual sensitive data, non-production secrets with unclear validity, or findings that need more confirmation
Low: weak signals, low-impact metadata, sample-like data with some risk, or findings likely to be test fixtures

Reporting Requirements

The final report must include:

whether secrets or PII were found
the declared audit boundary
the chosen scan mode and why
the resolved file list and file types
confirmed findings
suspected false positives
skipped files and reasons
git identity review results when the scope is git-based

List concrete findings in this format:

./path/to/file:line:column | Severity | PII or Secret | Source | Status | Summary

Use these source labels when possible:

Presidio
Detect-secrets
Fuzzy review
custom-rule when a repository policy file contributes a direct hit

If a detector only provides a line number, use column 1.
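
The finding format and the column fallback can be sketched with a small record type. The `Finding` class is hypothetical, and the status value used in the usage note ("Confirmed") is an assumed label, not one this skill defines.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    path: str
    line: int
    severity: str
    kind: str       # "PII" or "Secret"
    source: str     # e.g. "Presidio", "Detect-secrets", "Fuzzy review"
    status: str
    summary: str
    column: int = 1  # fallback when a detector reports only a line number

    def render(self) -> str:
        """Format the finding as one pipe-separated report line."""
        location = f"{self.path}:{self.line}:{self.column}"
        return " | ".join([location, self.severity, self.kind,
                           self.source, self.status, self.summary])
```

For instance, `Finding("./config/app.env", 12, "Critical", "Secret", "Detect-secrets", "Confirmed", "live-looking API token").render()` produces a line whose location part is `./config/app.env:12:1`, because no column was supplied.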

Repository

zenless-lab/skills

First Seen

Apr 6, 2026