Sensitive Content Scanner
Core Purpose
Examine files for sensitive content that should be sanitized before sharing publicly. This is a safety check to run before publishing repos, sharing code snippets, exporting configurations, or otherwise making files public.
When to Use
- Before making a repo public
- Before sharing code/config snippets
- Before exporting skills, dotfiles, or configs for others
- When reviewing what might leak from a file set
- As part of an automated sanitization workflow
Invocation
/sensitive-content-scanner [path] - Scan a file or directory
/sensitive-content-scanner - Prompts for a path
Examples:
/sensitive-content-scanner ~/.claude/skills/
/sensitive-content-scanner ./my-project/
/sensitive-content-scanner ./config.md
Personal Context File (Optional)
For better detection of YOUR specific sensitive content, create ~/.claude/sensitive-content-context.md:
# Sensitive Content Context
## Personal Identifiers
- Full name: [Your Name]
- Usernames: [github-handle, twitter-handle, etc.]
- Email patterns: [yourname@, your.name@]
- Company/team name: [Company Name]
## Private Paths (patterns to flag)
- ~/Library/Mobile Documents/com~apple~CloudDocs/
- /Users/[username]/
- Any path containing: [folder names that are private]
## Private URLs (domains/patterns to flag)
- notion.site (personal workspaces)
- [internal-tool].company.com
- Private GitHub org: github.com/[private-org]/
## Known Secrets Patterns
- Project-specific API key prefixes
- Internal service names
## Business/Proprietary Terms
- Client names: [list]
- Internal project codenames: [list]
- Confidential terms: [list]
If this file exists, the scanner will use it for personalized detection. If not, it uses generic patterns.
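To illustrate how a context file like the one above could be consumed, here is a minimal Python sketch. It assumes the file keeps the `## ` section headings and `- ` bullets shown in the template; `parse_context` and the sample values are hypothetical and not part of the skill itself.

```python
def parse_context(text: str) -> dict[str, list[str]]:
    """Group bullet items under their '## ' section headings."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Start a new section; later bullets are attached to it.
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


# Hypothetical sample content, following the template layout.
sample = """# Sensitive Content Context
## Personal Identifiers
- Full name: Jane Example
- Usernames: jane-dev, janeexample
## Private URLs (domains/patterns to flag)
- notion.site (personal workspaces)
"""
context = parse_context(sample)
```

Bullets that appear before any `## ` heading are ignored, which keeps the top-level title from creating an empty section.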
Detection Categories
1. Credentials & Secrets (CRITICAL)
Always flag - these should never be shared:
| Pattern | Examples |
|---|---|
| API keys | sk-..., AKIA..., ghp_..., xoxb-... |
| Private keys | -----BEGIN RSA PRIVATE KEY----- |
| Passwords | password=, passwd:, pwd= |
| Tokens | token=, bearer ..., auth_token |
| Connection strings | postgres://user:pass@, mongodb+srv:// |
| AWS credentials | aws_access_key_id, aws_secret_access_key |
| Environment secrets | .env files with real values |
Severity: CRITICAL - Block sharing until resolved
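As a rough sketch of how the credential patterns in this table might be expressed, the Python below maps each pattern to a regex. The prefixes come from the table; the lengths and character classes are assumptions that would need tuning against real data, and `find_credentials` is a hypothetical helper name.

```python
import re

# Prefixes are taken from the table above; lengths and character classes
# are assumptions for illustration, not authoritative formats.
CREDENTIAL_PATTERNS = {
    "sk_style_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "slack_bot_token": re.compile(r"\bxoxb-[A-Za-z0-9-]{10,}"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[=:]\s*\S+"),
    "url_with_credentials": re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s/:@]+:[^\s/@]+@"),
}


def find_credentials(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every credential-like match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in CREDENTIAL_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Reporting the pattern name rather than the matched value keeps the secret itself out of the scanner's own output.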
2. Personal Identifiers (HIGH)
Flag for review - may need anonymization:
| Pattern | Examples |
|---|---|
| Email addresses | user@domain.com |
| Phone numbers | +1-555-..., (555) 123-4567 |
| Names (from context file) | Your name, family names |
| Usernames (from context file) | Social handles, login names |
| Physical addresses | Street addresses, ZIP codes |
Severity: HIGH - Review and anonymize or confirm OK to share
3. Private URLs (HIGH)
Flag - often reveal private resources:
| Pattern | Examples |
|---|---|
| Notion URLs | *.notion.site/*, notion.so/*/... |
| Google Docs (private) | docs.google.com/document/d/... |
| Internal tools | *.internal.*, *.corp.*, localhost:* |
| Private repos | github.com/[private-org]/... |
| Figma/design files | Private design URLs |
| Calendar/meeting links | Zoom personal rooms, Calendly |
Severity: HIGH - Replace with placeholder or remove
4. Local Paths (MEDIUM)
Flag - reveal system structure and username:
| Pattern | Examples |
|---|---|
| Home directory | /Users/username/, /home/username/ |
| iCloud paths | ~/Library/Mobile Documents/com~apple~CloudDocs/ |
| Dropbox/OneDrive | ~/Dropbox/, ~/OneDrive/ |
| App-specific paths | ~/Library/Application Support/[App]/ |
| Windows user paths | C:\Users\username\ |
Severity: MEDIUM - Replace with generic placeholder like ~/path/to/... or [YOUR_PATH]
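One possible way to apply the placeholder replacement, sketched in Python. It covers only the home-directory and Windows user-path rows of the table; `sanitize_paths` is a hypothetical helper, and cloud-sync or app-specific paths would need their own patterns.

```python
import re

# Matches /Users/<name>, /home/<name>, and C:\Users\<name> (any drive letter).
HOME_DIR = re.compile(r"(?:/Users|/home)/[^/\s]+|[A-Za-z]:\\Users\\[^\\\s]+")


def sanitize_paths(text: str, placeholder: str = "[YOUR_PATH]") -> str:
    """Replace the user-specific prefix of a path with a generic placeholder."""
    # A function replacement avoids backslash-escape processing of the placeholder.
    return HOME_DIR.sub(lambda match: placeholder, text)
```

The rest of the path is left intact so the reader can still see which file is meant, while the username is removed.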
5. Infrastructure & Security (MEDIUM-HIGH)
Flag - could enable attacks or reveal architecture:
| Pattern | Examples |
|---|---|
| Internal IPs | 10.x.x.x, 192.168.x.x, 172.16-31.x.x |
| Internal hostnames | *.internal, *.local, *.corp |
| Database hosts | Specific DB server addresses |
| Cloud resource IDs | AWS account IDs, GCP project IDs |
| CI/CD specifics | Internal Jenkins/GitHub Actions URLs |
Severity: MEDIUM-HIGH depending on context
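For the internal IP row, a regex alone tends to over-match (version strings, invalid octets). A minimal sketch that pairs a loose regex with Python's standard `ipaddress` module to check the three private ranges listed above; `find_internal_ips` is a hypothetical name.

```python
import ipaddress
import re

# Loose candidate match; validity and range are checked with ipaddress below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# The private ranges named in the table: 10.x.x.x, 172.16-31.x.x, 192.168.x.x.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def find_internal_ips(text: str) -> list[str]:
    """Return IPv4 addresses in text that fall inside a private range."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 is not a valid address
        if any(addr in net for net in PRIVATE_NETS):
            found.append(candidate)
    return found
```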
6. Business & Proprietary (MEDIUM)
Flag if context file specifies - varies by situation:
| Pattern | Examples |
|---|---|
| Client names | (from context file) |
| Internal project names | Codenames, internal product names |
| Pricing/financial | Revenue figures, pricing tiers |
| Strategy content | Competitive analysis, roadmaps |
| Internal comms | Slack channel names, team names |
Severity: MEDIUM - Requires judgment
Scan Protocol
Phase 1: Load Context
- Check for ~/.claude/sensitive-content-context.md
- If it exists: load personal patterns and terms
- If not: use generic detection only, note limitations
Phase 2: File Discovery
- If path is a file: scan that file
- If path is a directory:
  - Find all text-based files (md, txt, json, yaml, yml, toml, js, ts, py, sh, etc.)
  - Respect .gitignore if present
  - Skip binary files, node_modules, .git, etc.
- Report file count and types found
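The discovery phase could be sketched roughly as follows. The suffix and skip lists mirror the bullets above but are otherwise assumptions, `discover_files` is a hypothetical name, and .gitignore handling is deliberately left out because it needs a proper pattern matcher.

```python
import os
from pathlib import Path

# Assumed defaults based on the file types and directories listed above.
TEXT_SUFFIXES = {".md", ".txt", ".json", ".yaml", ".yml", ".toml", ".js", ".ts", ".py", ".sh"}
SKIP_DIRS = {"node_modules", ".git"}


def discover_files(path) -> list[Path]:
    """Return the files to scan for a file or directory path."""
    root = Path(path)
    if root.is_file():
        return [root]
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            candidate = Path(dirpath) / name
            if candidate.suffix.lower() in TEXT_SUFFIXES:
                found.append(candidate)
    return sorted(found)
```

Filtering by suffix is a simple stand-in for binary detection; files without a recognized extension (for example a bare .env) would be missed and may deserve an explicit allowlist.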
Phase 3: Pattern Scanning
For each file, scan for all detection categories. Use:
- Regex patterns for structured data (emails, IPs, API keys)
- Context file terms for personal/business content
- Heuristics for paths and URLs
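A minimal sketch of how regex matches and context-file terms might be combined into per-line findings. The `Finding` fields follow the report template's File/Line/Content entries; the class, the `scan_text` function, and the simplified email regex are all illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    file: str
    line: int
    category: str
    severity: str
    snippet: str


# Deliberately simple email pattern; it will not cover every valid address.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_text(file: str, text: str, context_terms=()) -> list[Finding]:
    """Scan text line by line with a regex and case-insensitive term lookup."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if EMAIL.search(line):
            findings.append(Finding(file, lineno, "Personal Identifiers", "HIGH", line.strip()))
        lowered = line.lower()
        for term in context_terms:
            if term.lower() in lowered:
                findings.append(Finding(file, lineno, "Business & Proprietary", "MEDIUM", line.strip()))
    return findings
```

For example, `scan_text("notes.md", "Project Falcon kickoff", ["falcon"])` would yield one MEDIUM finding on line 1, assuming "falcon" was listed as a codename in the context file.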
Phase 4: Findings Report
Generate structured report:
# Sensitive Content Scan Report
**Path scanned:** [path]
**Files scanned:** [count]
**Scan date:** [timestamp]
## Summary
| Severity | Count | Action Required |
| -------- | ----- | ----------------------- |
| CRITICAL | X | Must fix before sharing |
| HIGH | X | Review and sanitize |
| MEDIUM | X | Consider sanitizing |
| LOW | X | Informational |
## CRITICAL Findings
### [Category]: [Brief description]
**File:** `path/to/file.md`
**Line:** [line number]
**Content:** `[snippet with sensitive part highlighted]`
**Risk:** [Why this is sensitive]
**Suggested fix:** [How to sanitize]
---
[Repeat for each finding, grouped by severity]
## Recommendations
1. [Prioritized action items]
2. [...]
## Files Cleared
These files contained no detected sensitive content:
- [list of clean files]
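The summary table in the report template can be generated from a flat list of finding severities. A short sketch, where `render_summary` is a hypothetical helper and the action text is copied from the template:

```python
from collections import Counter

# Severity levels and action text as they appear in the report template.
SEVERITIES = [
    ("CRITICAL", "Must fix before sharing"),
    ("HIGH", "Review and sanitize"),
    ("MEDIUM", "Consider sanitizing"),
    ("LOW", "Informational"),
]


def render_summary(severities: list[str]) -> str:
    """Build the markdown summary table from a list of severity labels."""
    counts = Counter(severities)
    rows = [
        "| Severity | Count | Action Required |",
        "| -------- | ----- | ----------------------- |",
    ]
    for name, action in SEVERITIES:
        # Counter returns 0 for levels with no findings.
        rows.append(f"| {name} | {counts[name]} | {action} |")
    return "\n".join(rows)
```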
Output Modes
Default: Full Report
Complete findings with context and suggestions.
Summary Mode
/sensitive-content-scanner [path] --summary
Just the summary table and critical findings.
JSON Mode (for automation)
/sensitive-content-scanner [path] --json
Machine-readable output for piping to other tools.
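The JSON schema is not specified here, so the following is only one plausible shape, sketched in Python. The field names (`path`, `files_scanned`, `summary`, `findings`) and the `to_json` helper are assumptions chosen to mirror the full report.

```python
import json


def to_json(path: str, files_scanned: int, findings: list[dict]) -> str:
    """Serialize scan results; each finding dict must carry a 'severity' key."""
    summary = {level: 0 for level in ("CRITICAL", "HIGH", "MEDIUM", "LOW")}
    for finding in findings:
        summary[finding["severity"]] += 1
    report = {
        "path": path,
        "files_scanned": files_scanned,
        "summary": summary,
        "findings": findings,
    }
    return json.dumps(report, indent=2)
```

A calling tool could then parse the output with `json.loads` and, for example, stop a publish step whenever `summary["CRITICAL"]` is greater than zero.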
Integration with Other Skills
This skill is designed to be called by other skills/workflows:
Example - called by sync-skills-public:
1. sync-skills-public invokes sensitive-content-scanner on ~/.claude/skills/
2. Scanner returns findings
3. sync-skills-public uses findings to guide sanitization
4. Scanner re-runs on the sanitized output to verify it is clean
When called programmatically, return structured findings that the calling skill can act on.
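The scan, sanitize, and re-scan cycle described in the example could look roughly like this in Python. The `scan` and `sanitize` callables stand in for whatever the calling skill provides; their signatures, the `max_rounds` limit, and the function name are assumptions for illustration.

```python
def sanitize_until_clean(text, scan, sanitize, max_rounds=3):
    """Alternate scanning and sanitizing until no findings remain.

    scan(text) returns a list of findings (empty when clean);
    sanitize(text, findings) returns the cleaned text.
    Returns (final_text, is_clean).
    """
    for _ in range(max_rounds):
        findings = scan(text)
        if not findings:
            return text, True
        text = sanitize(text, findings)
    # Final verification pass after the last sanitize attempt.
    return text, not scan(text)
```

Bounding the number of rounds prevents an endless loop when a sanitizer cannot resolve a finding; in that case the caller gets `is_clean` as `False` and can fall back to human review.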
Limitations
- Cannot detect everything: Novel patterns or context-dependent sensitivity may be missed
- False positives: Some patterns (like example UUIDs) may flag incorrectly
- Requires context for personal content: Without the context file, personal names/terms won't be detected
- No semantic understanding: Can't detect if content is "confidential" without explicit markers
Always do a final human review before sharing anything publicly.
Quick Reference
| I want to... | Command |
|---|---|
| Scan a directory | /sensitive-content-scanner ./path/ |
| Scan a single file | /sensitive-content-scanner ./file.md |
| Quick summary only | /sensitive-content-scanner ./path/ --summary |
| Check my skills before export | /sensitive-content-scanner ~/.claude/skills/ |
| Set up personal detection | Create ~/.claude/sensitive-content-context.md |