NYC
skills/letta-ai/skills/sanitize-git-repo

sanitize-git-repo

SKILL.md

Sanitize Git Repository

This skill provides guidance for systematically identifying and replacing sensitive information in git repositories, including API keys, tokens, passwords, and other credentials.

When to Use This Skill

  • Sanitizing a repository before sharing or open-sourcing
  • Removing accidentally committed secrets from a codebase
  • Replacing hardcoded credentials with placeholders
  • Auditing a repository for sensitive information
  • Preparing code for security review

Recommended Approach

Phase 1: Comprehensive Discovery

Build a complete inventory of all sensitive values before making any changes. This prevents the common mistake of discovering additional secrets mid-process.

Common Secret Patterns to Search:

Type Pattern/Prefix Example
AWS Access Keys AKIA[A-Z0-9]{16} AKIAIOSFODNN7EXAMPLE
AWS Secret Keys 40-char base64 strings Often near access keys
GitHub Tokens ghp_, gho_, ghs_, ghr_ ghp_xxxxxxxxxxxx
Huggingface Tokens hf_ hf_xxxxxxxxxxxx
Generic API Keys api[_-]?key, apikey Varies
Bearer Tokens bearer, token Varies
Private Keys -----BEGIN.*PRIVATE KEY----- RSA/EC keys
Database URLs postgres://, mysql://, mongodb:// Connection strings
Password Fields password, passwd, pwd Varies

Discovery Strategy:

  1. Search for known prefixes and patterns using regex
  2. Search for common variable names (API_KEY, SECRET, TOKEN, PASSWORD, CREDENTIAL)
  3. Check configuration files (.env, config.*, settings.*, *.json, *.yaml, *.yml)
  4. Examine test fixtures and mock data files
  5. Check embedded data within other formats (JSON encoded in strings, base64 encoded content)

Important Locations to Check:

  • Configuration files and environment templates
  • Test fixtures and sample data
  • Documentation and README files
  • CI/CD configuration files
  • Docker and deployment configurations
  • JSON/YAML data files (secrets may be embedded in structured data)
  • Git diff outputs or patch files stored in the repository

Phase 2: Inventory Documentation

Before making changes, create a documented list of:

  1. Each unique sensitive value found
  2. All file locations where each value appears
  3. The exact string to match (including surrounding context if needed for uniqueness)
  4. The placeholder to use for replacement

Placeholder Conventions:

  • Use descriptive placeholders that indicate the type: <AWS_ACCESS_KEY>, <GITHUB_TOKEN>, <DATABASE_PASSWORD>
  • Maintain consistency across all replacements
  • Match the format specified by the user if provided

Phase 3: Systematic Replacement

Execute replacements methodically:

  1. Work through the inventory one secret at a time
  2. For each secret, replace ALL occurrences across ALL files
  3. Verify each replacement immediately after making it
  4. Use exact string matching to avoid unintended modifications

Replacement Best Practices:

  • Read files before editing to ensure exact string matching
  • Handle whitespace and formatting precisely
  • Consider secrets embedded in JSON or other structured formats (may require escaping)
  • Use batch replacements when the same value appears multiple times in one file

Phase 4: Verification

After all replacements:

  1. Re-run all original discovery searches to confirm no secrets remain
  2. Search for partial matches of sensitive values
  3. Run any provided test suites to validate the sanitization
  4. Check that placeholders are properly formatted

Common Pitfalls

1. Incomplete Initial Discovery

Problem: Secrets discovered incrementally during the replacement process, requiring backtracking.

Prevention: Invest time upfront in comprehensive searching using multiple patterns and checking all file types, including data files and embedded content.

2. Secrets in Unexpected Locations

Problem: Secrets appear in JSON files, embedded strings, test fixtures, or encoded data that aren't found by simple searches.

Prevention: Search recursively through all file types. Check JSON and YAML files specifically. Look for base64-encoded content that might contain secrets.

3. Exact String Matching Failures

Problem: Edit operations fail due to whitespace differences or character encoding issues.

Prevention: Always read the target file first to understand exact formatting. Copy strings exactly as they appear in the file, including whitespace.

4. Missing Occurrences

Problem: Multiple occurrences of the same secret exist, but only some are replaced.

Prevention: After each replacement, immediately verify by searching for the original value again. Use replace-all functionality when available.

5. Git History Considerations

Problem: Secrets remain in git history even after being removed from current files.

Prevention: Clarify with the user whether git history sanitization is required. If so, tools like git filter-branch or BFG Repo-Cleaner may be needed (this is a separate, more complex operation).

6. Tool Availability Assumptions

Problem: Specialized search tools (like ripgrep) may not be available in all environments.

Prevention: Have fallback approaches ready. Standard grep -r works in most environments. Test tool availability before relying on it.

Verification Checklist

Before considering the task complete:

  • All known secret patterns have been searched for
  • Configuration files have been examined
  • JSON/YAML data files have been checked
  • Test fixtures and mock data have been reviewed
  • Each identified secret has been replaced in all locations
  • Verification searches confirm no original secrets remain
  • Placeholders follow a consistent format
  • Any provided tests pass successfully

Search Commands Reference

Standard grep (widely available):

# Search for AWS access keys
grep -rn "AKIA[A-Z0-9]\{16\}" . --exclude-dir=.git

# Search for common token prefixes
grep -rn "ghp_\|gho_\|ghs_\|hf_" . --exclude-dir=.git

# Search for password-related strings
grep -rn -i "password\|passwd\|pwd" . --exclude-dir=.git

# Search for API key patterns
grep -rn -i "api[_-]\?key\|apikey" . --exclude-dir=.git

Always exclude the .git directory from searches to avoid noise from git internals.

Weekly Installs
19
Repository
letta-ai/skills
First Seen
Jan 24, 2026
Installed on
claude-code14
gemini-cli13
codex13
opencode13
antigravity12
github-copilot10