git-leak-recovery
Git Leak Recovery
Overview
This skill guides the process of recovering sensitive data (secrets, credentials, API keys) that have been removed from Git history through history-rewriting operations, extracting the data, and then securely cleaning the repository to ensure complete removal.
Key Concepts
Git Object Persistence
When commits are "removed" via operations like git reset, git rebase, or git commit --amend, the underlying Git objects are not immediately deleted. They become "unreachable" but persist in the repository until garbage collection occurs. This behavior enables recovery but also means secrets remain accessible until explicit cleanup.
Common Hiding Places for Secrets
When searching for removed secrets, check these locations in order of likelihood:
- Reflog - Most common location for rewritten history (
git reflog) - Dangling commits - Commits with no branch reference (
git fsck --unreachable) - Stashes - Often overlooked (
git stash list) - Other branches - May contain the original commits
- Tags - May reference old commits
- Git notes - Annotations attached to commits
Workflow
Phase 1: Reconnaissance
Before attempting recovery, gather information about the repository state:
# View current commit history
git log --all --oneline
# Check reflog for all references
git reflog show --all
# Find unreachable objects
git fsck --unreachable
# List stashes
git stash list
# List all branches
git branch -a
# List tags
git tag -l
Phase 2: Recovery
Once a target commit is identified:
# View commit contents without checking out
git show <commit-hash>
# View specific file from commit
git show <commit-hash>:<path/to/file>
# Extract file to working directory
git show <commit-hash>:<path/to/file> > recovered_file.txt
Phase 3: Cleanup
To completely remove secrets from the repository, perform cleanup in this specific order:
# Step 1: Expire all reflog entries
git reflog expire --expire=now --all
# Step 2: Run aggressive garbage collection
git gc --prune=now --aggressive
The order matters: reflog must be expired first, otherwise GC will not remove the objects since they are still referenced.
Phase 4: Verification
Verify cleanup was successful using multiple approaches:
# Attempt to access the old commit (should fail)
git show <old-commit-hash>
# Search for secret patterns in repository
grep -r "secret_pattern" . .git 2>/dev/null
# Check for unreachable objects
git fsck --unreachable
# Count loose objects (should decrease after GC)
find .git/objects -type f | wc -l
# Verify working tree is clean
git status
Verification Strategies
Confirming Recovery Success
- Verify the recovered data matches expected format/content
- Write recovered data to the designated output location
- Confirm existing commits and history remain intact after recovery
Confirming Cleanup Success
- Old commit hash should return error when accessed via
git show grep -rfor secret patterns should return no matchesgit fsck --unreachableshould show no objects containing the secret- Compare object count before and after GC to confirm removal
Common Pitfalls
Investigation Pitfalls
- Only checking recent history - Use
git reflog show --allnot justgit reflogto see all references - Forgetting stashes - Stashes are a common place for accidentally stored secrets
- Missing other branches - Always check all branches with
git branch -a
Cleanup Pitfalls
- Wrong order of operations - Always expire reflog before running GC
- Missing the
--allflag -git reflog expire --expire=nowwithout--allonly affects HEAD - Using
--prunewithout=now- Default prune time is 2 weeks, use--prune=nowfor immediate effect - Not using
--aggressive- Standard GC may not remove all unreachable objects
Verification Pitfalls
- Only checking working directory - Secrets in
.gitdirectory require explicit checks - Not verifying object removal - Always confirm the commit hash is inaccessible
- Incomplete grep patterns - Search for multiple variations of the secret pattern
Decision Tree
Is the task about recovering lost data from Git?
├── Yes → Check reflog first (git reflog show --all)
│ ├── Found in reflog → Use git show <hash> to view/extract
│ └── Not in reflog → Check fsck, stashes, branches, tags
│
└── Is the task about cleaning up secrets from Git?
├── Yes → Follow cleanup sequence:
│ 1. git reflog expire --expire=now --all
│ 2. git gc --prune=now --aggressive
│ 3. Verify with multiple methods
│
└── Both recovery AND cleanup needed?
→ Complete recovery first, verify data saved,
then proceed with cleanup
Important Considerations
- Backup first: Before any cleanup operations, ensure recovered data is saved outside the repository
- Remote repositories: This cleanup only affects the local repository; if secrets were pushed to a remote, additional steps are needed
- Cloned copies: Any cloned copies of the repository may still contain the secrets
- Credential rotation: After recovering exposed secrets, rotate them immediately regardless of cleanup success