Remove Similar Image
Core Goal
- Scan one local image or a directory tree of local images.
- Use ImageHash to cluster exact duplicates and near-duplicates.
- Use OpenCV variance-of-Laplacian blur scoring to flag blurry shots.
- Preview actions first, then permanently delete or move candidates to a trash folder when requested.
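The blur check above relies on the variance of the Laplacian: sharp images have strong edges, so the Laplacian response varies widely, while blurry images give a nearly flat response. The sketch below illustrates the idea on a tiny grayscale grid using only the standard library. It is not the script's code, and the name `laplacian_variance` is hypothetical. With OpenCV the usual equivalent is `cv2.Laplacian(gray, cv2.CV_64F).var()`, which also processes border pixels, so absolute values will differ from this sketch.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a 2D grid."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] applied at (y, x).
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


# A uniform patch (no edges) versus a checkerboard (edges everywhere).
flat = [[128] * 6 for _ in range(6)]
checker = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]

flat_score = laplacian_variance(flat)
checker_score = laplacian_variance(checker)
print(flat_score, checker_score)
```

The uniform patch scores zero and the checkerboard scores far higher, which is why a low score is read as "blurry".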
Required Script
- Use `scripts/remove_similar_images.py`.
- Start with `doctor` if dependency availability is unknown.
- Treat `analyze` without `--apply` as the safe default.
- Prefer `--trash-dir` before permanent deletion so the user can review results.
Dependency
- This skill requires Pillow, ImageHash, numpy, and OpenCV:
python3 -m pip install Pillow ImageHash numpy opencv-python-headless
- If `doctor` reports missing dependencies, stop and surface the install command instead of pretending the scan ran.
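A dependency check of this kind can be sketched with the standard library alone. The snippet below is an assumption about what such a check might look like, not the actual `doctor` implementation; `REQUIRED_MODULES` and `find_missing` are hypothetical names. It maps each pip package from the install command to its import name and reports the ones that cannot be located.

```python
from importlib.util import find_spec

# pip package name -> importable module name (taken from the install command above).
REQUIRED_MODULES = {
    "Pillow": "PIL",
    "ImageHash": "imagehash",
    "numpy": "numpy",
    "opencv-python-headless": "cv2",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be located."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


missing = find_missing()
if missing:
    # Surface the install command instead of continuing with a broken setup.
    print("Missing:", ", ".join(missing))
    print("Install with: python3 -m pip install " + " ".join(missing))
else:
    print("All dependencies found.")
```

`find_spec` only locates a module without importing it, so the check stays cheap even for heavy packages such as OpenCV.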
Workflow
- Check dependencies:
python3 scripts/remove_similar_images.py doctor
- Preview similar groups and blurry images for a folder:
python3 scripts/remove_similar_images.py analyze \
--input-path /path/to/photos
- Preview a safe cleanup that would move similar non-keepers and blurry files into a trash folder (nothing is moved without `--apply`):
python3 scripts/remove_similar_images.py analyze \
--input-path /path/to/photos \
--delete-similar \
--delete-blurry \
--trash-dir /path/to/review-trash
- Apply the move after the preview looks correct:
python3 scripts/remove_similar_images.py analyze \
--input-path /path/to/photos \
--delete-similar \
--delete-blurry \
--trash-dir /path/to/review-trash \
--apply
- Permanently delete only similar non-keepers:
python3 scripts/remove_similar_images.py analyze \
--input-path /path/to/photos \
--delete-similar \
--apply
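The difference between a preview, a move to a trash folder, and a permanent delete can be summarised in a few lines. This is a minimal sketch of those semantics under stated assumptions, not the script's code: `run_cleanup` and its parameters are hypothetical, and name collisions inside the trash folder are ignored.

```python
import shutil
from pathlib import Path


def run_cleanup(candidates, trash_dir=None, apply=False):
    """Return (action, path) pairs; touch the filesystem only when apply is True."""
    action = "move" if trash_dir else "delete"
    results = []
    for path in map(Path, candidates):
        if apply:
            if trash_dir:
                Path(trash_dir).mkdir(parents=True, exist_ok=True)
                # Recoverable: the file lands in the review folder.
                shutil.move(str(path), str(Path(trash_dir) / path.name))
            else:
                # Permanent deletion: there is no way to review afterwards.
                path.unlink()
        results.append((action if apply else f"would {action}", str(path)))
    return results
```

Calling `run_cleanup(paths, trash_dir="review-trash")` only reports what would happen; adding `apply=True` performs the move, and omitting `trash_dir` while applying deletes the files outright.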
Main Arguments
- `--input-path`: source image file or directory.
- `--no-recursive`: scan only the top-level directory.
- `--extra-extension`: include additional suffixes not covered by default.
- `--limit`: cap the number of scanned files for quick tests.
- `--hash-method`: `phash`, `dhash`, `ahash`, or `whash`.
- `--hash-size`: larger hashes are stricter and slower.
- `--similar-threshold`: maximum Hamming distance considered similar.
- `--blur-threshold`: Laplacian-variance cutoff for blurry images.
- `--keep-policy`: choose the keeper in each similar group with `best`, `largest`, `newest`, or `oldest`.
- `--delete-similar`: mark non-keeper files in similar groups as removal candidates.
- `--delete-blurry`: mark blurry files as removal candidates even when they are unique.
- `--trash-dir`: move files into a review directory instead of permanently deleting.
- `--apply`: execute removals or moves. Without this flag the script only reports.
- `--report-json`: save a machine-readable report for later review.
- `--print-json`: print the full report as JSON to stdout.
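The `--hash-size` and `--similar-threshold` arguments interact: a hash size of 8 yields an 8 × 8 = 64-bit hash, and the threshold is the number of bits allowed to differ. The sketch below treats hashes as plain integers with made-up values to show what the threshold means; the helper names are hypothetical. With the ImageHash library, subtracting two hash objects gives the same bit count.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two integer-encoded hashes."""
    return bin(hash_a ^ hash_b).count("1")


def is_similar(hash_a: int, hash_b: int, threshold: int) -> bool:
    """Similar means the distance is at or below the cutoff."""
    return hamming_distance(hash_a, hash_b) <= threshold


# Two made-up 64-bit hashes that differ in exactly three bits.
original = 0xF0F0_F0F0_0F0F_0F0F
near_copy = original ^ 0b1011

print(hamming_distance(original, near_copy))
print(is_similar(original, near_copy, threshold=5))
print(is_similar(original, near_copy, threshold=2))
```

The same pair counts as similar at threshold 5 but not at threshold 2, which is the sense in which a lower threshold is stricter.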
Default Heuristics
- Default similarity detection uses `phash` with `hash_size=8` and `similar-threshold=5`.
- Default blur detection uses a variance-of-Laplacian cutoff of `100.0`.
- Default `keep-policy=best` prefers non-blurry images, then sharper images, then larger images.
- Similar groups are connected components: if `A` is close to `B`, and `B` is close to `C`, they are treated as one group even if `A` and `C` are slightly farther apart.
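The connected-component grouping and the `best` keep policy can be illustrated together. This is a sketch under stated assumptions rather than the script's implementation: `ImageInfo`, `group_similar`, and `pick_keeper` are hypothetical names, the hash and blur values are made up, and treating a score at or above the cutoff as non-blurry is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    name: str
    hash_bits: int     # integer-encoded perceptual hash (made-up values below)
    blur_score: float  # variance of Laplacian; higher means sharper
    pixels: int        # width * height


def group_similar(images, threshold=5):
    """Union-find over all pairs within the Hamming threshold (connected components)."""
    parent = list(range(len(images)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            distance = bin(images[i].hash_bits ^ images[j].hash_bits).count("1")
            if distance <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, image in enumerate(images):
        groups.setdefault(find(i), []).append(image)
    return [members for members in groups.values() if len(members) > 1]


def pick_keeper(group, blur_threshold=100.0):
    """Approximate keep-policy=best: non-blurry first, then sharper, then larger."""
    return max(
        group,
        key=lambda im: (im.blur_score >= blur_threshold, im.blur_score, im.pixels),
    )


images = [
    ImageInfo("a.jpg", 0b0000_0000, 150.0, 1000),
    ImageInfo("b.jpg", 0b0000_1111, 220.0, 1000),  # 4 bits from a
    ImageInfo("c.jpg", 0b1111_1111, 90.0, 4000),   # 4 bits from b, 8 bits from a
    ImageInfo("d.jpg", 0xFFFF_0000, 300.0, 1000),  # far from everything
]
groups = group_similar(images)
keeper = pick_keeper(groups[0])
print([im.name for im in groups[0]], keeper.name)
```

Here `a.jpg` and `c.jpg` are 8 bits apart, beyond the threshold, yet they end up in the same group because both are close to `b.jpg`. Chaining like this is worth keeping in mind when raising `--similar-threshold`, since long chains can merge images that look quite different.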
Usage Notes
- Review the preview before adding `--apply`.
- Use `--trash-dir` for the first pass on any valuable photo collection.
- Lower `--similar-threshold` to be stricter. Raise it when near-duplicates are being missed.
- Lower `--blur-threshold` if too many acceptable images are marked blurry. Raise it when obvious blur is missed.
- Expect format support to follow Pillow and OpenCV availability in the local environment.
Output
- Text mode prints scan counts, similar groups, blurry images, and planned or applied actions.
- JSON mode includes per-image metadata, unreadable files, similar groups, planned actions, and action results.
Script
scripts/remove_similar_images.py
Repository
tiangong-ai/skills