skills/kthorn/research-superpower/Cleaning Up Research Sessions

Cleaning Up Research Sessions

SKILL.md

Cleaning Up Research Sessions

Overview

Remove intermediate files created during research workflow while preserving all important data.

Core principle: Conservative cleanup with user confirmation. Never delete anything important.

When to Use

Use this skill when:

  • Research session is complete and consolidated
  • Preparing to archive or share research session folder
  • Research folder has accumulated temporary/intermediate files
  • User explicitly asks to clean up

When NOT to use:

  • Research is still in progress
  • User hasn't reviewed final outputs yet
  • Unsure what files are safe to delete

Files That Are ALWAYS KEPT

NEVER delete these (protected list):

Core outputs:

  • SUMMARY.md - Enhanced findings with methodology
  • relevant-papers.json - Filtered relevant papers
  • papers-reviewed.json - Complete screening history
  • papers/ directory - All PDFs and supplementary files
  • citations/citation-graph.json - Citation relationships

Methodology documentation:

  • screening-criteria.json - Rubric definition (if exists)
  • test-set.json - Rubric validation papers (if exists)
  • abstracts-cache.json - Cached abstracts for re-screening (if exists)
  • rubric-changelog.md - Rubric version history (if exists)

Auxiliary documentation (if exists):

  • README.md - Project overview
  • TOP_PRIORITY_PAPERS.md - Curated priority list
  • evaluated-papers.json - Rich structured data

Project configuration:

  • .claude/ directory - Permissions and settings
  • *.py helper scripts that were created - Keep for reproducibility

Files That May Be Cleaned Up

Candidates for removal (with confirmation):

Intermediate search results:

  • initial-search-results.json - Raw PubMed results before screening
    • Safe to delete: Data is in papers-reviewed.json
    • Reason to keep: Shows raw search results for reproducibility

Temporary files:

  • *.tmp files
  • *.swp files (vim swap files)
  • .DS_Store (macOS)
  • __pycache__/ (Python cache)
  • *.pyc (Python compiled)

Log files:

  • *.log files
  • debug-*.txt files

Cleanup Workflow

Step 1: Analyze Research Session

cd research-sessions/YYYY-MM-DD-description/

# List all files with sizes
find . -type f -exec ls -lh {} \; | awk '{print $5, $9}' | sort -rh

Identify files by category:

  • Core outputs (MUST keep)
  • Methodology files (SHOULD keep)
  • Intermediate files (candidates for cleanup)
  • Temporary files (safe to delete)

Step 2: Present Cleanup Plan to User

Show what will be deleted:

🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/

Files to KEEP (protected):
  βœ… SUMMARY.md (45 KB)
  βœ… relevant-papers.json (12 KB)
  βœ… papers-reviewed.json (28 KB)
  βœ… papers/ (14 PDFs, 32 MB)
  βœ… citations/citation-graph.json (5 KB)
  βœ… screening-criteria.json (2 KB)
  βœ… abstracts-cache.json (156 KB)

Files that CAN be removed (intermediate):
  πŸ—‘οΈ  initial-search-results.json (8 KB) - Raw PubMed results
  πŸ—‘οΈ  .DS_Store (6 KB) - macOS metadata

Total space to recover: 14 KB

Proceed with cleanup? (y/n/review)

Options:

  • y - Delete intermediate files
  • n - Cancel cleanup, keep everything
  • review - Show contents of each file before deciding

Step 3: Confirm Deletions

Before deleting ANY file:

  1. Verify it's not in protected list
  2. Check file isn't referenced in SUMMARY.md
  3. Confirm with user one more time

Example confirmation:

About to delete:
- initial-search-results.json (8 KB)

This file contains raw PubMed search results. The data is preserved in
papers-reviewed.json, so this is safe to delete.

Confirm deletion? (y/n)

Step 4: Perform Cleanup

Delete confirmed files:

# Move to trash instead of rm (safer)
# On macOS:
mv initial-search-results.json ~/.Trash/

# On Linux:
mv initial-search-results.json ~/.local/share/Trash/files/

# Or use rm if user confirms
rm initial-search-results.json

Report results:

βœ… Cleanup complete!

Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)

Space recovered: 14 KB

Protected files preserved:
- All 8 core files kept
- All 14 PDFs kept
- All methodology documentation kept

Step 5: Verify Integrity

After cleanup, verify critical files:

# Check core files exist
test -f SUMMARY.md && echo "βœ“ SUMMARY.md"
test -f relevant-papers.json && echo "βœ“ relevant-papers.json"
test -f papers-reviewed.json && echo "βœ“ papers-reviewed.json"
test -d papers && echo "βœ“ papers/ directory"

# Verify JSON files are valid
jq empty relevant-papers.json && echo "βœ“ relevant-papers.json valid JSON"
jq empty papers-reviewed.json && echo "βœ“ papers-reviewed.json valid JSON"

Report to user:

βœ… Integrity check passed
   - All core files present
   - All JSON files valid
   - All PDFs intact

Special Cases

Case 1: Large abstracts-cache.json

If abstracts-cache.json is very large (>100 MB):

⚠️  abstracts-cache.json is 256 MB

This file enables re-screening if you update the rubric. Options:
1. Keep (recommended if you might refine rubric)
2. Compress (gzip to ~50 MB, can decompress later)
3. Delete (only if research is final and won't be updated)

Choice? (1/2/3)

If user chooses compress:

gzip abstracts-cache.json
# Creates abstracts-cache.json.gz

echo "Compressed abstracts-cache.json to $(du -h abstracts-cache.json.gz | cut -f1)"

Case 2: Helper Scripts

If user created helper scripts during research:

πŸ“ Found helper scripts:
   - screen_papers.py (created for batch screening)
   - deep_dive_papers.py (created for data extraction)

These scripts document your methodology. Recommendations:
- Keep for reproducibility
- Add comments if not already documented
- Reference in SUMMARY.md under "Reproducibility" section

Keep scripts? (y/n)

Case 3: Multiple Research Sessions

If cleaning up multiple sessions:

# Find all research sessions
find research-sessions/ -maxdepth 1 -type d

# For each session:
for session in research-sessions/*/; do
    echo "Analyzing: $session"
    # Run cleanup analysis
done

Ask user:

Found 5 completed research sessions.

Clean up all sessions? (y/n/select)
- y: Analyze and clean all sessions
- n: Cancel
- select: Choose which sessions to clean

Safety Mechanisms

Protected File List

Maintain hardcoded list of patterns to NEVER delete:

PROTECTED_PATTERNS = [
    'SUMMARY.md',
    'relevant-papers.json',
    'papers-reviewed.json',
    'papers/*.pdf',
    'papers/*.zip',
    'citations/citation-graph.json',
    'screening-criteria.json',
    'test-set.json',
    'abstracts-cache.json',
    'rubric-changelog.md',
    'README.md',
    'TOP_PRIORITY_PAPERS.md',
    'evaluated-papers.json',
    '*.py',  # Helper scripts
    '.claude/*',  # Project settings
]

Before deleting any file:

def is_protected(filepath):
    """Check if file matches any protected pattern"""
    for pattern in PROTECTED_PATTERNS:
        if fnmatch(filepath, pattern):
            return True
    return False

# Never delete protected files
if is_protected(file_to_delete):
    print(f"⚠️  ERROR: {file_to_delete} is protected and cannot be deleted")
    return

Dry Run Mode

Always show what will be deleted before doing it:

# Dry run (show only, don't delete)
echo "DRY RUN - No files will be deleted"

for file in $candidate_files; do
    if is_safe_to_delete "$file"; then
        echo "Would delete: $file ($(du -h $file | cut -f1))"
    fi
done

echo ""
echo "Proceed with actual deletion? (y/n)"

Integration with Other Skills

After answering-research-questions workflow:

  1. Complete Phase 8 (consolidation)
  2. User reviews SUMMARY.md and relevant-papers.json
  3. Optionally: Run cleaning-up-research-sessions
  4. Archive or share research folder

Add to answering-research-questions Phase 8:

### Optional: Cleanup

After reviewing outputs, optionally clean up intermediate files:

"Research session is complete. Would you like me to clean up intermediate files?
I'll show you what will be deleted before removing anything."

If yes: Use `cleaning-up-research-sessions` skill

Common Mistakes

Deleting papers-reviewed.json: This is the deduplication database - NEVER delete β†’ Always protect with hardcoded list Deleting abstracts-cache.json: Needed for re-screening β†’ Ask user, default to keep Deleting helper scripts: Important for reproducibility β†’ Keep by default, ask if user wants to remove Not showing user what will be deleted: User needs to see the plan β†’ Always show dry run first Deleting without confirmation: Too risky β†’ Always ask for final confirmation Not verifying after cleanup: Files could be corrupted β†’ Run integrity check after deletion

Quick Reference

Action Command
List all files with sizes find . -type f -exec ls -lh {} \; | sort -rh
Check if JSON valid jq empty file.json
Move to trash (macOS) mv file ~/.Trash/
Move to trash (Linux) mv file ~/.local/share/Trash/files/
Compress large file gzip file
Calculate folder size du -sh .

Success Criteria

Cleanup successful when:

  • All protected files still exist and are valid
  • Only confirmed intermediate files deleted
  • User knows what was deleted and why
  • Space recovered is reported
  • Integrity check passed
  • Research session is still fully functional

Example Session

User: "Can you clean up the research session folder?"

Agent: "I'll analyze the research session and show you what can be safely cleaned up."

[Analyzes folder]

Agent: "🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/

Files to KEEP (protected):
  βœ… SUMMARY.md (45 KB) - Final findings
  βœ… relevant-papers.json (12 KB) - Filtered results
  βœ… papers-reviewed.json (28 KB) - Complete history
  βœ… papers/ (14 PDFs, 32 MB) - Downloaded papers
  βœ… citations/citation-graph.json (5 KB) - Citation network
  βœ… screening-criteria.json (2 KB) - Rubric definition
  βœ… abstracts-cache.json (156 KB) - Cached abstracts

Files that CAN be removed:
  πŸ—‘οΈ  initial-search-results.json (8 KB) - Raw PubMed results
     β†’ Data is preserved in papers-reviewed.json
  πŸ—‘οΈ  .DS_Store (6 KB) - macOS metadata
     β†’ Not needed

Total space to recover: 14 KB

Proceed with cleanup? (y/n/review)"

User: "y"

Agent: "Final confirmation - About to delete:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)

All protected files will be preserved. Confirm? (y/n)"

User: "y"

Agent: [Deletes files]

"βœ… Cleanup complete!

Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)

Space recovered: 14 KB

Running integrity check...
βœ… All core files intact
βœ… All JSON files valid
βœ… All 14 PDFs present

Research session is ready for archiving or sharing."

Next Steps

After cleanup:

  • Research folder is clean and ready to archive
  • Share folder with collaborators
  • Move to long-term storage
  • Continue with follow-up research if needed
Weekly Installs
0
GitHub Stars
29
First Seen
Jan 1, 1970