Folder Organization Best Practices
Expert guidance for organizing project directories, establishing file naming conventions, and maintaining clean, navigable project structures for research and development work.
When to Use This Skill
- Setting up new projects
- Reorganizing existing projects
- Establishing team conventions
- Creating reproducible research structures
- Managing data-intensive projects
Core Principles
- Predictability - Standard locations for common file types
- Scalability - Structure grows gracefully with project
- Discoverability - Easy for others (and future you) to navigate
- Separation of Concerns - Code, data, documentation, outputs separated
- Version Control Friendly - Large/generated files excluded appropriately
Standard Project Structure
Research/Analysis Projects
project-name/
├── README.md # Project overview and getting started
├── .gitignore # Exclude data, outputs, env files
├── environment.yml # Conda environment (or requirements.txt)
├── data/ # Input data (often gitignored)
│   ├── raw/ # Original, immutable data
│   ├── processed/ # Cleaned, transformed data
│   └── external/ # Third-party data
├── notebooks/ # Jupyter notebooks for exploration
│   ├── 01-exploration.ipynb
│   ├── 02-analysis.ipynb
│   └── figures/ # Notebook-generated figures
├── src/ # Source code (reusable modules)
│   ├── __init__.py
│   ├── data_processing.py
│   ├── analysis.py
│   └── visualization.py
├── scripts/ # Standalone scripts and workflows
│   ├── download_data.sh
│   └── run_pipeline.py
├── tests/ # Unit tests
│   └── test_analysis.py
├── docs/ # Documentation
│   ├── methods.md
│   └── references.md
├── results/ # Analysis outputs (gitignored)
│   ├── figures/
│   ├── tables/
│   └── models/
└── config/ # Configuration files
    └── analysis_config.yaml
Development Projects
project-name/
├── README.md
├── .gitignore
├── setup.py # Package configuration
├── requirements.txt # or pyproject.toml
├── src/
│   └── package_name/
│       ├── __init__.py
│       ├── core.py
│       └── utils.py
├── tests/
│   ├── test_core.py
│   └── test_utils.py
├── docs/
│   ├── api.md
│   └── usage.md
├── examples/ # Example usage
│   └── example_workflow.py
└── .github/ # CI/CD workflows
    └── workflows/
        └── tests.yml
Bioinformatics/Workflow Projects
project-name/
├── README.md
├── data/
│   ├── raw/ # Raw sequencing data
│   ├── reference/ # Reference genomes, annotations
│   └── processed/ # Workflow outputs
├── workflows/ # Galaxy .ga or Snakemake files
│   ├── preprocessing.ga
│   └── assembly.ga
├── config/
│   ├── workflow_params.yaml
│   └── sample_sheet.tsv
├── scripts/ # Helper scripts
│   ├── submit_workflow.py
│   └── quality_check.py
├── results/ # Final outputs
│   ├── figures/
│   ├── tables/
│   └── reports/
└── logs/ # Workflow execution logs
File Naming Conventions
General Rules
- Use lowercase with hyphens or underscores
  - ✅ data-analysis.py or data_analysis.py
  - ❌ DataAnalysis.py or data analysis.py
- Be descriptive but concise
  - ✅ process-telomere-data.py
  - ❌ script.py or process_all_the_telomere_sequencing_data_from_experiments.py
- Use consistent separators
  - Choose either hyphens or underscores and stick with it
  - Convention: hyphens for file names, underscores for Python modules
- Include version/date for important outputs
  - ✅ report-2026-01-23.pdf or model-v2.pkl
  - ❌ report-final-final-v3.pdf
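Most of the rules above can be checked mechanically. The following is a minimal sketch rather than part of the original skill: the function name check_filename and the 40-character cutoff for "concise" are assumptions made for illustration. It targets scripts and notebooks, not data files, and it cannot judge whether a name such as script.py is descriptive enough.

```python
import re

# Assumed cutoff for "concise"; the guidance above gives no number.
MAX_STEM_LENGTH = 40

# Lowercase alphanumeric words joined by '-' or '_', then dot-separated extensions.
_ALLOWED = re.compile(r"^[a-z0-9]+(?:[-_][a-z0-9]+)*(?:\.[a-z0-9]+)*$")


def check_filename(name: str) -> list[str]:
    """Return convention problems for a file name; an empty list means it passes."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if name != name.lower():
        problems.append("contains uppercase characters")
    stem = name.split(".", 1)[0]  # text before the first dot
    if "-" in stem and "_" in stem:
        problems.append("mixes hyphens and underscores")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"stem is longer than {MAX_STEM_LENGTH} characters")
    if not problems and not _ALLOWED.match(name):
        problems.append("contains characters outside a-z, 0-9, '-', '_' and '.'")
    return problems


for candidate in ("data-analysis.py", "DataAnalysis.py", "data analysis.py"):
    print(candidate, "->", check_filename(candidate) or "ok")
```

A check like this could run in a pre-commit hook or CI job; the exact rules and cutoff are a team decision.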
Numbered Sequences
For sequential files (notebooks, scripts), use zero-padded numbers:
notebooks/
├── 01-data-exploration.ipynb
├── 02-quality-control.ipynb
├── 03-statistical-analysis.ipynb
└── 04-visualization.ipynb
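Zero padding matters because file listings sort names as text, so an unpadded 10 sorts before 2. The sketch below shows that effect and a hypothetical helper, next_numbered_name, that proposes the next name in a sequence. The helper name, its parameters, and the two-digit default width are assumptions for illustration.

```python
import re
from typing import Iterable

# Leading digits followed by a hyphen, e.g. "03-" in "03-statistical-analysis.ipynb".
_PREFIX = re.compile(r"^(\d+)-")


def next_numbered_name(
    existing: Iterable[str], slug: str, suffix: str = ".ipynb", width: int = 2
) -> str:
    """Return the next zero-padded name after the highest numbered existing one."""
    numbers = [int(m.group(1)) for name in existing if (m := _PREFIX.match(name))]
    return f"{max(numbers, default=0) + 1:0{width}d}-{slug}{suffix}"


# Without padding, plain text sorting puts 10 before 2.
print(sorted(["10-viz.ipynb", "2-qc.ipynb", "1-explore.ipynb"]))

# With padding, the next notebook slots in at the end of the listing.
print(next_numbered_name(
    ["01-data-exploration.ipynb", "02-quality-control.ipynb"],
    "statistical-analysis",
))
```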
Data Files
Include metadata in filename when possible:
data/raw/
├── sample-A_hifi_reads_2026-01-15.fastq.gz
├── sample-B_hifi_reads_2026-01-15.fastq.gz
└── reference_genome_v3.fasta
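Metadata embedded in a file name is only useful if it can be read back reliably. As a rough sketch, assuming the pattern sample_description_YYYY-MM-DD.extension used by the first two files above, the names can be parsed as follows. The names DataFileInfo and parse_data_filename are hypothetical, and the meaning of the date (download, sequencing run, or otherwise) is not stated in the source.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Assumed layout: <sample>_<description>_<YYYY-MM-DD>.<extension>
_PATTERN = re.compile(
    r"^(?P<sample>[^_]+)_(?P<description>.+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})\.(?P<extension>.+)$"
)


class DataFileInfo(NamedTuple):
    sample: str
    description: str
    file_date: date
    extension: str


def parse_data_filename(name: str) -> Optional[DataFileInfo]:
    """Split a data file name into its metadata, or return None if it does not fit."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return DataFileInfo(
        sample=match["sample"],
        description=match["description"],
        file_date=date.fromisoformat(match["date"]),  # raises ValueError on impossible dates
        extension=match["extension"],
    )


print(parse_data_filename("sample-A_hifi_reads_2026-01-15.fastq.gz"))
print(parse_data_filename("reference_genome_v3.fasta"))  # no date component, so None
```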
Directory Management Best Practices
What to Version Control
DO commit:
- Source code
- Documentation
- Configuration files
- Small test datasets (<1MB)
- Requirements/environment files
- README files
DON'T commit:
- Large data files (use .gitignore)
- Generated outputs
- Environment directories (venv/, conda-env/)
- Logs
- Temporary files
- API keys/secrets
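One way to enforce the size guideline is to scan the working tree before committing. The sketch below is an illustration only: the function names are hypothetical, and reading "<1MB" as 1,000,000 bytes is an assumption.

```python
from pathlib import Path
from typing import Iterable, Iterator

# Assumption: the "<1MB" guideline is read as 1,000,000 bytes.
SIZE_LIMIT_BYTES = 1_000_000


def iter_file_sizes(root: Path) -> Iterator[tuple[str, int]]:
    """Yield (relative path, size in bytes) for every file under root, skipping .git."""
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if path.is_file() and ".git" not in relative.parts:
            yield relative.as_posix(), path.stat().st_size


def oversized(files: Iterable[tuple[str, int]], limit: int = SIZE_LIMIT_BYTES) -> list[str]:
    """Return the sorted paths whose size is at or above the limit."""
    return sorted(name for name, size in files if size >= limit)


# Example with made-up sizes; use oversized(iter_file_sizes(Path("."))) on a real project.
print(oversized([("src/analysis.py", 2_048), ("data/raw/reads.bam", 5_000_000)]))
```

This scans every file on disk, including ones that are already gitignored, so treat the output as a list of candidates to review rather than a list of violations.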
.gitignore Template
# Python
__pycache__/
*.py[cod]
*$py.class
.venv/
venv/
*.egg-info/
# Jupyter
.ipynb_checkpoints/
*.ipynb_checkpoints
# Data
data/raw/
data/processed/
*.fastq.gz
*.bam
*.vcf.gz
# Outputs
results/
outputs/
*.png
*.pdf
*.html
# Logs
logs/
*.log
# Environment
.env
environment.local.yml
# OS
.DS_Store
Thumbs.db
Data Organization
Raw Data is Sacred
- Never modify raw data - Always keep originals untouched
- Store in data/raw/ and make it read-only if possible
- Document data provenance (where it came from, when downloaded)
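The read-only and provenance points above can be sketched with the standard library alone. The function names and the fields of the provenance record are assumptions chosen for illustration, not a format prescribed by this skill. A checksum is included so later modification of a raw file can be detected.

```python
import hashlib
import json
import stat
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large raw files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path: Path) -> None:
    """Clear the write bits for user, group, and others."""
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def provenance_record(path: Path, source: str) -> dict:
    """Describe where a raw file came from and what it looked like when recorded."""
    return {
        "file": path.name,
        "source": source,  # e.g. a URL or accession, supplied by the caller
        "recorded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of(path),
    }


def register_raw_file(path: Path, source: str) -> Path:
    """Write <name>.provenance.json next to the file, then lock the file."""
    sidecar = path.with_name(path.name + ".provenance.json")
    sidecar.write_text(json.dumps(provenance_record(path, source), indent=2))
    make_read_only(path)
    return sidecar
```

Calling register_raw_file(Path("data/raw/some-file.fastq.gz"), source="...") once after download would leave a small JSON sidecar beside each raw file. Whether sidecars or a single manifest works better depends on the project.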
Processed Data Hierarchy
data/
├── raw/ # Original, immutable
├── interim/ # Intermediate processing steps
├── processed/ # Final, analysis-ready data
└── external/ # Third-party data
Documentation Standards
README.md Essentials
Every project should have a README with:
# Project Name
Brief description
## Installation
How to set up the environment
## Usage
How to run the analysis/code
## Project Structure
Brief overview of directories
## Data
Where data lives and how to access it
## Results
Where to find outputs
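A quick way to keep READMEs aligned with this template is to check for the expected headings. This is a minimal sketch: the function name is hypothetical, the required list simply mirrors the template above, and the heading match is deliberately simple, so it will also pick up lines starting with # inside code blocks.

```python
import re

# Section names taken from the README template above.
REQUIRED_SECTIONS = ("Installation", "Usage", "Project Structure", "Data", "Results")

# An ATX-style markdown heading: one to six '#' characters, then the title.
_HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", flags=re.MULTILINE)


def missing_readme_sections(readme_text: str) -> list[str]:
    """Return the required section names that have no matching heading."""
    headings = {m.group(1).strip().lower() for m in _HEADING.finditer(readme_text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


draft = "# Project Name\nBrief description\n## Installation\n## Usage\n"
print(missing_readme_sections(draft))
```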
Code Documentation
- Docstrings for all functions/classes
- Comments for complex logic
- CHANGELOG.md for tracking changes
- TODO.md for tracking work (gitignored or removed before merge)
Common Anti-Patterns to Avoid
❌ Flat structure with everything in root
project/
├── script1.py
├── script2.py
├── data.csv
├── output1.png
├── output2.png
└── final_really_final_v3.xlsx
❌ Ambiguous naming
notebooks/
├── notebook1.ipynb
├── test.ipynb
├── analysis.ipynb
└── analysis_new.ipynb
❌ Mixed concerns
project/
├── src/
│   ├── analysis.py
│   ├── data.csv # Data in source code directory
│   └── figure1.png # Output in source code directory
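The mixed-concerns case is the easiest of these anti-patterns to detect automatically. As an illustrative sketch, the hypothetical helper below flags data and output files that sit under src/. The suffix list is an assumption and should be adjusted to the file types a project actually produces.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Assumed set of data/output extensions that should not live next to source code.
DATA_OR_OUTPUT_SUFFIXES = {".csv", ".tsv", ".xlsx", ".png", ".pdf", ".html"}


def misplaced_in_src(paths: Iterable[str], src_dir: str = "src") -> list[str]:
    """Return the paths under src_dir whose suffix marks them as data or output."""
    flagged = []
    for raw_path in paths:
        path = PurePosixPath(raw_path)
        in_src = path.parts[:1] == (src_dir,)
        if in_src and path.suffix.lower() in DATA_OR_OUTPUT_SUFFIXES:
            flagged.append(raw_path)
    return flagged


print(misplaced_in_src([
    "src/analysis.py",
    "src/data.csv",
    "src/figure1.png",
    "data/raw/measurements.csv",
]))
```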
Cleanup and Maintenance
Regular Maintenance Tasks
- Prune old branches - Delete merged feature branches
- Clean temp files - Remove TODO.md, NOTES.md from completed work
- Update documentation - Keep README current with changes
- Review .gitignore - Ensure large files aren't tracked
- Organize notebooks - Rename/renumber as project evolves
End-of-Project Checklist
- README complete and accurate
- Code documented
- Tests passing
- Large files gitignored
- Working files removed (TODO.md, scratch notebooks)
- Final outputs in results/
- Environment files current
- License added (if applicable)
Integration with Other Skills
This skill works well with:
- python-environment - Environment setup and management
- claude-collaboration - Team workflow best practices
- jupyter-notebook-analysis - Notebook organization standards
Templates and Tools
Quick Project Setup
# Create standard research project structure
# Note: the {a,b,c} brace expansion needs bash or zsh; plain POSIX sh does not expand it
mkdir -p data/{raw,processed,external} notebooks scripts src tests docs results config
touch README.md .gitignore environment.yml
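On systems without a bash-compatible shell, the same layout can be created from Python. This is a sketch that mirrors the directory and file lists in the commands above; the function name scaffold is hypothetical.

```python
from pathlib import Path

# Same directories and files as the shell commands above.
DIRECTORIES = [
    "data/raw", "data/processed", "data/external",
    "notebooks", "scripts", "src", "tests", "docs", "results", "config",
]
FILES = ["README.md", ".gitignore", "environment.yml"]


def scaffold(root: Path) -> None:
    """Create the standard research layout under root; safe to run more than once."""
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for filename in FILES:
        (root / filename).touch(exist_ok=True)  # leaves existing content untouched
```

Calling scaffold(Path("project-name")) creates the tree relative to the current directory. Because existing directories and files are left alone, it can also be used to fill in missing pieces of a partially organized project.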
Cookiecutter Templates
Consider using cookiecutter for standardized project templates:
- cookiecutter-data-science - Data science projects
- cookiecutter-research - Research projects
- Custom team templates
References and Resources
Repository: delphine-l/claude_global
First Seen: Jan 24, 2026