---
name: project-sharing
description: Expert guidance for preparing project outputs for sharing with collaborators, reviewers, or repositories. Creates organized packages at different sharing levels while preserving your working directory.
---

# Project Sharing and Output Preparation
## When to Use This Skill
- Sharing analysis results with collaborators
- Preparing supplementary materials for publications
- Creating reproducible research packages
- Archiving completed projects
- Handoff to other researchers
- Submitting to data repositories
## Core Principles

- **Work on copies** - Never modify the working directory
- **Choose the appropriate level** - Match sharing depth to audience needs
- **Document everything** - Include clear guides and metadata
- **Clean before sharing** - Remove debug code, clear notebook outputs, anonymize if needed
- **Make it reproducible** - Include dependencies and instructions
- ⚠️ **CRITICAL: After creating a sharing folder, all future work happens in the main project directory, NOT in the sharing folder** - Sharing folders are read-only snapshots
## Three Sharing Levels

### Level 1: Summary Only

**Purpose:** Quick sharing for presentations, reports, or high-level review

**What to include:**
- PDF export of final notebook(s)
- Final data/results (CSV, Excel, figures) - optional
- Brief README

**Use when:**
- Sharing results with non-technical stakeholders
- Presentations or talks
- Quick review without reproduction needs
- Space or time constraints

**Structure:**

```
shared-summary/
├── README.md                   # Brief overview
├── analysis-YYYY-MM-DD.pdf     # Notebook as PDF
└── results/
    ├── figures/
    │   ├── fig1-main-result.png
    │   └── fig2-comparison.png
    └── tables/
        └── summary-statistics.csv
```
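To produce the PDF export, nbconvert is one option; a minimal sketch, assuming a LaTeX toolchain is installed (the notebook path and output name are placeholders):

```bash
# Export the final notebook to PDF (requires LaTeX; use --to webpdf as an
# alternative that renders via a headless browser instead)
jupyter nbconvert --to pdf notebooks/02-analysis.ipynb --output analysis-2024-01-15
```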
### Level 2: Reproducible

**Purpose:** Enable others to reproduce your analysis from processed data

**What to include:**
- Analysis notebooks (.ipynb) - cleaned
- Scripts for figure generation
- Processed/analysis-ready data
- Requirements file (requirements.txt or environment.yml)
- Detailed README with instructions

**Use when:**
- Sharing with collaborating researchers
- Peer review / manuscript supplementary materials
- Teaching or tutorials
- Standard collaboration needs

**Structure:**

```
shared-reproducible/
├── README.md                      # Setup and reproduction instructions
├── MANIFEST.md                    # File descriptions
├── environment.yml                # Conda environment OR requirements.txt
├── notebooks/
│   ├── 01-data-processing.ipynb   # Cleaned, outputs cleared
│   ├── 02-analysis.ipynb
│   └── 03-visualization.ipynb
├── scripts/
│   ├── generate_figures.py        # Standalone scripts
│   └── utils.py
└── data/
    ├── processed/
    │   ├── cleaned_data.csv
    │   └── processed_results.tsv
    └── README.md                  # Data provenance
```
### Level 3: Full Traceability

**Purpose:** Complete transparency from raw data through all processing steps

**What to include:**
- Starting/raw data
- All processing scripts and notebooks
- All intermediate files
- Final results
- Complete documentation
- Full dependency specification

**Use when:**
- Archiving for future reference
- Regulatory compliance
- High-stakes reproducibility (clinical, policy)
- Data repository submission (Zenodo, Dryad, etc.)
- Complete project handoff

**Structure:**

```
shared-complete/
├── README.md                    # Complete project guide
├── MANIFEST.md                  # Comprehensive file listing
├── environment.yml
├── data/
│   ├── raw/                     # Original, unmodified data
│   │   ├── sample_A_reads.fastq.gz
│   │   └── README.md            # Data source, download date
│   ├── intermediate/            # Processing steps
│   │   ├── 01-filtered/
│   │   ├── 02-aligned/
│   │   └── README.md
│   └── processed/               # Final analysis-ready data
│       └── final_dataset.csv
├── scripts/
│   ├── 01-download-data.sh
│   ├── 02-quality-control.py
│   ├── 03-filtering.py
│   ├── 04-analysis.py
│   └── utils/
├── notebooks/
│   ├── exploratory/             # Early exploration
│   └── final/                   # Publication analyses
├── results/
│   ├── figures/
│   ├── tables/
│   └── supplementary/
└── documentation/
    ├── methods.md               # Detailed methodology
    ├── changelog.md             # Processing decisions
    └── data-dictionary.md       # Variable definitions
```
## Preparation Workflow

### Step 1: Ask the User for the Sharing Level

Questions to determine the level:

```
Which sharing level do you need?
1. Summary Only - PDF + final results (quick sharing)
2. Reproducible - Notebooks + scripts + data (standard sharing)
3. Full Traceability - Everything from raw data (archival/compliance)
```
Additional questions:
- Who is the audience? (colleagues, reviewers, public)
- Are there size constraints?
- Any sensitive data to handle?
- Timeline for sharing?
### Step 2: Identify Files to Include

For each level, identify:

**Level 1 - Summary:**
- Main analysis notebook(s)
- Key figures (publication-quality)
- Summary tables/statistics
- Optional: final processed dataset

**Level 2 - Reproducible:**
- All analysis notebooks (not exploratory)
- Figure generation scripts
- Processed/cleaned data
- Environment specification
- Any utility functions/modules

**Level 3 - Full:**
- Raw data (or links if too large)
- All processing scripts
- All notebooks (including exploratory)
- All intermediate files
- Complete documentation
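One way to encode these selections is as glob patterns per level; the mapping and copy loop below are a hypothetical sketch (adjust `INCLUDE_PATTERNS` to your actual layout):

```python
import shutil
from pathlib import Path

# Hypothetical include patterns per sharing level (adjust to your project)
INCLUDE_PATTERNS = {
    'summary': ['results/figures/*.png', 'results/tables/*.csv'],
    'reproducible': ['notebooks/*.ipynb', 'scripts/*.py',
                     'data/processed/*', 'environment.yml'],
    'full': ['data/**/*', 'scripts/**/*', 'notebooks/**/*',
             'results/**/*', 'documentation/*.md'],
}

def copy_level_files(level, share_path, project_root='.'):
    """Copy files matching the level's patterns, preserving relative paths."""
    root = Path(project_root)
    for pattern in INCLUDE_PATTERNS[level]:
        for src in root.glob(pattern):
            if src.is_file():
                dest = Path(share_path) / src.relative_to(root)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)  # copy2 preserves timestamps
```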
### Step 3: Create the Sharing Directory

```bash
# Create a dated directory (replace [level] with summary/reproducible/full)
SHARE_DIR="shared-$(date +%Y%m%d)-[level]"
mkdir -p "$SHARE_DIR"

# Create subdirectories based on the level
# ... appropriate structure from above
```
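For example, the Level 2 skeleton can be created in one command via brace expansion (bash-specific; the braces must stay outside the quotes):

```bash
# Level 2 (reproducible) skeleton
mkdir -p "$SHARE_DIR"/{notebooks,scripts,data/processed,results/figures}
```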
### Step 4: Copy and Clean Files

For notebooks (`.ipynb`):

```python
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor

def clean_notebook(input_path, output_path):
    """Clean a notebook: clear outputs, remove debug cells."""
    # Read the notebook
    with open(input_path, 'r') as f:
        nb = nbformat.read(f, as_version=4)

    # Clear all cell outputs
    clear_output = ClearOutputPreprocessor()
    nb, _ = clear_output.preprocess(nb, {})

    # Remove cells tagged as 'debug' or 'remove'
    nb.cells = [cell for cell in nb.cells
                if 'debug' not in cell.metadata.get('tags', [])
                and 'remove' not in cell.metadata.get('tags', [])]

    # Write the cleaned notebook
    with open(output_path, 'w') as f:
        nbformat.write(nb, f)
```
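Applied during the copy step, usage might look like this (the directory names are placeholders):

```python
from pathlib import Path

share_nb_dir = Path('shared-20240115-reproducible/notebooks')
share_nb_dir.mkdir(parents=True, exist_ok=True)

# Clean each numbered notebook into the sharing folder
for nb_path in Path('notebooks').glob('0*.ipynb'):
    clean_notebook(nb_path, share_nb_dir / nb_path.name)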
For data files:
- Copy as-is for small files
- Consider compression for large files
- Check for sensitive information
For scripts:
- Remove debugging code
- Add docstrings if missing
- Ensure paths are relative
### Step 5: Generate Documentation

#### README.md Template

````markdown
# Project: [Project Name]

**Date:** YYYY-MM-DD
**Author:** [Your Name]
**Sharing Level:** [Summary/Reproducible/Full]

## Overview
Brief description of the project and analysis.

## Contents
See MANIFEST.md for detailed file descriptions.

## Requirements
[For Reproducible/Full levels]
- Python 3.X
- See environment.yml for dependencies

## Setup
```bash
# Create environment
conda env create -f environment.yml
conda activate project-name
```

## Reproduction Steps
[For Reproducible/Full levels]
1. [Description of first step]
   ```bash
   jupyter notebook notebooks/01-analysis.ipynb
   ```
2. [Description of second step]

## Data Sources
[For Full level]
- Dataset A: [Source, download date, version]
- Dataset B: [Source, download date, version]

## Contact
[Your email or preferred contact]

## License
[If applicable - e.g., CC BY 4.0, MIT]
````
#### MANIFEST.md Template

````markdown
# File Manifest
Generated: YYYY-MM-DD

## Directory Structure
```
shared-YYYYMMDD/
├── README.md - Project overview and setup
├── MANIFEST.md - This file
[... complete tree ...]
```

## File Descriptions

### Notebooks
- `notebooks/01-data-processing.ipynb` - Initial data loading and cleaning
- `notebooks/02-analysis.ipynb` - Main statistical analysis
- `notebooks/03-visualization.ipynb` - Figure generation for publication

### Data
- `data/processed/cleaned_data.csv` - Quality-controlled dataset (N=XXX samples)
  - Columns: [list key columns]
  - Missing values handled by [method]

### Scripts
- `scripts/generate_figures.py` - Automated figure generation
  - Usage: `python generate_figures.py --input data/processed/cleaned_data.csv`

### Results
- `results/figures/fig1-main.png` - Main result showing [description]
- `results/tables/summary_stats.csv` - Descriptive statistics

[Continue for all files...]
````
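The directory-structure section can be generated rather than typed by hand; a minimal standard-library sketch (the output is a simplified indented listing, not the box-drawing style of `tree`):

```python
from pathlib import Path

def write_tree(root, out_lines, prefix=''):
    """Append an indented listing of root's contents to out_lines."""
    entries = sorted(Path(root).iterdir(), key=lambda p: (p.is_file(), p.name))
    for entry in entries:
        out_lines.append(f"{prefix}{entry.name}{'/' if entry.is_dir() else ''}")
        if entry.is_dir():
            write_tree(entry, out_lines, prefix + '    ')

lines = []
write_tree('shared-20240115-reproducible', lines)  # placeholder package name
print('\n'.join(lines))
```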
### Step 6: Handle Sensitive Data

Check for sensitive information:
- Personally identifiable information (PII)
- Access credentials (API keys, passwords)
- Proprietary data
- Institutional data with sharing restrictions
- Patient/subject identifiers

Strategies:
- **Anonymize** - Remove or hash identifiers
- **Exclude** - Don't include sensitive files
- **Aggregate** - Share summary statistics only
- **Document restrictions** - Note what's excluded and why
Example anonymization:

```python
import hashlib

def anonymize_ids(df, id_column='subject_id'):
    """Replace IDs with short hashed values."""
    df[id_column] = df[id_column].apply(
        lambda x: hashlib.sha256(str(x).encode()).hexdigest()[:8]
    )
    return df
```
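Note that a plain hash of short, low-entropy IDs can be brute-forced, and 8-character truncation risks collisions. A keyed hash, with the key stored outside the shared package, is safer; a sketch:

```python
import hashlib
import hmac

# Keep this key OUT of the sharing package (e.g., in your private project notes)
SECRET_KEY = b'replace-with-a-random-key-kept-out-of-the-package'

def anonymize_id(value):
    """Keyed hash: not recoverable without the key, unlike a plain hash of short IDs."""
    return hmac.new(SECRET_KEY, str(value).encode(), hashlib.sha256).hexdigest()[:12]
```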
### Step 7: Package and Compress

For smaller packages (<100 MB):

```bash
# Create a zip archive
zip -r shared-YYYYMMDD.zip shared-YYYYMMDD/
```

For larger packages:

```bash
# Create a tar.gz (better compression)
tar -czf shared-YYYYMMDD.tar.gz shared-YYYYMMDD/

# Or split into parts if very large
tar -czf - shared-YYYYMMDD/ | split -b 1G - shared-YYYYMMDD.tar.gz.part
```

Document the package contents:
- Total size
- Number of files
- Compression method
- How to extract
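Recording a checksum alongside the archive lets recipients verify their download (`sha256sum` on Linux; `shasum -a 256` is the macOS equivalent):

```bash
# Record the archive checksum, and verify it later
sha256sum shared-YYYYMMDD.tar.gz > shared-YYYYMMDD.tar.gz.sha256
sha256sum -c shared-YYYYMMDD.tar.gz.sha256
```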
### Step 8: Return to the Working Directory

⚠️ **IMPORTANT:** After creating the sharing package, always work in the main project directory.

The sharing folder is a snapshot for distribution only. Any future development, analysis, or modifications should happen in your original working directory, not in the `shared-*/` folder.

Claude should:
- Change directory back to the main project: `cd ..` (if needed)
- Confirm the working directory: `pwd`
- Continue all work in the original project location
- Treat sharing folders as read-only archives

Example:

```bash
# After creating the sharing package
cd /path/to/main/project   # Return to the working directory
pwd                        # Verify location

# Continue work here, NOT in shared-YYYYMMDD/
```
## Best Practices

### Notebook Cleaning

Before sharing notebooks:

1. **Clear all outputs**
   ```bash
   jupyter nbconvert --clear-output --inplace notebook.ipynb
   ```
2. **Remove debug cells**
   - Tag cells for removal: Cell → Cell Tags → add "remove"
   - Filter tagged cells during the copy step
3. **Add markdown explanations**
   - Ensure each code cell has context
   - Add section headers
   - Document assumptions
4. **Check cell execution order**
   - Run "Restart & Run All" to verify
   - Fix any out-of-order dependencies
5. **Remove absolute paths**
   ```python
   # ❌ Bad
   data = pd.read_csv('/Users/yourname/project/data.csv')

   # ✅ Good
   data = pd.read_csv('../data/data.csv')

   # or
   from pathlib import Path
   data_dir = Path(__file__).parent / 'data'
   ```
### File Organization

Naming conventions for shared files:
- Use descriptive names: `telomere_analysis_results.csv`, not `results.csv`
- Include dates for time-sensitive data: `data_2024-01-15.csv`
- Version if applicable: `analysis_v2.ipynb`
- No spaces: use `-` or `_`

Size considerations:
- Document large files in the README (the sketch below helps find them)
- Consider hosting large data separately (institutional storage, Zenodo)
- Provide download links instead of including large files in the package
- Use `.gitattributes` for large file tracking if using Git
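To see which files dominate the package size before deciding what to host externally (the 50 MB threshold is arbitrary):

```bash
# List files over 50 MB, largest first
find shared-YYYYMMDD/ -type f -size +50M -exec du -h {} + | sort -rh
```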
### Documentation Requirements

Minimum documentation for each level:

**Level 1 - Summary:**
- What the results show
- Key findings
- Date and author

**Level 2 - Reproducible:**
- Setup instructions
- How to run the analysis
- Software dependencies
- Expected runtime
- Data source information

**Level 3 - Full:**
- Complete methodology
- All data sources with versions
- Processing decisions and rationale
- Known issues or limitations
- Contact information
### Dependency Management

Create a requirements file.

For pip:

```bash
# From the active environment
pip freeze > requirements.txt

# Or manually curated (better)
cat > requirements.txt << EOF
pandas>=1.5.0
numpy>=1.23.0
matplotlib>=3.6.0
scipy>=1.9.0
EOF
```

For conda:

```bash
# Export the current environment
conda env export > environment.yml

# Or minimal (recommended)
conda env export --from-history > environment.yml
# Then edit to remove build-specific details
```
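Before shipping, the pip route can be smoke-tested in a throwaway virtual environment (the import list should match your curated requirements):

```bash
# Verify requirements.txt resolves and imports work in a clean environment
python -m venv /tmp/share-test
source /tmp/share-test/bin/activate
pip install -r requirements.txt
python -c "import pandas, numpy, matplotlib, scipy"   # quick import check
deactivate
```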
## Common Scenarios

### Scenario 1: Sharing with Lab Collaborators

**Level:** Reproducible

Include:
- Cleaned analysis notebooks
- Processed data
- Figure generation scripts
- environment.yml
- README with reproduction steps

Don't include:
- Exploratory notebooks
- Failed analysis attempts
- Debug outputs
- Personal notes
### Scenario 2: Manuscript Supplementary Material

**Level:** Reproducible or Full (depending on the journal)

Include:
- All notebooks used for figures in the paper
- Scripts for each figure panel
- Processed data (or instructions to obtain it)
- Complete environment specification
- Detailed methods document

Best practices:
- Number notebooks to match paper sections
- Export key figures in publication formats (PDF, high-res PNG), as sketched below
- Include a data dictionary for all variables
- Test reproduction in a clean environment
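For the publication-format exports, saving each figure in both vector and high-resolution raster form is typical; a matplotlib sketch (the plot itself is a placeholder):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1], [0, 1])  # placeholder for the real figure

# Vector format for the journal, high-res raster for slides/preprints
fig.savefig('results/figures/fig1-main.pdf', bbox_inches='tight')
fig.savefig('results/figures/fig1-main.png', dpi=300, bbox_inches='tight')
```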
### Scenario 3: Project Archival

**Level:** Full Traceability

Include:
- Complete data pipeline from raw to processed
- All versions of the analysis
- Meeting notes or decision logs
- External tool versions
- System information (see the snapshot sketch below)

Organization tips:
- Use dates in directory names
- Keep a chronological changelog
- Document all external dependencies
- Include contact info for questions
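Tool and system versions can be captured at archive time with a few commands (the output file name is arbitrary):

```bash
# Snapshot system and tool versions into the archive's documentation
{
  date
  uname -a
  python --version
  pip freeze
} > documentation/system-info.txt
```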
### Scenario 4: Data Repository Submission (Zenodo, Figshare)

**Level:** Full Traceability

Additional considerations:
- Add a LICENSE file (CC BY 4.0, MIT, etc.)
- Include CITATION.cff or CITATION.txt (example below)
- Provide comprehensive metadata
- Write the README with DOI/reference instructions
- Check maximum file sizes
- Review repository-specific guidelines
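A minimal CITATION.cff might look like the following (fields per the Citation File Format spec; all values are placeholders):

```yaml
cff-version: 1.2.0
message: "If you use this dataset or code, please cite it as below."
title: "[Project Name]"
authors:
  - family-names: "[Family Name]"
    given-names: "[Given Name]"
version: "1.0.0"
date-released: "2024-01-15"
doi: "10.5281/zenodo.XXXXXXX"
```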
## Quality Checklist

Before finalizing the sharing package:

### File Quality
- [ ] All notebooks run without errors
- [ ] Notebook outputs cleared
- [ ] No absolute paths in code
- [ ] No hardcoded credentials or API keys
- [ ] File sizes documented
- [ ] Large files compressed or linked

### Documentation
- [ ] README explains setup and usage
- [ ] MANIFEST describes all files
- [ ] Data sources documented
- [ ] Dependencies specified
- [ ] Contact information included
- [ ] License specified (if applicable)

### Reproducibility
- [ ] Requirements file tested in a clean environment
- [ ] All data accessible (included or linked)
- [ ] Scripts run in the documented order
- [ ] Expected outputs match actual outputs
- [ ] Processing time documented

### Privacy & Sensitivity
- [ ] No sensitive data included
- [ ] Identifiers anonymized if needed
- [ ] Institutional policies checked
- [ ] Collaborator permissions obtained

### Organization
- [ ] Clear directory structure
- [ ] Consistent naming conventions
- [ ] Files logically grouped
- [ ] No duplicate files
- [ ] No unnecessary files (cache, .DS_Store, etc.; see the cleanup commands below)
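The last two checklist items can be enforced mechanically before compressing (these patterns cover the common offenders; extend as needed):

```bash
# Strip OS and Python cache debris from the package
find shared-YYYYMMDD/ -name '.DS_Store' -delete
find shared-YYYYMMDD/ -name '__pycache__' -type d -prune -exec rm -rf {} +
find shared-YYYYMMDD/ -name '.ipynb_checkpoints' -type d -prune -exec rm -rf {} +
```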
## Integration with Other Skills

Works well with:
- **folder-organization** - Ensures the source project is well organized before sharing
- **jupyter-notebook-analysis** - Creates notebooks that are share-ready
- **managing-environments** - Documents dependencies properly

Before using this skill:
- Organize the working directory (folder-organization)
- Finalize the analysis (jupyter-notebook-analysis)
- Document the environment (managing-environments)

After using this skill:
- Test the package in a clean environment
- Share via an appropriate channel (email, repository, cloud storage)
- Keep an archived copy for reference
## Example Scripts

### Create Sharing Package Script

```python
#!/usr/bin/env python3
"""Create a sharing package for the project."""
import shutil
from pathlib import Path
from datetime import date

import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor


def create_sharing_package(level='reproducible', output_dir=None):
    """
    Create a sharing package.

    Args:
        level: 'summary', 'reproducible', or 'full'
        output_dir: Output directory name (auto-generated if None)
    """
    # Create the output directory
    if output_dir is None:
        output_dir = f"shared-{date.today():%Y%m%d}-{level}"
    share_path = Path(output_dir)
    share_path.mkdir(exist_ok=True)

    print(f"Creating {level} sharing package in {share_path}")

    # Create the structure based on the level
    if level == 'summary':
        create_summary_package(share_path)
    elif level == 'reproducible':
        create_reproducible_package(share_path)
    elif level == 'full':
        create_full_package(share_path)

    print(f"✓ Package created: {share_path}")
    print(f"  Review and compress: tar -czf {share_path}.tar.gz {share_path}")


def clean_notebook(input_path, output_path):
    """Clean notebook outputs and debug cells."""
    with open(input_path) as f:
        nb = nbformat.read(f, as_version=4)

    # Clear outputs
    clear = ClearOutputPreprocessor()
    nb, _ = clear.preprocess(nb, {})

    # Remove debug cells
    nb.cells = [c for c in nb.cells
                if 'debug' not in c.metadata.get('tags', [])]

    with open(output_path, 'w') as f:
        nbformat.write(nb, f)


# ... implement level-specific functions ...

if __name__ == '__main__':
    import sys
    level = sys.argv[1] if len(sys.argv) > 1 else 'reproducible'
    create_sharing_package(level)
```
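Invoked from the project root (the script file name here is illustrative):

```bash
python create_sharing_package.py reproducible
```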
## Summary

Key principles for project sharing:

- 🎯 **Choose the right level** - Match sharing depth to audience needs
- 📋 **Copy, don't move** - Preserve your working directory
- 🧹 **Clean thoroughly** - Remove debug code, clear outputs
- 📝 **Document everything** - README + MANIFEST at minimum
- 🔒 **Check sensitivity** - Anonymize or exclude as needed
- ✅ **Test before sharing** - Run in a clean environment
- 📦 **Package properly** - Compress and document contents
- ⚠️ **Work in the main directory** - After creating a sharing package, ALL future work happens in the original project directory, NOT in the sharing folder

Remember: Good sharing practices benefit both collaborators and your future self!
## ⚠️ Critical Reminder for Claude

After creating any sharing package:
- **Always return to the main project directory**
- **Never work in `shared-*/` directories** - These are read-only snapshots
- All future edits, analysis, and development happen in the original working directory
- Sharing folders are for distribution only, not active development

If the user asks to modify files, always check the current directory and ensure you're working in the main project location, not in a sharing package.