Interactive Notebooks
Use this skill for creating reproducible, well-structured notebooks for data exploration, analysis, and communication.
When to use this skill
- Exploratory analysis — interactively investigate data
- Reproducible research — document methodology with code and results
- Teaching/demos — explain concepts with executable examples
- Stakeholder communication — share insights with narrative + visuals
- Prototyping — quickly iterate on data transformations or models
Tool selection
| Tool | Best For | Key Feature |
|---|---|---|
| JupyterLab | Traditional data science, extensions ecosystem | Full IDE experience |
| marimo | Reproducible notebooks, reactive execution | Python-native, version-control friendly |
| VS Code + Jupyter | IDE-native notebook experience | IntelliSense, debugging, git integration |
| Google Colab | Cloud GPUs, sharing, collaboration | Free-tier GPU/TPU access (with usage limits), link sharing |
Core principles
1) Structure for readability
```markdown
# Title: Clear project/question description

## Setup
Imports and configuration

## Data Loading
Load and validate data

## Analysis
- Subsection per question/hypothesis
- Clear markdown explanations
- Visualizations with interpretations

## Conclusions
Key findings and next steps
```
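As a hedged sketch, the outline above can be turned into a starter `.ipynb` file using only the standard library. It assumes the nbformat v4 JSON layout with `nbformat_minor` set to 4 (so cells need no `id` field); `make_skeleton`, `write_skeleton`, and the default section list are hypothetical names that simply mirror the outline:

```python
import json

# Hypothetical default outline, mirroring the section structure above
DEFAULT_SECTIONS = ["Setup", "Data Loading", "Analysis", "Conclusions"]


def make_skeleton(title: str, sections=DEFAULT_SECTIONS) -> dict:
    """Build an nbformat v4.4 notebook dict: a title cell, then one
    markdown heading cell plus one empty code cell per section."""
    cells = [{"cell_type": "markdown", "metadata": {}, "source": [f"# {title}"]}]
    for name in sections:
        cells.append({"cell_type": "markdown", "metadata": {}, "source": [f"## {name}"]})
        cells.append({
            "cell_type": "code",
            "execution_count": None,  # never run yet
            "metadata": {},
            "outputs": [],
            "source": [],
        })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_skeleton(path: str, title: str) -> None:
    # Explicit UTF-8 so the file is portable across platforms
    with open(path, "w", encoding="utf-8") as f:
        json.dump(make_skeleton(title), f, indent=1)
```

For example, `write_skeleton("analysis.ipynb", "What drives churn?")` would create a notebook with those headings ready to fill in; the title is only illustrative.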
2) Ensure reproducibility
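```python
# Set random seeds for the stdlib and NumPy's legacy global RNG
import random

import numpy as np

random.seed(42)
np.random.seed(42)

# For new NumPy code, prefer an explicitly seeded Generator
rng = np.random.default_rng(42)

# Pin versions in requirements.txt or environment.yml
# requirements.txt example:
# pandas==2.1.0
# scikit-learn==1.3.0
```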
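One way to produce pinned lines like these from the environment the notebook actually ran in is the standard library's `importlib.metadata`. A minimal sketch; `pin_versions` is a hypothetical helper, and packages that are not installed are emitted as comment lines rather than guessed:

```python
from importlib.metadata import PackageNotFoundError, version


def pin_versions(packages: list[str]) -> list[str]:
    """Return requirements.txt-style lines ("name==x.y.z") for installed
    distributions; missing ones become comment lines."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    # Use distribution names as published on PyPI, e.g. "scikit-learn", not "sklearn"
    print("\n".join(pin_versions(["pandas", "scikit-learn"])))
```

Generating the file from the live environment avoids pins drifting away from what was really used, at the cost of pinning whatever happens to be installed.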
3) Keep cells focused
- One concept per cell
- Avoid cells longer than ~50 lines
- Move helper functions into `.py` modules and import them
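For example, a small reusable transformation is easier to test once it lives in a module (here a hypothetical `helpers.py` next to the notebook), and the notebook cell shrinks to `from helpers import normalize_columns`. The function below is an illustrative helper, not part of any library:

```python
# helpers.py (hypothetical module imported by the notebook)
import re


def normalize_columns(names: list[str]) -> list[str]:
    """Lower-case column names and replace runs of non-alphanumerics with "_".

    Example: " Total Revenue ($) " -> "total_revenue"
    """
    cleaned = []
    for name in names:
        snake = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower())
        cleaned.append(snake.strip("_"))  # drop leading/trailing separators
    return cleaned
```

In Jupyter, the autoreload extension (`%autoreload 2`) picks up edits to `helpers.py` without restarting the kernel.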
4) Never hardcode secrets
```python
# ✅ Use environment variables
import os
api_key = os.environ.get("OPENAI_API_KEY")

# ❌ Never do this
api_key = "sk-abc123..."
```
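`os.environ.get` returns `None` when the variable is unset, which tends to surface later as a confusing authentication error. A small fail-fast wrapper is one option; `require_env` is a hypothetical helper name, and the `.env` suggestion in the message assumes you load such a file yourself:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error
    if it is missing or empty. The value itself is never printed."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; export it in your shell "
            "or load it from an untracked .env file before starting the notebook."
        )
    return value
```

Usage: `api_key = require_env("OPENAI_API_KEY")` fails at the top of the notebook instead of several cells later.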
Jupyter best practices
Magic commands (Jupyter/IPython)
```python
# Each snippet goes in a Jupyter cell (IPython magics, not standard Python)

# Auto-reload modules during development
%load_ext autoreload
%autoreload 2

# Timing
%timeit function_call()

# Debugging (post-mortem: inspect the stack after an exception)
%debug

# Environment info (requires the watermark package)
%load_ext watermark
%watermark -v -m -p numpy,pandas,sklearn
```
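Outside IPython (plain scripts, or notebook tools that do not support magics), the standard-library `timeit` module gives a rough equivalent of `%timeit`. A minimal sketch; `best_time_per_call` is a hypothetical helper and the summed range is only a stand-in workload. Note it reports the best trial, whereas `%timeit` prints a mean and standard deviation:

```python
import timeit


def best_time_per_call(func, number: int = 1000, repeat: int = 5) -> float:
    """Call `func` `number` times per trial for `repeat` trials and return the
    best (minimum) average seconds per call, the trial least affected by noise."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    per_call = best_time_per_call(lambda: sum(range(1_000)))
    print(f"{per_call * 1e6:.2f} µs per call")
```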
Clean outputs before git
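```bash
# Option 1: nbstripout as a git filter (run once per repository)
pip install nbstripout
nbstripout --install

# Option 2: pre-commit hook
# (requires a .pre-commit-config.yaml that lists the nbstripout hook,
#  repo https://github.com/kynan/nbstripout, id: nbstripout)
pip install pre-commit
pre-commit install
```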
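Conceptually, stripping means clearing each code cell's outputs and execution counter in the `.ipynb` JSON. The sketch below illustrates only that core idea with the standard library; it is not nbstripout, which also cleans selected metadata and offers options for keeping some outputs, and `strip_outputs`/`strip_file` are hypothetical names:

```python
import json


def strip_outputs(nb: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs and
    execution counts cleared; markdown cells are left untouched."""
    stripped = json.loads(json.dumps(nb))  # cheap deep copy of JSON data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def strip_file(path: str) -> None:
    # Rewrite the notebook in place (indent=1 is the layout Jupyter writes)
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```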
marimo advantages
Reactive execution
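```python
# marimo notebook: cells auto-recompute when their dependencies change

# Cell 1: create and display a slider (the last expression is the cell's output)
import marimo as mo
slider = mo.ui.slider(1, 100, value=50)
slider

# Cell 2: re-runs automatically whenever the slider moves
# (assumes `df` is a DataFrame with a numeric "value" column, defined in another cell)
df_filtered = df[df["value"] > slider.value]
```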
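Reactive execution can be pictured as a dependency graph: each cell reads some variables and defines others, and changing one cell re-runs only that cell and everything downstream of it, in topological order. The toy model below illustrates that idea under those assumptions; it is not marimo's implementation (marimo derives each cell's references and definitions by statically analysing its code), and `ToyNotebook` and its methods are hypothetical:

```python
from graphlib import TopologicalSorter


class ToyNotebook:
    """Toy reactive notebook: each cell is a function that reads from a shared
    namespace and returns a dict of the variables it defines."""

    def __init__(self):
        self.cells = {}      # cell name -> (refs, defs, fn)
        self.namespace = {}  # variables produced by cells

    def set_cell(self, name, refs, defs, fn):
        self.cells[name] = (set(refs), set(defs), fn)

    def _graph(self):
        # Map each variable to its defining cell, then each cell to the cells it depends on
        definer = {v: n for n, (_, defs, _) in self.cells.items() for v in defs}
        return {
            n: {definer[r] for r in refs if r in definer and definer[r] != n}
            for n, (refs, _, _) in self.cells.items()
        }

    def run(self, changed=None):
        """Run all cells (changed=None) or `changed` plus its descendants.
        Returns the names of the cells that ran, in execution order."""
        graph = self._graph()
        order = list(TopologicalSorter(graph).static_order())
        stale = set(self.cells) if changed is None else {changed}
        for n in order:  # dependencies come first, so staleness propagates forward
            if graph[n] & stale:
                stale.add(n)
        ran = []
        for n in order:
            if n in stale:
                _, _, fn = self.cells[n]
                self.namespace.update(fn(self.namespace))
                ran.append(n)
        return ran


if __name__ == "__main__":
    nb = ToyNotebook()
    nb.set_cell("slider", [], ["threshold"], lambda ns: {"threshold": 50})
    nb.set_cell("data", [], ["values"], lambda ns: {"values": [10, 60, 90]})
    nb.set_cell("filter", ["threshold", "values"], ["kept"],
                lambda ns: {"kept": [v for v in ns["values"] if v > ns["threshold"]]})
    nb.run()
    print(nb.namespace["kept"])
    # "Move the slider": only the slider cell and its dependent filter cell re-run
    nb.set_cell("slider", [], ["threshold"], lambda ns: {"threshold": 70})
    print(nb.run(changed="slider"), nb.namespace["kept"])
```

The same graph is what makes execution order irrelevant: a cell can only run after the cells that define its inputs.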
Version control friendly
- Pure Python (`.py` files)
- No output blobs in git
- Readable diffs
Convert Jupyter to marimo
```bash
marimo convert notebook.ipynb -o notebook.py
```
Common anti-patterns
- ❌ Running cells out of order (Jupyter)
- ❌ Giant cells with mixed concerns
- ❌ Hardcoded file paths
- ❌ No markdown explanations
- ❌ Committing large output files
- ❌ Inline data (use data/ folder)
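The first anti-pattern can be detected after the fact: a notebook last run top to bottom has strictly increasing `execution_count` values down the page. A minimal standard-library sketch of such a check, assuming the standard `.ipynb` JSON layout; the function names are hypothetical, and a clean result does not prove the notebook runs from scratch (restarting the kernel and running all cells remains the real test):

```python
import json


def out_of_order_cells(nb: dict) -> list[int]:
    """Return indices of code cells whose execution_count is not greater than
    that of every executed code cell above them. Unexecuted cells are skipped."""
    flagged = []
    highest = 0
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:  # never run, or outputs already stripped
            continue
        if count <= highest:
            flagged.append(i)
        highest = max(highest, count)
    return flagged


def check_notebook(path: str) -> list[int]:
    # Load the .ipynb JSON from disk and run the check
    with open(path, encoding="utf-8") as f:
        return out_of_order_cells(json.load(f))
```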
Progressive disclosure
- `../references/jupyter-advanced.md`: Widgets, extensions, debugging
- `../references/marimo-guide.md`: Reactive patterns, UI components
- `../references/notebook-testing.md`: Unit tests for notebook code
- `../references/sharing-publishing.md`: nbconvert, Quarto, Voilà
Related skills
- `@data-science-eda`: Exploration patterns for notebooks
- `@data-science-interactive-apps`: Convert notebooks to apps
- `@data-engineering-core`: Production-ready code patterns
References
- Jupyter Documentation
- marimo Documentation
- nbstripout
- Quarto (publishing)
Repository: legout/data-pla…t-skills
First Seen: Feb 11, 2026