data-science-notebooks

SKILL.md

Interactive Notebooks

Use this skill for creating reproducible, well-structured notebooks for data exploration, analysis, and communication.

When to use this skill

  • Exploratory analysis — interactively investigate data
  • Reproducible research — document methodology with code and results
  • Teaching/demos — explain concepts with executable examples
  • Stakeholder communication — share insights with narrative + visuals
  • Prototyping — quickly iterate on data transformations or models

Tool selection

Tool Best For Key Feature
JupyterLab Traditional data science, extensions ecosystem Full IDE experience
marimo Reproducible notebooks, reactive execution Python-native, version-control friendly
VS Code + Jupyter IDE-native notebook experience Intellisense, debugging, git integration
Google Colab Cloud GPUs, sharing, collaboration Free TPU/GPU, easy sharing

Core principles

1) Structure for readability

# Title: Clear project/question description

## Setup
Imports and configuration

## Data Loading
Load and validate data

## Analysis
- Subsection per question/hypothesis
- Clear markdown explanations
- Visualizations with interpretations

## Conclusions
Key findings and next steps

2) Ensure reproducibility

# Set random seeds
import numpy as np
import random

np.random.seed(42)
random.seed(42)

# Pin versions in requirements.txt or environment.yml
# requirements.txt example:
# pandas==2.1.0
# scikit-learn==1.3.0

3) Keep cells focused

  • One concept per cell
  • Avoid cells with >50 lines
  • Refactor helper functions to .py files

4) Never hardcode secrets

# ✅ Use environment variables
import os

api_key = os.environ.get("OPENAI_API_KEY")

# ❌ Never do this
api_key = "sk-abc123..."

Jupyter best practices

Magic commands (Jupyter/IPython)

# In a Jupyter cell (these are IPython magics, not standard Python)
# Auto-reload modules during development
# %load_ext autoreload
# %autoreload 2

# Timing
# %timeit function_call()

# Debugging
# %debug

# Environment info (requires watermark package)
# %watermark -v -m -p numpy,pandas,sklearn

Clean outputs before git

# Using nbstripout
pip install nbstripout
nbstripout --install

# Or pre-commit hook
pip install pre-commit
pre-commit install

marimo advantages

Reactive execution

# marimo notebook - cells auto-recompute when dependencies change
import marimo as mo

slider = mo.ui.slider(1, 100, value=50)
slider  # Display the slider

# This cell re-runs automatically when slider changes
df_filtered = df[df['value'] > slider.value]

Version control friendly

  • Pure Python (.py files)
  • No output blobs in git
  • Readable diffs

Convert Jupyter to marimo

marimo convert notebook.ipynb -o notebook.py

Common anti-patterns

  • ❌ Running cells out of order (Jupyter)
  • ❌ Giant cells with mixed concerns
  • ❌ Hardcoded file paths
  • ❌ No markdown explanations
  • ❌ Committing large output files
  • ❌ Inline data (use data/ folder)

Progressive disclosure

  • ../references/jupyter-advanced.md — Widgets, extensions, debugging
  • ../references/marimo-guide.md — Reactive patterns, UI components
  • ../references/notebook-testing.md — Unit tests for notebook code
  • ../references/sharing-publishing.md — nbconvert, Quarto, Voilà

Related skills

  • @data-science-eda — Exploration patterns for notebooks
  • @data-science-interactive-apps — Convert notebooks to apps
  • @data-engineering-core — Production-ready code patterns

References

Weekly Installs
9
First Seen
Feb 11, 2026
Installed on
opencode7
gemini-cli7
github-copilot7
codex7
kimi-cli7
amp7