data-science-notebooks
Interactive Notebooks
Use this skill to create reproducible, well-structured notebooks for data exploration, analysis, and communication.
When to use this skill
- Exploratory analysis — interactively investigate data
- Reproducible research — document methodology with code and results
- Teaching/demos — explain concepts with executable examples
- Stakeholder communication — share insights with narrative + visuals
- Prototyping — quickly iterate on data transformations or models
Tool selection
| Tool | Best For | Key Feature |
|---|---|---|
| JupyterLab | Traditional data science, extensions ecosystem | Full IDE experience |
| marimo | Reproducible notebooks, reactive execution | Python-native, version-control friendly |
| VS Code + Jupyter | IDE-native notebook experience | Intellisense, debugging, git integration |
| Google Colab | Cloud GPUs, sharing, collaboration | Free TPU/GPU, easy sharing |
Core principles
1) Structure for readability
# Title: Clear project/question description
## Setup
Imports and configuration
## Data Loading
Load and validate data
## Analysis
- Subsection per question/hypothesis
- Clear markdown explanations
- Visualizations with interpretations
## Conclusions
Key findings and next steps
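For instance, a Data Loading cell can pair the read with a few cheap validation checks so broken inputs fail fast. A minimal sketch, assuming a hypothetical data/sales.csv with date and amount columns:
# Load the data, then fail fast on obvious problems
import pandas as pd
df = pd.read_csv("data/sales.csv", parse_dates=["date"])  # hypothetical path and columns
assert not df.empty, "sales.csv loaded zero rows"
assert df["amount"].ge(0).all(), "negative amounts found"
df.head()  # quick visual check of the first rows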
2) Ensure reproducibility
# Set random seeds
import numpy as np
import random
np.random.seed(42)
random.seed(42)
# Pin versions in requirements.txt or environment.yml
# requirements.txt example:
# pandas==2.1.0
# scikit-learn==1.3.0
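In addition to pinning, it can help to record the versions actually installed in the running kernel near the top of the notebook. A minimal sketch using only the standard library (the package names are illustrative):
# Print the versions of key packages installed in the current environment
from importlib.metadata import version
for pkg in ("numpy", "pandas", "scikit-learn"):
    print(pkg, version(pkg))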
3) Keep cells focused
- One concept per cell
- Avoid cells with >50 lines
- Refactor helper functions into .py files (see the sketch below)
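As a sketch of that refactor, a hypothetical helpers.py next to the notebook might hold cleaning logic that the notebook then imports:
# helpers.py (versioned alongside the notebook)
import pandas as pd

def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize column names to lowercase snake_case."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
In the notebook, import it with from helpers import clean_columns; combined with the %autoreload magic shown below, edits to helpers.py are picked up without restarting the kernel.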
4) Never hardcode secrets
# ✅ Use environment variables
import os
api_key = os.environ.get("OPENAI_API_KEY")
# ❌ Never do this
api_key = "sk-abc123..."
Jupyter best practices
Magic commands (Jupyter/IPython)
# In a Jupyter cell (these are IPython magics, not standard Python)
# Auto-reload modules during development
%load_ext autoreload
%autoreload 2
# Timing
%timeit function_call()
# Debugging
%debug
# Environment info (requires the watermark package)
%watermark -v -m -p numpy,pandas,sklearn
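Magics only work inside IPython/Jupyter; if the same code must also run as a plain script, the standard library offers equivalents. A minimal sketch with timeit (function_call is a placeholder workload):
# Plain-Python timing that also works outside a notebook
import timeit

def function_call():
    return sum(range(10_000))  # placeholder workload

elapsed = timeit.timeit(function_call, number=100)
print(f"{elapsed / 100:.6f} s per call")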
Clean outputs before git
# Using nbstripout
pip install nbstripout
nbstripout --install
# Or run nbstripout as a pre-commit hook (add it to .pre-commit-config.yaml)
pip install pre-commit
pre-commit install
marimo advantages
Reactive execution
# marimo notebook - cells auto-recompute when dependencies change
import marimo as mo
slider = mo.ui.slider(1, 100, value=50)
slider # Display the slider
# This cell re-runs automatically whenever the slider value changes
df_filtered = df[df['value'] > slider.value]  # df is assumed to be defined in another cell
Version control friendly
- Pure Python (.py files); see the file-format sketch below
- No output blobs in git
- Readable diffs
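For orientation, a marimo notebook on disk is an ordinary Python file in which each cell is a function. A rough sketch of the format (the exact generated layout varies by marimo version):
# notebook.py as marimo stores it: plain Python, diff-friendly, no embedded outputs
import marimo

app = marimo.App()

@app.cell
def _():
    import marimo as mo
    return (mo,)

@app.cell
def _(mo):
    slider = mo.ui.slider(1, 100, value=50)
    slider  # last expression is the cell's displayed output
    return (slider,)

if __name__ == "__main__":
    app.run()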
Convert Jupyter to marimo
marimo convert notebook.ipynb -o notebook.py
Common anti-patterns
- ❌ Running cells out of order (Jupyter does not enforce execution order)
- ❌ Giant cells with mixed concerns
- ❌ Hardcoded file paths (see the path sketch below)
- ❌ No markdown explanations
- ❌ Committing large output files
- ❌ Inlining raw data in cells (keep it in a data/ folder)
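For the hardcoded-path item, one common fix is to resolve files relative to a configurable data directory. A minimal sketch (DATA_DIR and the filename are illustrative):
# Resolve data files relative to a configurable directory, not an absolute machine-specific path
import os
from pathlib import Path

DATA_DIR = Path(os.environ.get("DATA_DIR", "data"))  # override via environment variable if needed
csv_path = DATA_DIR / "sales.csv"  # hypothetical file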
Progressive disclosure
- ../references/jupyter-advanced.md — Widgets, extensions, debugging
- ../references/marimo-guide.md — Reactive patterns, UI components
- ../references/notebook-testing.md — Unit tests for notebook code
- ../references/sharing-publishing.md — nbconvert, Quarto, Voilà
Related skills
- @data-science-eda — Exploration patterns for notebooks
- @data-science-interactive-apps — Convert notebooks to apps
- @data-engineering-core — Production-ready code patterns
References
- Jupyter Documentation
- marimo Documentation
- nbstripout
- Quarto (publishing)