# Working in Notebooks
Use this skill to create, maintain, and choose between notebook environments (Jupyter, marimo, Colab) for data work. Covers tool selection, reproducibility patterns, and workflow best practices.
## When to use this skill
- Setting up a notebook environment — choosing between Jupyter, marimo, VS Code, or Colab
- Converting between notebook formats — Jupyter to marimo, .ipynb to .py, or vice versa
- Making notebooks reproducible — pinning dependencies, managing random seeds, avoiding hardcoded paths
- Improving notebook structure — organizing cells, refactoring code, adding tests
- Publishing or sharing notebooks — nbconvert, Quarto, Voilà, or Git workflows
- Jupyter-specific features — magic commands, widgets, extensions, kernel management
- Marimo-specific workflows — reactive execution, UI components, version control patterns
## When NOT to use this skill
Use a different skill for these related but distinct tasks:
| Instead of... | Use... | Because... |
|---|---|---|
| Building a stakeholder-facing dashboard | `building-data-apps` | Apps are for external users; notebooks are for analysts/developers |
| Creating interactive data explorers for non-technical users | `building-data-apps` | Streamlit, Panel, Gradio are purpose-built for this |
| Exploratory data analysis patterns | `analyzing-data` | EDA patterns (profiling, statistical tests) belong there |
| Visualization library selection | `analyzing-data` | Chart types and library comparisons are covered there |
| Production ML feature engineering | `engineering-ml-features` | Feature engineering logic is domain-specific |
| Model evaluation and cross-validation | `evaluating-ml-models` | Model comparison and metrics belong there |
### Quick boundary check
- Notebook = code + markdown + outputs in cells, run interactively, often .ipynb or .py format
- Data app = deployed web interface with widgets, for non-coders to interact with
- If the user asks for a "dashboard," "app," or mentions "users clicking buttons," use `building-data-apps`
- If the user asks for a "notebook," "Jupyter," "marimo," or wants to "explore data interactively," use this skill
## Tool selection guide

### Quick decision checklist
| Question | If yes, consider |
|---|---|
| Need reactive execution (cells auto-update)? | marimo |
| Want pure Python files for version control? | marimo |
| Need specific Jupyter extensions ecosystem? | JupyterLab |
| Using Google Colab features (TPU, shared GPUs)? | Google Colab |
| Want IDE-native experience (IntelliSense, debugger)? | VS Code + Jupyter |
| Converting from existing .ipynb files? | Jupyter → marimo with marimo convert |
| Teaching beginners (familiarity matters)? | JupyterLab or Colab |
### Tool comparison
| Tool | Best For | Key Feature | File Format |
|---|---|---|---|
| JupyterLab | Traditional data science, rich extensions | Full IDE experience, 1000+ extensions | .ipynb (JSON) |
| marimo | Reproducible notebooks, reactive execution | Python-native, version-control friendly | .py (pure Python) |
| VS Code + Jupyter | IDE-native notebook experience | IntelliSense, debugging, git integration | .ipynb |
| Google Colab | Cloud GPUs, easy sharing, collaboration | Free TPU/GPU, zero setup | .ipynb (cloud) |
## Core workflow: Creating a reproducible notebook

### Step 1: Choose your tool
See decision checklist above. If starting fresh and reproducibility matters → marimo. If ecosystem/extensions matter → JupyterLab.
### Step 2: Set up the environment
```python
# Cell 1: Environment setup (run first)
# Set random seeds for reproducibility
import numpy as np
import random

np.random.seed(42)
random.seed(42)

# For torch users:
# import torch
# torch.manual_seed(42)
```
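Global seeding works, but notebooks often run cells out of order, and any cell can mutate the global random state. A seeded generator per function sidesteps this; a minimal standard-library sketch (the function name is illustrative):

```python
import random

def sample_with_seed(seed: int, n: int = 5) -> list[float]:
    # A dedicated Random instance is unaffected by other cells
    # that call random.seed() or draw from the global generator.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Same seed, same draws -- regardless of what ran in between
assert sample_with_seed(42) == sample_with_seed(42)
```

The same pattern applies to NumPy via `np.random.default_rng(seed)`.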
### Step 3: Pin dependencies

Create `requirements.txt` or `environment.yml`:
```text
# requirements.txt
pandas==2.1.0
numpy==1.24.0
matplotlib==3.7.0
```

Or use modern tools:

```bash
# With uv
uv pip freeze > requirements.txt

# With poetry
poetry export -f requirements.txt > requirements.txt
```
### Step 4: Structure for readability
```markdown
# Title: Clear project/question description

## Setup
Imports and configuration

## Data Loading
Load and validate data

## Analysis
- Subsection per question/hypothesis
- Clear markdown explanations
- Visualizations with interpretations

## Conclusions
Key findings and next steps
```
### Step 5: Never hardcode secrets
```python
# ✅ Use environment variables
import os
api_key = os.environ.get("OPENAI_API_KEY")

# ❌ Never do this
api_key = "sk-abc123..."
```
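To fail fast when a required variable is missing, instead of passing `None` into a client and getting a confusing auth error later, a small helper can be used (`require_env` is a hypothetical name):

```python
import os

def require_env(name: str) -> str:
    # Raise immediately with a clear message rather than letting
    # a missing key surface as a downstream failure.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```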
### Step 6: Clean outputs before git (Jupyter)
```bash
# Install nbstripout
pip install nbstripout
nbstripout --install

# Or use pre-commit
pip install pre-commit
pre-commit install
```
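For the pre-commit route, a minimal `.pre-commit-config.yaml` wiring in nbstripout might look like this (the `rev` shown is a placeholder; pin it to the release tag you actually use):

```yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1  # placeholder; pin to a real release tag
    hooks:
      - id: nbstripout
```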
## Validation and feedback loop

### Self-check questions

Before considering a notebook "done":
- Can someone else run this from a fresh environment?
- Are all random seeds set?
- Are dependencies pinned (requirements.txt or similar)?
- Are secrets loaded from environment variables?
- Are cells organized logically (not execution-order dependent)?
- Are helper functions extracted to `.py` files if >30 lines?
- Are outputs stripped before committing (if using Jupyter)?
### Testing notebook code

See `../analyzing-data/references/notebook-testing.md` for:
- Unit tests for notebook code
- nbval for output validation
- Papermill for parameterized execution
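As a sketch of the first bullet: once a helper is extracted from a notebook cell into a plain `.py` module, it can be covered by an ordinary pytest function. `normalize` and the test below are illustrative, not part of any referenced module:

```python
def normalize(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; the kind of helper worth extracting from a cell."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid division by zero on constant input
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize_bounds():
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    assert normalize([3.0, 3.0]) == [0.0, 0.0]
```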
## Progressive disclosure

### Core references
- `references/jupyter-guide.md` — Jupyter/JupyterLab deep dive: magic commands, widgets, extensions, kernel management
- `references/marimo-guide.md` — marimo deep dive: reactive execution, UI components, migration from Jupyter
- `references/reproducibility-patterns.md` — Environment management, dependency pinning, nbstripout, secrets handling
### Related references (in other skills)
- `../analyzing-data/references/notebook-testing.md` — Unit tests, nbval, Papermill for notebook validation
- `../analyzing-data/references/sharing-publishing.md` — nbconvert, Quarto, Voilà for publishing notebooks
## Common anti-patterns
- ❌ Running cells out of order (Jupyter) → Use "Run All" to verify, or switch to marimo
- ❌ Giant cells with mixed concerns → One concept per cell, <50 lines
- ❌ Hardcoded file paths → Use relative paths or environment variables
- ❌ Hardcoded secrets → Load from environment
- ❌ Committing large output files → Use .gitignore, data/ folder, or strip outputs
- ❌ Inline data → Use data/ folder or external sources
- ❌ No markdown explanations → Every code block deserves context
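For the hardcoded-path anti-pattern, one common fix is an environment-variable override with a relative default. `DATA_DIR` and the file name here are illustrative:

```python
import os
from pathlib import Path

# Resolve the data directory once; override via DATA_DIR when the
# notebook runs on another machine or in CI.
DATA_DIR = Path(os.environ.get("DATA_DIR", "data"))
raw_sales = DATA_DIR / "raw" / "sales.csv"  # hypothetical file
```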
## Quick commands reference

### Jupyter
```bash
# Start JupyterLab
jupyter lab

# Convert notebook
jupyter nbconvert notebook.ipynb --to html
jupyter nbconvert notebook.ipynb --to script

# List kernels
jupyter kernelspec list

# Install kernel for a virtual environment
python -m ipykernel install --user --name=myenv
```
### marimo
```bash
# Create/edit a notebook
marimo edit notebook.py

# Run as app (read-only)
marimo run notebook.py

# Convert from Jupyter
marimo convert notebook.ipynb -o notebook.py

# Export to HTML
marimo export html notebook.py -o notebook.html
```
### Environment validation
```python
# Check installed versions
import pandas as pd
import numpy as np

print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")
```
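Printing versions is a start; comparing them against the pins can also be automated with only the standard library. A sketch (`check_pins` is an illustrative name):

```python
from importlib.metadata import PackageNotFoundError, version

def check_pins(pins: dict[str, str]) -> list[str]:
    """Return one message per package whose installed version differs from its pin."""
    problems = []
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{name}: installed {installed}, pinned {expected}")
    return problems

# An empty list means the environment matches the pins,
# e.g. check_pins({"pandas": "2.1.0", "numpy": "1.24.0"})
```

This pairs naturally with Step 3: feed it the same versions you pinned in `requirements.txt`.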
## Related skills
| Skill | Relationship | When to use |
|---|---|---|
| `analyzing-data` | Complementary | EDA patterns, profiling, statistical tests; use alongside notebooks |
| `building-data-apps` | Distinct boundary | Building stakeholder-facing dashboards; not this skill |
| `evaluating-ml-models` | Complementary | Cross-validation, metrics, experiment tracking |
| `engineering-ml-features` | Complementary | Feature engineering patterns and transformations |
## Migration notes

This skill replaces `data-science-notebooks` with the following changes:
- Removed `dependsOn` from frontmatter (non-standard field)
- Added explicit when-to-use and when-not-to-use sections
- Split content into focused reference files
- Clear boundary documentation vs `building-data-apps`
- Progressive disclosure with direct file paths (no `@skill` hybrid syntax)