# marimo-batch-mlflow
Opinionated fork of marimo-team/skills/marimo-batch that:

- Uses Tyro for CLI parsing (works with `dataclass`, `pydantic.BaseModel`, or `attrs`) instead of `mo.cli_args()` + manual help-table rendering.
- Uses MLflow + mlflow-widgets for experiment tracking instead of Weights and Biases.
- Keeps the dual-mode pattern (`mo.app_meta().mode == "script"`) so a single `notebook.py` is both the UI for iteration and the entry point for `uv run notebook.py --epochs 50` batch jobs.
## When to use this vs upstream marimo-batch

| Concern | upstream marimo-batch | this skill |
|---|---|---|
| CLI parsing | `mo.cli_args()` + hand-rolled `rich.Table` for `--help` | `tyro.cli(ModelParams)` — auto `--help`, type coercion, validation |
| Params model | Pydantic `BaseModel` | `dataclass` (primary) or `pydantic.BaseModel` (alternative) |
| Tracking backend | Weights and Biases (`wandb`) | MLflow (`mlflow` + optional `mlflow-widgets` for live charts) |
| Live training UI | none — W&B web dashboard only | `mlflow_widgets.MlflowChart` cell, gated off in script mode |
| Grid launcher | HF Jobs + `WANDB_API_KEY` secret | HF Jobs + `MLFLOW_TRACKING_URI` (+ optional `MLFLOW_TRACKING_TOKEN`) |
Pick this skill when the user has a self-hosted MLflow server (or a local `./mlruns` is fine) and prefers strongly-typed CLIs. Pick upstream when the user is already on W&B.
## Dual-mode pattern

The single source-of-truth idiom: branch on `mo.app_meta().mode == "script"` once, build `params` either from a form or from CLI flags, then let every downstream cell consume `params` regardless of source.
```python
import marimo as mo
import tyro  # needed for the script-mode branch

is_script_mode = mo.app_meta().mode == "script"

if is_script_mode:
    params = tyro.cli(ModelParams)  # ModelParams defined in a cell below
else:
    mo.stop(form.value is None, mo.md("*Submit the form to start training.*"))
    params = ModelParams(**form.value)  # `form` defined in another cell

# Every cell below uses `params.epochs`, `params.batch_size`, ...
# unaware of which branch produced it.
```
This is what makes the notebook usable as both a UI for fast iteration and a CLI script for sweeps without code duplication.
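The `form` referenced above lives in its own cell. A minimal sketch of what that cell could look like, assuming marimo's standard `.batch().form()` composition — the fields and labels here are illustrative, and any `ModelParams` fields left out of the form fall back to their dataclass defaults:

```python
# Illustrative `form` cell; keys passed to .batch() must match ModelParams fields.
form = (
    mo.md(
        """
        **Training parameters**

        Epochs: {epochs}  Batch size: {batch_size}  Learning rate: {learning_rate}
        """
    )
    .batch(
        epochs=mo.ui.number(value=25, start=1),
        batch_size=mo.ui.number(value=32, start=1),
        learning_rate=mo.ui.number(value=1e-4, step=1e-5),
    )
    .form()
)
form
```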
## Params with dataclass + Tyro (primary)
```python
from dataclasses import dataclass

import tyro


@dataclass
class ModelParams:
    """Model training parameters."""

    epochs: int = 25
    """Number of training epochs."""

    batch_size: int = 32
    """Training batch size."""

    learning_rate: float = 1e-4
    """Learning rate for AdamW."""

    mlflow_experiment: str = "batch-sizes"
    """MLflow experiment name (empty string disables logging)."""

    mlflow_run_name: str | None = None
    """Optional explicit run name; auto-derived from params if None."""


if is_script_mode:
    params = tyro.cli(ModelParams)
```
Tyro derives `--epochs INT`, `--batch-size INT`, etc., from the field names; the PEP 257 attribute docstrings under each field become CLI help text. `--help` is generated automatically — no `rich.Table` boilerplate needed.
CLI usage:

```bash
uv run notebook.py --epochs 50 --batch-size 64 --learning-rate 5e-4
uv run notebook.py --help  # auto-generated
```
## Params with Pydantic (alternative)

Tyro v0.8+ supports `pydantic.BaseModel` directly. Trade dataclass for Pydantic when you need field-level validators or `@computed_field`:
```python
import tyro
from pydantic import BaseModel, Field


class ModelParams(BaseModel):
    epochs: int = Field(default=25, description="Number of training epochs.")
    batch_size: int = Field(default=32, description="Training batch size.")
    learning_rate: float = Field(default=1e-4, description="Learning rate.")
    mlflow_experiment: str = Field(default="batch-sizes")


params = tyro.cli(ModelParams)  # same call site
```
`Field(description=...)` becomes CLI help. See `references/params-pydantic.py` for the cell-level diff against the dataclass version.
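For instance, a sketch of the validator case that justifies the switch — the power-of-two rule is purely illustrative:

```python
from pydantic import BaseModel, Field, field_validator


class ModelParams(BaseModel):
    batch_size: int = Field(default=32, description="Training batch size.")

    @field_validator("batch_size")
    @classmethod
    def _power_of_two(cls, v: int) -> int:
        # Illustrative constraint; tyro surfaces the ValueError as a CLI error.
        if v <= 0 or v & (v - 1) != 0:
            raise ValueError("batch_size must be a power of two")
        return v
```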
## MLflow tracking

Wrap the training loop in `mlflow.start_run()`. Default to graceful degradation when `MLFLOW_TRACKING_URI` is unset (MLflow falls back to `./mlruns/`). Disable logging entirely by setting `params.mlflow_experiment = ""`.
```python
import os

import mlflow

tracking_uri = os.environ.get("MLFLOW_TRACKING_URI")
if tracking_uri:
    mlflow.set_tracking_uri(tracking_uri)

if params.mlflow_experiment:
    mlflow.set_experiment(params.mlflow_experiment)

# None disables all logging below (graceful degradation).
run_ctx = mlflow.start_run(run_name=params.mlflow_run_name) if params.mlflow_experiment else None

if run_ctx:
    # Log everything except the mlflow_* bookkeeping fields.
    mlflow.log_params({k: v for k, v in vars(params).items() if not k.startswith("mlflow_")})

for epoch in range(params.epochs):
    avg_loss = train_one_epoch(...)
    if run_ctx:
        mlflow.log_metric("loss", avg_loss, step=epoch)

if run_ctx:
    mlflow.end_run()
```
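The `mlflow_run_name` docstring promises a name auto-derived from params when it is `None`; as written above, MLflow would instead generate a random name. One possible derivation — the format string is an assumption, not upstream's:

```python
# Hypothetical run-name derivation; the format is illustrative.
run_name = params.mlflow_run_name or f"ep{params.epochs}-bs{params.batch_size}-lr{params.learning_rate:g}"
run_ctx = mlflow.start_run(run_name=run_name) if params.mlflow_experiment else None
```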
Use `with mlflow.start_run(...) as run:` if you don't need conditional disable.
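A minimal sketch of that simpler form, with dummy metric values and no disable switch:

```python
import mlflow

mlflow.set_experiment("batch-sizes")
with mlflow.start_run(run_name="baseline") as run:
    mlflow.log_params({"epochs": 25, "batch_size": 32})
    for epoch in range(25):
        loss = 1.0 / (epoch + 1)  # placeholder for train_one_epoch(...)
        mlflow.log_metric("loss", loss, step=epoch)
# end_run() is called automatically on context exit, even on exceptions.
```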
## Live training widget (UI mode only)

In edit mode, embed `mlflow_widgets.MlflowChart` so the user sees the loss curve update live as the training cell runs. Gate it off in script mode (no display surface):
```python
import os

from mlflow_widgets import MlflowChart

widget = None  # stays None in script mode so the bare `widget` below is safe
if not is_script_mode and params.mlflow_experiment:
    chart = MlflowChart(
        tracking_uri=os.environ.get("MLFLOW_TRACKING_URI", "file:./mlruns"),
        experiment_name=params.mlflow_experiment,
        metric_key="loss",
    )
    widget = mo.ui.anywidget(chart)
widget
```
For comparing finished runs, swap `MlflowChart` → `MlflowRunTable` or `MlflowParallelCoordinates`. See the mlflow-widgets README for the full surface.
## Environment Variables (EnvConfig)

Keep upstream's `wigglystuff.EnvConfig` pattern but swap secrets:
```python
import mlflow
from wigglystuff import EnvConfig

env_config = mo.ui.anywidget(
    EnvConfig({
        # Validates the URI by actually querying the server.
        "MLFLOW_TRACKING_URI": lambda u: mlflow.MlflowClient(tracking_uri=u).search_experiments(),
        "MLFLOW_TRACKING_TOKEN": lambda _: True,  # presence-only check
    })
)
env_config if not is_script_mode else None
```
Place the EnvConfig cell near the top of the notebook, after imports, before the params form.
## Columns

Preserve upstream's column convention for navigability:
```python
@app.cell(column=0, hide_code=True)
def _(mo):
    mo.md(r"""## Notebook Description""")
    return
```
Recommended layout: `column=0` description + envs, `column=1` params form, `column=2` data setup, `column=3` model, `column=4` training loop + live chart.
## Grid search

For hyperparameter sweeps, point users at `references/grid.py`. Same contract as upstream:

- Dry run by default: `uv run grid.py` prints sampled combinations. `--launch` actually submits jobs. `--count N` and `--seed S` control sampling.
- Backend: Hugging Face Jobs via `huggingface_hub.run_uv_job`. Required secrets: `HF_TOKEN` and `MLFLOW_TRACKING_URI` (instead of upstream's `WANDB_API_KEY`).
Swap HF Jobs for Modal / RunPod / a local `subprocess.run([...])` by replacing the `run_uv_job` call — the rest of the script (search space, dedup, dry-run formatting) is provider-agnostic.
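A sketch of the local-subprocess swap, assuming `grid.py` hands each sampled combination to a launch function as a dict — the `combo` shape and the flag formatting are assumptions:

```python
import subprocess


def launch_local(combo: dict) -> None:
    """Hypothetical drop-in for run_uv_job: run one combination locally."""
    flags = [f"--{key.replace('_', '-')}={value}" for key, value in combo.items()]
    subprocess.run(["uv", "run", "notebook.py", *flags], check=True)


# e.g. launch_local({"epochs": 50, "batch_size": 64}) runs:
#   uv run notebook.py --epochs=50 --batch-size=64
```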
## Workflow when the user invokes this skill

- Confirm the user wants the MLflow + Tyro variant (not upstream W&B + `mo.cli_args()`).
- Ask which params they want exposed as CLI flags.
- Ask: dataclass or Pydantic? Default to dataclass unless they need validators.
- Ask: live `MlflowChart` cell yes/no? Default yes.
- Verify proposed cell-level edits with the user before applying. Keep `@app.cell(column=N, hide_code=True)` markers intact.
- If they want a sweep, copy `references/grid.py` next to their notebook and update the `SEARCH_SPACE` dict.
## Cross-references

- `marimo-notebook` — general marimo authoring patterns (vendored from marimo-team).
- `anywidget-generator` — for building the live-chart widget if `mlflow-widgets` doesn't cover the case (vendored).
- Upstream `marimo-batch` — the W&B variant.
- `mlflow-widgets` — anywidget components for MLflow (`MlflowChart`, `MlflowRunTable`, `MlflowParallelCoordinates`).
- Tyro docs — CLI generation reference.