update-dataset
Update Dataset (PR → snapshot → steps → grapher)
Use this skill to run a complete dataset update with Claude Code subagents, keep a live progress checklist, and pause for approval at a checkpoint after every numbered workflow step before continuing.
Inputs
- <namespace>/<old_version>/<name>
- Get <new_version> as today's date by running date -u +"%Y-%m-%d"
Optional trailing args:
- branch: The working branch name (defaults to current branch)
Assumptions:
- All artifacts are written to workbench/<short_name>/.
- Persist progress to workbench/<short_name>/progress.md and update it after each step.
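The input resolution described above could look roughly like the following. This is a minimal sketch, not the skill's actual implementation: the `UpdateInputs` class and `parse_inputs` function are hypothetical names, and accepting a full `data://<channel>/...` step URI in addition to the short form is an assumption based on the URI examples elsewhere in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class UpdateInputs:
    """Hypothetical container for the resolved inputs."""
    channel: Optional[str]  # None when the short form is used
    namespace: str
    old_version: str
    short_name: str
    new_version: str
    workbench: Path


def parse_inputs(spec: str, new_version: Optional[str] = None) -> UpdateInputs:
    """Parse '<namespace>/<old_version>/<name>' or a full '<scheme>://<channel>/...' URI."""
    channel = None
    path = spec
    if "://" in spec:
        # Full step URI: the first path segment after the scheme is the channel.
        path = spec.split("://", 1)[1]
        channel, path = path.split("/", 1)
    parts = path.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <namespace>/<old_version>/<name>, got {spec!r}")
    namespace, old_version, short_name = parts
    if new_version is None:
        # Same value as running: date -u +"%Y-%m-%d"
        new_version = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    return UpdateInputs(
        channel, namespace, old_version, short_name, new_version,
        Path("workbench") / short_name,
    )
```

Passing `new_version` explicitly keeps the function deterministic, which helps when resuming an update that started on an earlier day.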
Progress checklist (maintain, tick live, and persist to progress.md)
(Checkpoint rule: After you finish each item below that represents a workflow step, immediately run the CHECKPOINT procedure. Do not batch multiple steps before a checkpoint.)
- Parse inputs and resolve: channel, namespace, version, short_name, old_version, branch
- Clean workbench directory: delete workbench/<short_name> unless continuing existing update
- Run ETL update workflow via etl-update subagent (help → dry run → approval → real run)
- Create or reuse draft PR and work branch
- Update snapshot and compare to previous version; capture summary
- Meadow step: run + fix + diff + summarize
- Garden step: run + fix + diff + summarize
- Grapher step: run + verify (skip diffs), or explicitly mark N/A
- CHECKPOINT — present consolidated summary and request approval
- If approved, commit, push, and update PR description
- Optional: run indicator upgrade on staging and persist report
- Draft Slack announcement and notify user to post it to #data-updates-comms
Persistence:
- After ticking each item, update workbench/<short_name>/progress.md with the current checklist state and a timestamp.
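One way to persist the checklist is sketched below. The layout of progress.md is not specified in this document, so the checkbox syntax and the "Last updated" header are assumptions, and `render_progress` and `persist_progress` are hypothetical helper names.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def render_progress(items: list[tuple[str, bool]], now: datetime) -> str:
    """Render (label, done) pairs as a checklist with a UTC timestamp header."""
    lines = [f"Last updated: {now.strftime('%Y-%m-%dT%H:%M:%SZ')}", ""]
    for label, done in items:
        lines.append(f"- [{'x' if done else ' '}] {label}")
    return "\n".join(lines) + "\n"


def persist_progress(
    workbench: Path,
    items: list[tuple[str, bool]],
    now: Optional[datetime] = None,
) -> Path:
    """Overwrite <workbench>/progress.md with the current checklist state."""
    now = now or datetime.now(timezone.utc)
    workbench.mkdir(parents=True, exist_ok=True)
    target = workbench / "progress.md"
    target.write_text(render_progress(items, now), encoding="utf-8")
    return target
```

Rewriting the whole file on every tick keeps the on-disk state identical to the in-memory checklist, so a resumed session can trust what it reads.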
CHECKPOINT (mandatory user approval)
Always performed immediately after completing each numbered workflow step (1–6). Never start the next step until approval is granted.
Procedure (each time):
- Present a concise summary of what just changed, key diffs/issues resolved, and what the next step will do.
- Ask exactly: Proceed? reply: yes/no
- Only continue if the user replies exactly yes (case-insensitive). Any other reply = no; stop and wait.
- On approval:
  - Update progress checklist (tick the completed item) and write workbench/<short_name>/progress.md with timestamp.
  - Commit related changes (if any), push.
  - Update (or append to) the PR description: add a collapsed section titled with the step name (e.g., "Snapshot Update", "Meadow Update") containing the summary.
Mandatory per-step checkpoints (rule)
You MUST:
- Stop after each workflow step (1–6) and run CHECKPOINT before starting the next (step 7 is optional and still requires a checkpoint if executed).
- Never chain multiple steps inside a single approval.
- Treat missing or ambiguous replies as no.
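The approval rule can be expressed as a small gate. This is an illustrative sketch only: `is_approved` and `run_with_checkpoints` are hypothetical names, and stripping surrounding whitespace before comparing is an assumption about what "exactly yes" should tolerate.

```python
from typing import Callable, Optional


def is_approved(reply: Optional[str]) -> bool:
    """True only when the reply is exactly 'yes', ignoring case."""
    if reply is None:
        return False  # a missing reply counts as "no"
    return reply.strip().lower() == "yes"


def run_with_checkpoints(
    steps: list[tuple[str, Callable[[], str]]],
    ask: Callable[[str], Optional[str]],
) -> list[str]:
    """Run steps one at a time; stop at the first checkpoint that is not approved."""
    approved = []
    for name, action in steps:
        summary = action()  # each action returns its own summary text
        reply = ask(f"{summary}\nProceed? reply: yes/no")
        if not is_approved(reply):
            break  # never start the next step without approval
        approved.append(name)
    return approved
```

Because the loop asks after every single step, several steps can never be chained inside one approval.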
Workflow orchestration
0. Initial setup
   - Check if workbench/<short_name>/progress.md exists to determine if continuing existing update
   - If starting fresh: delete workbench/<short_name> directory if it exists
   - Create fresh workbench/<short_name> directory for artifacts
1. Run ETL update command (etl-update subagent)
   - Inputs: <namespace>/<old_version>/<short_name> plus any required flags
   - CRITICAL: Run etl update ONCE for the full step URI (e.g., data://garden/namespace/old_version/short_name). Do NOT run it separately per channel (snapshot, meadow, garden, grapher). Running it once ensures all cross-step DAG dependencies are updated together. Running it per-channel leaves stale version references in dag/main.yml (e.g., garden pointing to old meadow version).
   - Perform help check, dry run, approval, then real execution; capture summary for later PR notes
   - After running, always verify dag/main.yml: grep for the old version and confirm all internal references between the new steps point to the new version (e.g., garden depends on new meadow, not old meadow).
   - CHECKPOINT (stop → summarize → ask → require yes)
2. Create PR and integrate update via subagent (etl-pr)
   - Inputs: <namespace>/<old_version>/<short_name>
   - Create or reuse draft PR, set up work branch, and incorporate the ETL update outputs
   - CHECKPOINT
3. Snapshot run & compare (snapshot-runner subagent)
   - Inputs: <namespace>/<new_version>/<short_name> and <old_version>
   - CHECKPOINT
4. Meadow step repair/verify (step-fixer subagent, channel=meadow)
   - Run, fix, re-run; produce diffs
   - Save diffs and summaries
   - CHECKPOINT
5. Garden step repair/verify (step-fixer subagent, channel=garden)
   - Run, fix, re-run; produce diffs
   - Save diffs and summaries
   - CHECKPOINT
6. Grapher step run/verify (step-fixer subagent, channel=grapher, add --grapher)
   - Skip diff
   - CHECKPOINT
7. Indicator upgrade (optional, staging only)
   - Use indicator-upgrader subagent with <short_name> <branch>
   - CRITICAL: After the upgrader finishes, always verify it actually worked by querying staging: make query SQL="SELECT COUNT(*) FROM chart_dimensions cd JOIN variables v ON cd.variableId = v.id WHERE v.catalogPath LIKE '%<namespace>/<new_version>%'". If the count is 0, the upgrade did not run; re-run it.
   - CHECKPOINT (if executed)
8. Slack announcement
   - Fill out the template at .claude/skills/update-dataset/slack-announcement-template.md using facts gathered during the update (coverage, chart count, key changes, etc.)
   - Ask user if unsure about any details
   - Save the draft to workbench/<short_name>/slack-announcement.md
   - Tell the user: "Slack announcement drafted at workbench/<short_name>/slack-announcement.md. Please review and post it to #data-updates-comms."
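The initial setup step of the orchestration (resume if progress.md exists, otherwise start from a clean directory) might be sketched as follows. `prepare_workbench` is a hypothetical helper, and treating the mere presence of progress.md as "continuing" is an assumption; in practice the user may still choose to start over.

```python
import shutil
from pathlib import Path


def prepare_workbench(root: Path, short_name: str) -> bool:
    """Return True when resuming an existing update, False when starting fresh."""
    workbench = root / short_name
    if (workbench / "progress.md").exists():
        return True  # continuing: leave existing artifacts untouched
    if workbench.exists():
        shutil.rmtree(workbench)  # starting fresh: drop leftover artifacts
    workbench.mkdir(parents=True)
    return False
```

For example, `prepare_workbench(Path("workbench").parent, "workbench")` is not the intended call; pass the directory that contains per-dataset folders as `root`, e.g. `prepare_workbench(Path("workbench"), "<short_name>")`.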
Guardrails and tips
- DAG consistency: After etl update, always verify that all new steps in dag/main.yml reference each other with the new version. A common bug is garden depending on old meadow or old snapshot, which silently loads stale data.
- Never return empty tables or comment out logic as a workaround; fix the parsing/transformations instead.
- Column name changes: update garden processing code and metadata YAMLs (garden/grapher) to match schema changes.
- Indexing: avoid leaking index columns from reset_index(); format tables with tb.format(["country", "year"]) as appropriate.
- Metadata validation errors are guidance: update YAML to add/remove variables as indicated.
Artifacts (expected)
- workbench/<short_name>/snapshot-runner.md
- workbench/<short_name>/progress.md
- workbench/<short_name>/meadow_diff_raw.txt and meadow_diff.md
- workbench/<short_name>/garden_diff_raw.txt and garden_diff.md
- workbench/<short_name>/indicator_upgrade.json (if indicator-upgrader was used)
Example usage
- Minimal catalog URI with explicit old version:
update-dataset data://snapshot/irena/2024-11-15/renewable_power_generation_costs 2023-11-15 update-irena-costs
Common issues when data structure changes
- SILENT FAILURES WARNING: Never return empty tables or comment out code as workarounds!
- Column name changes: If columns are renamed/split (e.g., single cost → local currency + PPP), update:
  - Python code references in the garden step
  - Garden metadata YAML (e.g., food_prices_for_nutrition.meta.yml)
  - Grapher metadata YAML (if it exists)
- Index issues: Check for unwanted index columns from reset_index(); ensure proper indexing with tb.format(["country", "year"]).
- Metadata validation: Use error messages as a guide; they show exactly which variables to add/remove from YAML files.