# pueue-job-queue
Pueue is a daemon-backed shell job queue. The daemon (`pueued`) accepts tasks
via the `pueue` CLI, runs them across parallelism-capped groups, persists
state across reboots, and exposes status / logs as JSON. This skill teaches
the agent to drive pueue end-to-end: submit single tasks or whole DAGs, cap
parallelism per group, block until completion, retrieve logs, and retry.

The skill is a CLI bridge, not a scheduler. Pueue's `--after` is AND-only
and success-only — that maps cleanly to declarative DAGs and not much
beyond. For OR-deps, conditional branching, retry-with-backoff, or distributed
scheduling, escalate to a real orchestrator (see "When NOT to use").
## When to use

- "Run these 30 commands, max 4 at a time" → `pueue group add ml && pueue parallel 4 --group ml`, then loop `submit.sh --group ml`.
- "Kick off a long training job and let me close my laptop" → `submit.sh -- ./train.sh` (pueue persists across reboots).
- "Run task B only after task A finishes successfully" → `submit.sh --after $A_ID -- ./b.sh`.
- "Fan out 4 trainings, then evaluate" → `submit-dag.py dag.yaml`.
- "Schedule this for tonight" → `submit.sh --delay 6h -- ./nightly.sh`.
- "Block until task 17 is done, then tell me what happened" → `wait.py --ids 17`.
- The user mentions pueue, pueued, the pueue CLI, or asks about a "task queue for shell jobs".
## When NOT to use

- One short shell command the user wants to run right now. Just run it. Pueue adds daemon overhead for nothing.
- Cross-host scheduling (jobs that must land on specific machines) → Airflow / Dagster / Prefect / Slurm.
- Conditional / OR / retry-with-backoff dependencies → real orchestrator. Pueue's `--after` is AND-only and success-only.
- Typed task IO and artifact tracking (data lineage, cached intermediate outputs) → DVC (`dvc exp run --queue`), Prefect, Airflow.
- Long-running services (web servers, daemons) → systemd, launchd, supervisord. Pueue is for finite tasks.
## Authoritative sources

- Repo: https://github.com/Nukesor/pueue
- Wiki: https://github.com/Nukesor/pueue/wiki
- `pueue --help`, `pueue <subcommand> --help`, `man pueue`
## Mental model

| Concept | What it is | How to interact |
|---|---|---|
| Daemon (`pueued`) | Long-running process holding queue state. Required. | `pueued -d` to start. `pueue shutdown` to stop. |
| Task | One shell command, identified by integer id. | `pueue add ...`, `pueue status`, `pueue kill`, `pueue remove`. |
| Group | Named queue with its own parallelism limit. `default` always exists. | `pueue group add <name>`, `pueue parallel N --group <name>`. |
| Dependency | Task waits for one or more parents to succeed before running. | `pueue add --after <id> --after <id> -- <cmd>`. AND-only, success-only. |
| Status | Tagged enum: `Queued`, `Running`, `Stashed`, `Paused`, `Locked`, `Done`. `Done` carries a result (`Success`, `Killed`, `DependencyFailed`, or `{"Failed": <exit_code>}`). | `pueue status --json`. |
## Setup quickstart

```shell
# 1. Verify daemon is healthy. Auto-start if missing.
bash skills/local/pueue-job-queue/scripts/check-daemon.sh --start --json | jq

# 2. (Optional) Create a group with 4 parallel slots.
pueue group add ml
pueue parallel 4 --group ml
```

For per-OS daemon paths and launchd/systemd setup, read
`references/daemon-and-config.md`.
## Always pass a label

`pueue status` shows the label column before the command. In a busy
queue, scanning labels is the only way to tell `train.py` runs apart at a
glance. Always pass `--label` when submitting, and bias toward names
that distinguish this task from its siblings.

Convention:

- Shape: `<verb>-<subject>-<key>` (≤30 chars). `verb` = action, `subject` = artifact, `key` = the variable that distinguishes this run.
- Encode the discriminator: seed, dataset slice, model variant, date, host.
- Don't restate the command — `train.py --seed 1` becomes label `train-baseline-seed1`, not `python-train-py-seed-1`.
- For DAGs, `submit-dag.py --label-prefix <run>-` makes every task labeled `<run>-<task_name>`, so `wait.py --label-prefix <run>-` selects the whole graph and `pueue clean` can be filtered later.

| Good | Bad |
|---|---|
| `train-baseline-seed1` | `task1` (no metadata) |
| `eval-prod-2026q1` | `python eval.py --quarter 2026q1` (restates cmd) |
| `fetch-prices-AAPL` | `fetch-data` (multiple identical labels) |
| `nightly-featurize` (DAG) | `step-2-of-5` (rename a step → label rots) |
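The label shape can be assembled mechanically from the run's variable parts. A minimal sketch — the variable names here are illustrative, not required by any script:

```shell
# Assemble <verb>-<subject>-<key> from the parts that distinguish this run.
verb=train
subject=baseline
seed=1
label="${verb}-${subject}-seed${seed}"

echo "$label"                       # train-baseline-seed1
# Enforce the <=30-char budget before submitting.
[ "${#label}" -le 30 ] && echo "label fits the 30-char budget"
```

The resulting string goes straight into `submit.sh --label "$label"`.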
## Workflow A — submit one task

```shell
bash skills/local/pueue-job-queue/scripts/submit.sh \
  --label train-baseline --group ml \
  -- python train.py --seed 1
# stdout: {"task_id":17,"label":"train-baseline","group":"ml","after":[],...}
# stderr: submitted task 17 (group=ml, label=train-baseline)
```

Capture the id with `jq -r .task_id`. The script wraps `pueue add --print-task-id` and falls back to a status-by-label lookup if the id parse
fails (defensive against future format changes).
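Chaining a dependent task on the returned id is one `jq` call over stdout. A sketch assuming `jq` is installed; the JSON literal stands in for a live `submit.sh` run so the parsing is self-contained:

```shell
# Stub of submit.sh's stdout; in a live session replace the literal with:
#   submit_out=$(bash skills/local/pueue-job-queue/scripts/submit.sh ...)
submit_out='{"task_id":17,"label":"train-baseline","group":"ml","after":[]}'

A_ID=$(echo "$submit_out" | jq -r .task_id)
echo "parent task id: $A_ID"        # parent task id: 17

# A dependent submit would then be wired with --after "$A_ID", e.g.:
#   submit.sh --label eval-baseline --group ml --after "$A_ID" -- python eval.py
```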
## Workflow B — batched runs with capped parallelism

```shell
# One-time: create + size the group
pueue group add sweep
pueue parallel 4 --group sweep

# Loop: submit 30 tasks; pueue runs at most 4 at a time
for SEED in $(seq 1 30); do
  bash skills/local/pueue-job-queue/scripts/submit.sh \
    --label "sweep-$SEED" --group sweep \
    -- python train.py --seed "$SEED"
done

# Block until all sweep-* tasks are done
skills/local/pueue-job-queue/scripts/wait.py \
  --label-prefix sweep- --group sweep
```

Group parallelism is the parallelism primitive in pueue — there is no per-task pool. Set it once, submit freely.
## Workflow C — fan-out / fan-in DAG

For multi-task pipelines with success-only dependencies, write a YAML spec
and submit it declaratively. Prefer `--isolated-group` — it creates a
fresh group sized to the DAG's fan-out width and leaves your other groups
(especially `default`) untouched:

```shell
skills/local/pueue-job-queue/scripts/submit-dag.py \
  skills/local/pueue-job-queue/assets/dag.example.yaml \
  --label-prefix nightly- --isolated-group
# stderr: isolated-group: created 'dag-dcda9347' with parallel_tasks=2
# stdout: {"tasks":{"fetch":17,...},
#          "topo_order":[...],
#          "width_per_group":{"dag-dcda9347":2},
#          "isolated_group":"dag-dcda9347"}
```

Each task gets the label `<--label-prefix><task_name>` so you can
`wait.py --label-prefix nightly-` later. After the run, clean up with
`pueue group remove <name>` (or run `cleanup.sh --remove-empty-groups`).

The submitter topo-sorts, validates the graph (cycle detection, unknown
references, missing `cmd:`), computes the DAG width per group (the
minimum `parallel_tasks` needed for fan-out to actually run in parallel),
then submits in order — wiring `--after` from the in-flight name→id map.
No partial submits: validation fails loud before any `pueue add` runs.
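For orientation, a hypothetical spec sketch of the fan-out/fan-in shape. Only `cmd:` is confirmed by the validation notes above — the `tasks:`, `deps:`, and nesting key names are assumptions, so treat `assets/dag.example.yaml` as the authoritative schema:

```yaml
# Hypothetical sketch — key names other than cmd: are assumed, not verified.
tasks:
  fetch:
    cmd: python fetch.py
  train_a:
    cmd: python train.py --variant a
    deps: [fetch]
  train_b:
    cmd: python train.py --variant b
    deps: [fetch]
  evaluate:
    cmd: python evaluate.py
    deps: [train_a, train_b]
```

Here `train_a` and `train_b` are siblings, so the DAG width is 2 — the parallelism an isolated group would be sized to.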
Three parallelism modes, in order of recommendation:

- `--isolated-group [NAME]` (recommended). Fresh group, sized to width, no side effects on other groups. NAME auto-generates from the spec hash if omitted. Mutually exclusive with `--default-group`.
- `--auto-parallel`. Mutates `pueue parallel` on the target group in place. Use this when the DAG is meant to live in a long-lived shared group (e.g. `ml`) and that group's parallelism should permanently grow.
- No flag. Only warns (on stderr and in the JSON's `parallelism_warnings`). Useful in `--dry-run` planning.

See `assets/dag.example.yaml` for a 5-task fan-out/fan-in template and
`references/dag-patterns.md` for more shapes.
## Workflow D — block until tasks reach terminal state

```shell
# Wait for specific ids
skills/local/pueue-job-queue/scripts/wait.py --ids 17,18,19 --timeout-seconds 300

# Wait for everything in a group
skills/local/pueue-job-queue/scripts/wait.py --group ml

# Wait for a label prefix (e.g. all tasks the DAG submitter created)
skills/local/pueue-job-queue/scripts/wait.py --label-prefix nightly-
```

Output is a JSON summary on stdout:

```json
{"summary": {"total": 4, "success": 3, "failed": 1, "killed": 0, "dependency_failed": 0},
 "tasks": [{"id": 17, "status": "Done", "result": "Success", "exit_code": 0,
            "label": "nightly-fetch", "group": "ml",
            "start": "...", "end": "..."}],
 "elapsed_seconds": 42.3}
```

Exit codes: `0` all succeeded, `5` ≥1 task failed/killed/dependency-failed,
`6` timeout. The agent can branch on exit code instead of parsing JSON.
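Branching on those exit codes can look like the following sketch. A stub function stands in for the real `wait.py` call so the control flow is self-contained and runs anywhere:

```shell
# Stub standing in for the real call, e.g.:
#   skills/local/pueue-job-queue/scripts/wait.py --ids 17 --timeout-seconds 300
# Here it pretends one task failed (documented exit code 5).
wait_for_tasks() { return 5; }

rc=0
wait_for_tasks || rc=$?

case "$rc" in
  0) outcome="all tasks succeeded" ;;
  5) outcome="at least one task failed/killed/dependency-failed" ;;
  6) outcome="timed out before all tasks finished" ;;
  *) outcome="unexpected exit code $rc" ;;
esac
echo "$outcome"
```

The `|| rc=$?` form keeps the branch safe under `set -e`.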
## Workflow E — logs, retry, kill

One-line pueue calls (full list in `references/cli-cheatsheet.md`):

```shell
pueue log --json --full 17        # full stdout/stderr as JSON
pueue restart --in-place 17       # retry in-place (overwrites old log)
pueue kill 17 && pueue remove 17  # kill + drop from queue
```

For ad-hoc filtering, prefer pueue's built-in QUERY DSL over piping
through jq — it's faster (server-side filter) and shorter:

```shell
pueue status --json 'status=Failed order_by end desc first 10'
pueue status --json 'label %= sweep-'   # substring match
```

See `references/json-schema.md` for the QUERY grammar + jq fallbacks.
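When the QUERY DSL doesn't fit, a jq fallback over `pueue status --json` works. A sketch assuming `jq` is installed; the stubbed JSON follows the `status.Done.result` layout described under Gotchas — verify the exact shape against `references/json-schema.md` before relying on it:

```shell
# Stubbed pueue status --json output (two finished tasks); in a live
# session replace the literal with: status=$(pueue status --json)
status='{"tasks":{"17":{"id":17,"label":"sweep-1","status":{"Done":{"result":{"Failed":1}}}},"18":{"id":18,"label":"sweep-2","status":{"Done":{"result":"Success"}}}}}'

# Ids of finished tasks whose result is anything but Success.
failed_ids=$(echo "$status" | jq -r '.tasks[]
  | select(.status.Done.result? and .status.Done.result != "Success")
  | .id')
echo "failed: $failed_ids"          # failed: 17
```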
## Workflow F — periodic cleanup

Pueue's task list grows unbounded. `pueue status --json` walks the whole
table, so a queue with thousands of completed tasks slows down everything
that touches status. Run `cleanup.sh` weekly:

```shell
skills/local/pueue-job-queue/scripts/cleanup.sh \
  --successful-only --remove-empty-groups --logs-older-than 30
# stdout: {"cleaned_tasks":[...], "removed_groups":["dag-abc12345"],
#          "deleted_logs":[...], "kept_running":[...], ...}
```

`--successful-only` keeps failures around for inspection; drop the flag
for a full sweep. Pair with the agent's existing weekly maintenance habit
(or `/schedule` an agent to run it).
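On hosts that use cron, a plain weekly crontab entry is one option — a sketch, with the absolute path a placeholder for wherever the skill is installed:

```
# min hour dom mon dow   command (Sunday 03:00)
0 3 * * 0  bash /path/to/skills/local/pueue-job-queue/scripts/cleanup.sh --successful-only --remove-empty-groups --logs-older-than 30
```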
## Available scripts

- `scripts/check-daemon.sh` — single-shot daemon health + version + log-dir check, with optional auto-start. Outputs JSON to stdout.
  - Flags: `--start`, `--json` (default on), `--help`.
  - Exit: `0` healthy, `2` pueue not installed, `3` daemon unreachable, `4` client/daemon version mismatch.
- `scripts/submit.sh` — submit ONE task, return clean JSON `{task_id, group, label, after, ...}`. Wraps `pueue add --print-task-id` with defensive id parsing. Auto-creates the group if missing.
  - Flags: `--label`, `--group`, `--after` (repeatable), `--immediate`, `--stashed`, `--delay`, `--priority`, `--working-dir`, `--escape`, `--dry-run`, `--help`.
  - Exit: `0` ok, `1` bad args, `2` pueue not installed, `3` `pueue add` failed, `4` daemon unreachable.
- `scripts/wait.py` — block until selected tasks reach terminal status; emit a JSON summary. Selectors: `--ids`, `--label`, `--label-prefix`, `--group`. Quiet by default — emits only one initial line + state-change events on stderr; pass `--verbose` for per-tick output.
  - Flags: `--ids`, `--label` (repeatable), `--label-prefix`, `--group`, `--poll-seconds`, `--timeout-seconds`, `--fail-fast`, `--verbose`, `--help`.
  - Exit: `0` all succeeded, `1` arg error, `4` daemon unreachable, `5` ≥1 failed/killed/dependency-failed, `6` timeout.
- `scripts/submit-dag.py` — declarative DAG submitter. Reads YAML or JSON (file path or `-` for stdin), topo-sorts, validates, submits with wired `--after`, returns `{name → id, topo_order, width_per_group, isolated_group?}`. Computes DAG fan-out width per group; with `--isolated-group [NAME]` creates a fresh dedicated group sized to the width (preferred — leaves the user's other groups untouched). With `--auto-parallel` mutates target groups in place. Without either, only warns.
  - Flags: `--format`, `--default-group`, `--label-prefix`, `--dry-run`, `--print-graph`, `--auto-parallel`, `--isolated-group [NAME]`, `--help`.
  - Exit: `0` all submitted, `1` schema/cycle/unknown-ref/conflicting flags, `2` pueue not installed, `3` mid-run pueue failure (stdout still emits the IDs that DID submit, for cleanup).
- `scripts/cleanup.sh` — prune pueue task history, empty groups, and old log files. Wraps `pueue clean` + `pueue group remove` for empty non-default groups + optional log-file mtime pruning. Emits a JSON report. Run weekly to keep `pueue status --json` fast.
  - Flags: `--successful-only`, `--group GROUP`, `--remove-empty-groups`, `--logs-older-than N`, `--dry-run`, `--help`.
  - Exit: `0` ok (or dry-run), `1` bad args, `2` pueue not installed, `3` daemon unreachable.
## Bundled assets

- `assets/dag.example.yaml` — 5-task fan-out/fan-in (fetch → featurize → {train_a, train_b} → evaluate). Pass it to `submit-dag.py` with `--dry-run` first to see the topo order.
- `assets/pueue.yml.example` — config snippet showing `pause_group_on_failure: true`, `default_parallel_tasks`, and a sample `groups:` block. The header comment names the platform-specific destination paths.
## Reference files

- `references/cli-cheatsheet.md` — read when the task needs `pueue follow`, `pueue log`, `pueue restart`, `pueue kill`, `pueue clean`, `pueue group`/`parallel`, `pueue pause`/`start`, `pueue reset`, `pueue edit`, `pueue env`, or any other un-wrapped `pueue` subcommand. Has a one-line "when to use" for each.
- `references/json-schema.md` — read when writing custom `jq` queries against `pueue status --json` or `pueue log --json`, or when a script's JSON parsing surprises you. Documents the observed schema on pueue 4.0.2 with concrete examples for each status variant.
- `references/dag-patterns.md` — read when the user asks for shapes beyond fan-out/fan-in (mixed sequential+parallel, diamond, etc.) or hits the AND-only / success-only limitation. Has examples and a "when to escalate to a real orchestrator" decision table.
- `references/daemon-and-config.md` — read when setting up `pueued` for the first time, configuring per-OS paths, picking config knobs (`pause_group_on_failure`, `default_parallel_tasks`), or wiring up launchd / systemd-user.
## Gotchas

- DAG fan-out is gated by `parallel_tasks` per group. Even when two siblings both depend only on `A` and the DAG allows them to run in parallel, they will serialize if their group's `parallel_tasks=1`. Pueue's parallelism primitive is the group, not the dependency graph. `submit-dag.py` warns when DAG width exceeds the group's slots and suggests the exact `pueue parallel N --group G` command; pass `--auto-parallel` to apply it.
- `pueue add -- bash -c 'sleep 60'` does not preserve quoting. Pueue joins all `<COMMAND>` args and re-shells. The single-quoted `'sleep 60'` is unwrapped by your shell before pueue sees it, then pueue re-splits. Quote the whole command as one arg: `pueue add 'sleep 60'` or `submit.sh -- 'bash -c "sleep 60 && echo done"'`.
- `--after` is AND-only and success-only. Failed parent → dependent's `status.Done.result` becomes `"DependencyFailed"` and it never runs. There is no OR, no run-on-failure, no retry-on-failure. If you need that, you need a real orchestrator.
- Pueue does NOT auto-create groups. `pueue add --group new_name` exits 1 with `"Group new_name doesn't exists. Use one of these: [...]"`. `submit.sh` calls `pueue group add` first if the group is missing.
- `pueue restart` (default) creates a NEW task id. The old id stays in history with its result. To retry in-place (reusing the id, overwriting the log), pass `--in-place`. Choose deliberately — agent users often want `--in-place` for "retry this".
- `pause_group_on_failure` is config-only, no CLI flag. Set it in `pueue.yml`; reload with `pueue reset` or restart `pueued`.
- Logs live in platform-specific dirs, not your CWD. macOS: `~/Library/Application Support/pueue/logs/`. Linux: `~/.local/share/pueue/logs/`. Windows: `%APPDATA%\pueue\logs\`. `pueue log --json --full <id>` reads them for you.
- `pueue wait` returns 0 even if the waited-on tasks failed. It blocks until terminal, period. To know success/failure, query `pueue status --json` afterward (or use `wait.py`, which does this for you and returns a non-zero exit on failures).
- `pueue remove` requires each id as a separate positional arg (not a space-joined string from a subshell). Use bash arrays: `IDS=($(...)) && pueue remove "${IDS[@]}"`.
- `pueue clean --successful-only` is a flag, not the default. Plain `pueue clean` removes ALL finished tasks (including failures, which you may want to keep for debugging). Be deliberate.
- `pueue add --print-task-id` writes the bare integer to stdout. No JSON, no prose. `submit.sh` parses with a regex that tolerates future format changes; if you bypass `submit.sh`, capture with `ID=$(pueue add --print-task-id ...)` directly.
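The array pattern from the `pueue remove` gotcha, self-contained — an `echo` stands in for pueue so the sketch runs anywhere (bash, not POSIX sh, since it uses arrays):

```shell
# Pretend these ids came out of a jq query, one per line.
ids_raw='17
18
19'

# Word-splitting into a bash array gives pueue one argv entry per id.
IDS=($ids_raw)
echo "ids: ${#IDS[@]}"              # ids: 3
echo pueue remove "${IDS[@]}"       # pueue remove 17 18 19
```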