skills/pproenca/dot-skills/cli-review-runner

cli-review-runner

Installation
SKILL.md

cli-review-runner

Automates the 10-item agent-friendliness audit from cli-for-agents. Runs black-box probes against a target CLI and emits a structured report mapping each finding to a rule ID (e.g., help-examples-in-help, err-non-zero-exit-codes, safe-dry-run-flag). Default mode is read-only - probes never run destructive verbs with real arguments.

When to Apply

  • User asks to review or audit a CLI for agent-friendliness, automation readiness, or CI use
  • User has just finished building a CLI and wants a pre-ship sanity check
  • User is grading their own or a third-party CLI against the cli-for-agents catalog
  • User is asking why a CLI is hanging an agent, blowing up context, or failing to compose in a pipeline
  • PR review for a CLI change - quickly regress-test the --help, errors, and dry-run flags

How to Use

The skill is orchestrated by scripts/review.sh. Point it at the target CLI (absolute path or PATH-resolvable name) and pick an output format.

# Default: text table on stdout, exit 0 if all passed, 1 if any failed
bash scripts/review.sh --target /usr/local/bin/mycli

# Machine-readable output
bash scripts/review.sh --target gh --format json
bash scripts/review.sh --target kubectl --format ndjson

# Supply subcommand list when auto-discovery misses them
bash scripts/review.sh --target gh --subcommands pr,issue,repo

# Preview what would run without touching the target CLI
bash scripts/review.sh --target mycli --dry-run

# Include risky probes on destructive verbs (off by default)
bash scripts/review.sh --target mycli --include-destructive

See bash scripts/review.sh --help for the full flag list.

Workflow Overview

--target <cli>
[1] Validate target       fail fast if path missing or not executable
[2] Load rule catalog     references/rule-catalog.tsv (45 rules)
[3] Discover subcommands  parse top-level --help (gh/kubectl/commander shapes)
[4] Run probes P1..P10    each probe emits NDJSON findings to a temp file
[5] Render report         scripts/render.sh  ->  text | json | ndjson

Read references/workflow.md when you need the full probe-by-probe breakdown, failure modes, and how to extend the catalog.

Probe Coverage

Probe Rules tested Coverage
P1 Non-interactive interact-no-hang-on-stdin, interact-no-input-flag, interact-flags-first, interact-detect-tty, interact-no-timed-prompts, interact-no-arrow-menus, input-no-prompt-fallback Run under </dev/null with timeout; inspect help for interactive language
P2 Layered help help-per-subcommand, help-no-flag-required, help-layered-discovery Top-level line count; per-subcommand --help; zero-arg invocation
P3 Help examples help-examples-in-help, help-flag-summary, help-suggest-next-steps Grep each subcommand help for Examples:, short+long flag pairs, "See also"
P4 Actionable errors err-actionable-fix, err-include-example-invocation, err-exit-fast-on-missing-required, err-no-stack-traces-by-default Invoke with bogus flag; grep stderr for fix + example; check for raw stack traces
P5 stderr channeling err-stderr-not-stdout Error text must land on fd 2
P6 Exit codes err-non-zero-exit-codes Usage error and runtime error must produce distinct non-zero codes
P7 stdin composition input-accept-stdin-dash, input-flags-over-positional Grep help for - stdin convention; count positionals vs flags
P8 Structured output output-json-flag, output-respect-no-color --json produces JSON; NO_COLOR=1 suppresses ANSI
P9 Destructive safety safe-dry-run-flag, safe-force-bypass-flag, safe-no-prompts-with-no-input Inspect destructive verbs' --help for --dry-run / --yes / --force / --no-input
P10 Command structure struct-resource-verb, struct-standard-flag-names, struct-no-hidden-subcommand-catchall, struct-flag-order-independent Uniform shape, --help/--version present, unknown subcommand errors, flag position independence

Coverage: 30 of the 45 rules in cli-for-agents are black-box testable. The remaining 15 (idempotency, state reconciliation, NDJSON streaming, bounded output, crash-only recovery, env-var fallback, secret-stdin, confirm-by-typing-name) require either real invocation or source-code inspection - the report lists them as "manual review required".

Configuration

config.json stores the verb classifier lists and default timeout. The skill works without any setup - defaults are reasonable. Override per-invocation via flags or edit the file for project-wide changes.

{
  "timeout_seconds": 5,
  "safe_verbs": ["list", "get", "show", "status", "describe", "help", "version", "config", "ls", "inspect"],
  "destructive_verbs": ["delete", "drop", "destroy", "remove", "reset", "purge", "rm", "del"]
}

Safety Model

All probes are read-only by default:

  • Safe verbs (list, get, show, ...) may be invoked with bogus flags to test error handling
  • Destructive verbs (delete, drop, ...) are ONLY inspected via --help - never executed with arguments
  • Every probe runs with a 5-second wall-clock timeout under </dev/null
  • The target CLI is sandboxed to its own process; no shell metacharacters in arguments

When --include-destructive is passed, probes may invoke destructive verbs with bogus flags too. This exposes the rare case where a CLI does something destructive before validating flags. Only enable this against CLIs you trust, or in a disposable test environment.

Self-test

Before shipping changes to probes, run the self-test - it generates a mock CLI that deliberately violates specific rules and asserts the probes detect them:

bash scripts/selftest.sh

Expected output: Results: 8 passed, 0 failed. Any failure points at a regression in scripts/lib/probes.sh.

Files

File Purpose
scripts/review.sh Main entry point - orchestrates probes, renders the report
scripts/render.sh Output formatter: text / json / ndjson
scripts/selftest.sh Sanity check against a deliberately-buggy mock CLI
scripts/lib/common.sh Shared helpers: timeout, verb classifier, JSON escape, catalog loader
scripts/lib/probes.sh Probe functions probe_p1..probe_p10
references/rule-catalog.tsv 45 rules from cli-for-agents, mapped to probes
references/workflow.md Detailed probe-by-probe methodology, failure modes, extension guide
gotchas.md Known edge cases discovered during use
config.json Verb classifier lists and default timeout

Related Skills

  • cli-for-agents - the 45-rule design catalog this skill audits against. Read rule files there when the report flags an issue and you need the full explanation.
Weekly Installs
59
GitHub Stars
131
First Seen
1 day ago