---
name: cli-agent-experience
---

# CLI Agent Experience
## Language

Match the user's language: respond in the same language the user uses.
## Overview

Audit a CLI tool's agent-friendliness against 11 dimensions, split into base dimensions (which determine the grade) and bonus dimensions (improvement hints that do not penalize the grade). Produces a scorecard with actionable suggestions.

- **Input:** a CLI tool (source code, installed binary, or GitHub repo)
- **Output:** base grade + bonus hints + prioritized fix list
## Workflow
Progress:
- Step 1: Identify the CLI tool
- Step 2: Gather evidence
- Step 3: Score each dimension
- Step 4: Present report
- Step 5: Fix gate
### Step 1: Identify the CLI
| Input | Action |
|---|---|
| Source code in cwd | Read entry point, help output, output formatting code |
| Installed binary name | Run `<tool> --help`, sample commands |
| GitHub repo URL | Clone/fetch, read source |
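
A quick triage for each case might look like the sketch below (the binary name `mytool` and the repo URL are hypothetical placeholders):

```bash
# Installed binary: confirm it exists and responds
command -v mytool && mytool --version

# Source code in cwd: look for an entry point or project manifest
ls pyproject.toml Cargo.toml package.json 2>/dev/null

# GitHub repo URL: fetch locally, then read the source
git clone https://github.com/example/mytool && cd mytool
```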
### Step 2: Gather Evidence
- Run `<tool> --help` and subcommand help (hierarchy depth)
- Run `<tool> <command> --help` for 2-3 representative commands
- Check for `--json`, `--quiet`, `--dry-run`, `--yes`/`--force` flags
- Read output formatting code in source (stdout vs stderr separation)
- Read error handling code (exit codes, error structure)
- Run a sample command if safe (read-only) to see actual output format
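
As a concrete sketch (assuming a hypothetical binary `mytool` with a read-only `status` subcommand), evidence gathering could be scripted like this:

```bash
mytool --help                             # top-level help: hierarchy depth, examples
mytool status --help                      # help for a representative subcommand
mytool --help 2>&1 | grep -E -- '--(json|quiet|dry-run|yes|force)'   # agent flags present?
mytool status --json >out.json 2>err.log  # safe read-only run: stdout/stderr separation
echo "exit code: $?"                      # does it report a meaningful exit code?
```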
### Step 3: Score Each Dimension
Score each dimension 0-3. Read `references/dimensions.md` for the full checklists.
Base dimensions (determine the grade, /21):

| # | Dimension | Key question |
|---|---|---|
| 1 | Structured Output | Has `--json` for composability? Is the default output token-efficient? |
| 2 | Exit Codes | Meaningful graduated codes, not just 0/1? |
| 3 | Non-Interactive Mode | Can it run fully unattended? TTY detection? |
| 5 | Self-Documentation | Does `--help` include examples? Are required vs optional arguments clear? |
| 8 | Actionable Errors | Error codes + failing input echo + recovery hints? |
| 9 | Consistent Grammar | Noun-verb hierarchy? Consistent naming? |
| 11 | Token Efficiency | Output concise by default? Pagination? `--limit`/`--field`? |
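
For calibration, a tool scoring 3/3 on several base dimensions tends to behave like the sketch below (`mytool`, its subcommands, and the outputs are invented for illustration):

```bash
# Dimensions 1 and 11: structured, token-bounded output
mytool list --json --limit 5
# => [{"id":"a1","status":"ok"}, ...]

# Dimensions 2 and 8: graduated exit code plus an actionable error
mytool get missing-id
# => error[E404]: no item 'missing-id'. Hint: run 'mytool list' for valid ids.
echo $?
# => 4 (a distinct not-found code, rather than a bare 1)
```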
Bonus dimensions (hints only, /12):

| # | Dimension | Key question | Why bonus |
|---|---|---|---|
| 4 | Idempotency | Safe to retry? Declarative commands? | Domain-dependent (browser/chat tools are inherently side-effectful) |
| 6 | Composability | `--quiet`? stdin? Batch ops? Filtering? | Depends on tool scope |
| 7 | Dry-Run | `--dry-run` with a structured diff? | Domain-dependent (API wrappers, REPLs) |
| 10 | Agent Teaching | Skills/docs for agent consumption? | Forward-looking; industry still immature |
**Domain adaptation:** bonus dimensions have domain-dependent ceilings. When a tool maxes out what is possible in its domain, mark it X/3 (domain ceiling). See `references/dimensions.md` § Domain Adaptation.
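
For the bonus side, a strong dry-run pairs a structured diff with an idempotent apply. The sketch below is illustrative only (hypothetical `mytool deploy` interface), not a required design:

```bash
mytool deploy --dry-run --json
# => {"would_create": ["svc/web"], "would_update": [], "would_delete": []}
mytool deploy --yes   # non-interactive apply; re-running converges (idempotent)
```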
### Step 4: Present Report

```
[CLI Agent Experience] <tool-name>

═══ Base Score ═══
1. Structured Output:  <score>/3 <one-line>
2. Exit Codes:         <score>/3 <one-line>
3. Non-Interactive:    <score>/3 <one-line>
5. Self-Documentation: <score>/3 <one-line>
8. Actionable Errors:  <score>/3 <one-line>
9. Consistent Grammar: <score>/3 <one-line>
11. Token Efficiency:  <score>/3 <one-line>
─────────
Base: <total>/21
Grade: A (18-21) | B (14-17) | C (10-13) | D (6-9) | F (0-5)

═══ Bonus ═══
4. Idempotency:     <score>/3 <one-line>
6. Composability:   <score>/3 <one-line>
7. Dry-Run:         <score>/3 <one-line>
10. Agent Teaching: <score>/3 <one-line>
─────────
Bonus: +<total>/12

═══ Top Improvements ═══
Base items first (affect grade), then bonus items (hints):
1. What to change, why, and a concrete before/after example
2. Priority: High (blocks agent usage) / Medium (degrades experience) / Low (forward-looking)
```
### Step 5: Fix Gate

Ask the user:

- **Fix all**: apply improvements to the source code
- **Pick items**: select specific dimensions to fix
- **None**: use the report as reference
## References

Read `references/dimensions.md` for the full scoring checklist per dimension, with examples and anti-patterns.