---
name: cli-agent-experience
---

# CLI Agent Experience
## Language

Match the user's language: respond in the same language the user uses.
## Overview

Audit a CLI tool's agent-friendliness against 11 dimensions, split into base dimensions (which determine the grade) and bonus dimensions (improvement hints that do not penalize the grade). Produces a scorecard with actionable suggestions.

- **Input:** a CLI tool (source code, installed binary, or GitHub repo)
- **Output:** base grade + bonus hints + prioritized fix list
## Workflow
Progress:
- Step 1: Identify the CLI tool
- Step 2: Gather evidence
- Step 3: Score each dimension
- Step 4: Present report
- Step 5: Fix gate
### Step 1: Identify the CLI
| Input | Action |
|---|---|
| Source code in cwd | Read entry point, help output, output formatting code |
| Installed binary name | Run `<tool> --help`, sample commands |
| GitHub repo URL | Clone/fetch, read source |
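
A quick triage for each case might look like the sketch below (the binary name `mytool` and the repo URL are hypothetical placeholders):

```bash
# Installed binary: confirm it exists and responds
command -v mytool && mytool --version

# Source code in cwd: look for an entry point or project manifest
ls pyproject.toml Cargo.toml package.json 2>/dev/null

# GitHub repo URL: fetch locally, then read the source
git clone https://github.com/example/mytool && cd mytool
```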
### Step 2: Gather Evidence
- Run `<tool> --help` and subcommand help (hierarchy depth)
- Run `<tool> <command> --help` for 2-3 representative commands
- Check for `--json`, `--quiet`, `--dry-run`, `--yes`/`--force` flags
- Read output formatting code in source (stdout vs stderr separation)
- Read error handling code (exit codes, error structure)
- Run a sample command if safe (read-only) to see actual output format
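
As a concrete sketch (assuming a hypothetical binary `mytool` with a read-only `status` subcommand), evidence gathering could be scripted like this:

```bash
mytool --help                             # top-level help: hierarchy depth, examples
mytool status --help                      # help for a representative subcommand
mytool --help 2>&1 | grep -E -- '--(json|quiet|dry-run|yes|force)'   # agent flags present?
mytool status --json >out.json 2>err.log  # safe read-only run: stdout/stderr separation
echo "exit code: $?"                      # does it report a meaningful exit code?
```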
### Step 3: Score Each Dimension
Score each dimension 0-3. Read `references/dimensions.md` for the full checklists.
Base dimensions (determine the grade, /21):

| # | Dimension | Key question |
|---|---|---|
| 1 | Structured Output | Has `--json` for composability? Is the default output token-efficient? |
| 2 | Exit Codes | Meaningful graduated codes, not just 0/1? |
| 3 | Non-Interactive Mode | Can it run fully unattended? TTY detection? |
| 5 | Self-Documentation | Does `--help` include examples? Are required vs optional arguments clear? |
| 8 | Actionable Errors | Error codes + failing input echo + recovery hints? |
| 9 | Consistent Grammar | Noun-verb hierarchy? Consistent naming? |
| 11 | Token Efficiency | Output concise by default? Pagination? `--limit`/`--field`? |
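
For calibration, a tool scoring 3/3 on several base dimensions tends to behave like the sketch below (`mytool`, its subcommands, and the outputs are invented for illustration):

```bash
# Dimensions 1 and 11: structured, token-bounded output
mytool list --json --limit 5
# => [{"id":"a1","status":"ok"}, ...]

# Dimensions 2 and 8: graduated exit code plus an actionable error
mytool get missing-id
# => error[E404]: no item 'missing-id'. Hint: run 'mytool list' for valid ids.
echo $?
# => 4 (a distinct not-found code, rather than a bare 1)
```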
Bonus dimensions (hints only, /12):

| # | Dimension | Key question | Why bonus |
|---|---|---|---|
| 4 | Idempotency | Safe to retry? Declarative commands? | Domain-dependent (browser/chat tools are inherently side-effectful) |
| 6 | Composability | `--quiet`? stdin? Batch ops? Filtering? | Depends on tool scope |
| 7 | Dry-Run | `--dry-run` with a structured diff? | Domain-dependent (API wrappers, REPLs) |
| 10 | Agent Teaching | Skills/docs for agent consumption? | Forward-looking; industry still immature |
**Domain adaptation:** bonus dimensions have domain-dependent ceilings. When a tool maxes out what is possible in its domain, mark it X/3 (domain ceiling). See `references/dimensions.md` § Domain Adaptation.
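
For the bonus side, a strong dry-run pairs a structured diff with an idempotent apply. The sketch below is illustrative only (hypothetical `mytool deploy` interface), not a required design:

```bash
mytool deploy --dry-run --json
# => {"would_create": ["svc/web"], "would_update": [], "would_delete": []}
mytool deploy --yes   # non-interactive apply; re-running converges (idempotent)
```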
### Step 4: Present Report

```
[CLI Agent Experience] <tool-name>

═══ Base Score ═══
1. Structured Output:  <score>/3 <one-line>
2. Exit Codes:         <score>/3 <one-line>
3. Non-Interactive:    <score>/3 <one-line>
5. Self-Documentation: <score>/3 <one-line>
8. Actionable Errors:  <score>/3 <one-line>
9. Consistent Grammar: <score>/3 <one-line>
11. Token Efficiency:  <score>/3 <one-line>
─────────
Base: <total>/21
Grade: A (18-21) | B (14-17) | C (10-13) | D (6-9) | F (0-5)

═══ Bonus ═══
4. Idempotency:     <score>/3 <one-line>
6. Composability:   <score>/3 <one-line>
7. Dry-Run:         <score>/3 <one-line>
10. Agent Teaching: <score>/3 <one-line>
─────────
Bonus: +<total>/12

═══ Top Improvements ═══
Base items first (affect grade), then bonus items (hints):
1. What to change, why, and a concrete before/after example
2. Priority: High (blocks agent usage) / Medium (degrades experience) / Low (forward-looking)
```
### Step 5: Fix Gate

Ask the user:

- **Fix all**: apply improvements to the source code
- **Pick items**: select specific dimensions to fix
- **None**: use the report as reference
## References

Read `references/dimensions.md` for the full scoring checklist per dimension, with examples and anti-patterns.