audit-clerk-skill
Audit the clerk Skill
Cross-check skills/clerk/ against the actual CLI source in packages/cli-core/ and propose precise edits wherever they have drifted. The binary is the source of truth; the skill is documentation that must track it.
ultrathink on this task. It requires building a full command tree, comparing two representations of it, and making judgment calls about what belongs in the skill vs. in references/*.md vs. in --help. Shallow passes miss drift.
Inputs
- Source of truth:
packages/cli-core/src/commands/**(one directory per top-level command), pluspackages/cli-core/src/cli.ts,cli-program.ts,mode.ts, and anything inpackages/cli-core/src/lib/referenced by commands (runner preference, agent mode, doctor checks, key resolution). - Target:
skills/clerk/SKILL.mdandskills/clerk/references/*.md. - Template markers: the skill uses
{{CLI_VERSION}}placeholders substituted at install time byclerk skill install. Preserve them; do not expand.
Workflow
1. Inventory the CLI
Walk packages/cli-core/src/commands/ and build a structured inventory. For each top-level command and subcommand capture:
- Full path (e.g.
clerk config patch,clerk api ls). - Purpose (one line, from the command's description / help string in source).
- All flags with short form, type, default, and whether they are destructive.
- Exit codes beyond the default if the command overrides them.
- Agent-mode branches: anything gated on
isAgentMode()/CLERK_MODE/ TTY detection. Note what changes (prompts skipped, JSON forced,--fixignored, handoff printed, etc.). - Whether the command mutates remote state (candidate for
--dry-run/--yesguidance).
Read the per-command README.md (packages/cli-core/src/commands/<name>/README.md) as a secondary source. These are dev-facing (documented API endpoints hit, mocked surfaces, internal contracts per .claude/rules/commands.md). Use them to:
- Flag commands or sub-paths marked mocked/stubbed (blockquote at top of the README). The skill should not document these as production-ready.
- Cross-check Clerk API endpoint claims in
references/recipes.md.
Do not propose bundling the READMEs into skills/clerk/references/ (symlinks or text imports). They ship internal detail agents do not need, and inflate the compiled binary. They are a reference for the audit, not for the skill.
Also capture cross-cutting behavior:
- Runner preference logic (
preferredRunnerinlib/): the lockfile -> runner mapping that the skill's Invoking-the-CLI table must mirror. - Auth / key resolution order and
--app/--instanceresolution (cross-refreferences/auth.md). doctorcheck list and the shape of--jsonoutput (cross-refreferences/agent-mode.md).clerk init --promptoutput (the skill claims it is an agent handoff, not an integration guide; verify).
Don't memorize output. Prefer reading the source directly over running the binary, since the binary in the worktree may be stale or unbuilt. When uncertain about runtime behavior, fall back to the tests in packages/cli-core/src/**/*.test.ts or packages/cli-core/src/test/.
2. Extract the skill's current claims
Read skills/clerk/SKILL.md and each file under skills/clerk/references/. Extract every concrete claim:
- Every command mentioned in the "Core commands at a glance" table and the Invoking-the-CLI table.
- Every flag called out by name.
- Every agent-mode behavior bullet.
- Every exit code / error-format claim.
- Every cross-reference to a
references/*.mdfile.
3. Diff source vs. skill
Produce a structured diff with four buckets:
- Missing from skill: commands, subcommands, flags, or behaviors that exist in the source but are not mentioned anywhere in the skill. (Example seen historically:
completion,deploy,open,skill,switch-envwere absent from the core-commands table.) - Stale in skill: claims in the skill that no longer match source - renamed flags, changed defaults, removed commands, shifted exit codes, agent-mode branches that were reworked.
- Thin in skill: commands mentioned but under-specified relative to their real complexity or footgun surface (destructive mutations without
--dry-runcallouts, flags with non-obvious interactions, agent-mode divergence not documented). - Over-specified: details the skill encodes that
clerk <cmd> --helpor the referenced files already cover better. Candidates for deletion to stay under the 500-line budget.
For every entry in buckets 1-3, cite the source file and line and the target skill location.
4. Decide placement
Not every gap belongs in SKILL.md. Apply this routing:
- Core loop / mental model / safety: stays in
SKILL.md. - Per-command flag tables, recipes, edge cases: belong in
references/recipes.mdor a new reference file. - Agent-mode branches: belong in
references/agent-mode.md;SKILL.mdgets a summary bullet only. - Auth / targeting / key resolution:
references/auth.md.
Prefer --help over enumeration. clerk <command> --help is generated from the Commander tree and can't drift; it also renders each command's setExamples([...]) block. Before expanding a flag table, ask whether the agent could just run --help once. Route as follows:
- Flags with non-obvious interactions, destructive semantics, or agent-mode divergence: document in the skill.
- Flags that are self-explanatory from
--help: do not enumerate. If the skill currently lists them and there's no hidden gotcha, propose apolishedit that removes the enumeration and points at--help. - Example syntax: prefer a sentence like "see
clerk <cmd> --helpfor examples" over copying examples into the skill (which risks drifting from the sourcesetExamples([...])).
Cuts that shrink the skill toward --help are first-class proposals, not optional cleanup. A smaller skill that routes agents to --help ages better than a larger one that tries to mirror every flag.
If a new reference file is warranted (e.g. a references/commands.md table), propose it explicitly rather than bloating SKILL.md.
5. Propose edits
Emit a review-ready proposal. For each change:
- Path (
skills/clerk/SKILL.mdorskills/clerk/references/<file>.md). - Why (source citation:
packages/cli-core/src/commands/<cmd>/<file>.ts:<line>). - Either a unified diff (preferred) or a before/after block for prose sections.
- Severity:
drift(factually wrong today),gap(missing coverage),polish(clearer wording, better placement, or cuts that route the agent to--helpinstead of duplicating it).
Group the proposal by file. Do not touch {{CLI_VERSION}} markers. Do not rewrite sections that are still accurate just because they are near a change.
6. Apply or hand back
Default: present the proposal and stop. The user reviews, then says apply.
If invoked as /audit-clerk-skill --apply, apply drift and gap edits directly but still list polish suggestions for review. Run bun run format after applying so markdown tables and frontmatter match repo style.
Guardrails
- Never invent flags. If a flag appears in a test but not in the command's argument parser, treat it as test-only and flag it for human review.
- Preserve voice. The existing skill is terse and third-person; match it. No first- or second-person drift.
- No em-dashes anywhere in proposals (repo style rule).
- Respect the template.
{{CLI_VERSION}}stays verbatim. The Invoking-the-CLI runner table is generated frompreferredRunnerlogic; if that logic changes, update the table, otherwise leave it alone. - Stay within the 500-line guidance for
SKILL.md. When the budget is tight, move content toreferences/rather than deleting it outright.
Output shape
Return the proposal as:
# clerk skill audit — <YYYY-MM-DD>
## Summary
<counts per bucket, one-line headline of the biggest drift>
## skills/clerk/SKILL.md
### <section name>
- [drift|gap|polish] <one-line description>
- source: packages/cli-core/src/commands/<...>:<line>
- <diff or before/after>
## skills/clerk/references/<file>.md
...
## New files (if any)
...
## Open questions
<things that need human judgment before applying>
Keep it skimmable. The user should be able to approve or reject each entry individually.