# Prompt Reviewer

Evaluate user prompting quality across AI coding tools. Supports Claude Code, Codex, AMP, OpenCode, and any other tool.
## Use This For
- Scoring prompt quality for the current conversation or extracted session history
- Reviewing trends, progress, backfills, and prompt-improvement opportunities
- Answering "how am I doing as a prompter?" with a structured rubric
## Do Not Use This For
- Code review, product review, or skill-workflow review
- Generic conversation critique when the user doesn't want prompt-quality scoring
- Historical extraction work when the user only wants a quick current-session read
## Modes

Modes customize scoring and review behavior for specific teams or projects: axis weights, review cadence, output format preferences, and team context that affects expectations. Stored in modes/ (gitignored, never committed).

### How Modes Work

Team-specific configuration (scoring weight adjustments, review cadence, custom session source paths, output format preferences) lives in the client overlay: skillbox-config/clients/{client}/overlay.yaml → auto-generated context.yaml.
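The effect of a mode's axis-weight adjustments can be sketched as follows. This is a minimal illustration, not the skill's actual implementation; the weight-map shape is an assumption:

```python
# Axis maxima from the scoring rubric (sums to 23).
AXIS_MAX = {
    "clarity": 3, "context": 3, "autonomy": 2, "constraints": 2,
    "checkpoints": 2, "followup": 3, "collaboration": 3,
    "adaptability": 2, "outcome": 3,
}

def apply_weights(scores, weights):
    """Weighted composite sketch: scale each axis score and its max
    by the client's weight for that axis (default 1.0)."""
    total = sum(s * weights.get(a, 1.0) for a, s in scores.items())
    max_total = sum(AXIS_MAX[a] * weights.get(a, 1.0) for a in scores)
    return round(total / max_total, 2)
```

With an empty weight map this reduces to the standard composite of sum(scores) / 23.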
### Client Config Resolution (Step 0)
- Look for `context.yaml` in the working tree (generated from the client overlay)
- If found, load team-specific settings (scoring weights, review cadence, etc.) from it automatically
- If not found, ask the user which client to target (or use default scoring)
- If no `skillbox-config/` exists, use standard scoring with no adjustments
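The resolution order above can be sketched as a small function. A hedged illustration only; the return values are placeholders, not the skill's API:

```python
from pathlib import Path

def resolve_client_config(worktree: Path):
    """Step 0 resolution order: context.yaml -> ask user -> standard defaults."""
    ctx = worktree / "context.yaml"
    if ctx.exists():
        return ("overlay", ctx)        # load team-specific settings from it
    if (worktree / "skillbox-config").exists():
        return ("ask-user", None)      # config repo exists but no generated context: ask which client
    return ("default", None)           # no skillbox-config at all: standard scoring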
## Workflow Overview

**Review flow (default):**
- Gather parameters (time range, provider, scope)
- Extract sessions (script for Claude/Codex, manual for others)
- Score prompts (9 axes, 23 points)
- Output quick summary (default) → offer full scorecard
- Save scores to history (with provider/model metadata)
- Show trend if history exists
**Trend-only flow** (when the user asks for "trend", "progress", "history"):
- Run show_trend.py and display results (optionally filter by provider)
- Skip review steps
**Backfill flow** (when the user asks to "backfill", "backfill next", "list weeks", "catch up"):
- Run `list_weeks.py` to see available weeks and backfill status
- If the user said "backfill next [provider]" → immediately run the full review for that provider's next unreviewed week (no need to ask)
- If the user just said "list weeks" → show the table and stop
- After the review, ask about purging, then offer to continue with the next week
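The three flows above amount to keyword routing. A sketch using the trigger phrases from this document; real intent matching would be more forgiving:

```python
def pick_flow(request: str) -> str:
    """Route a user request to review, trend-only, or backfill flow by keyword."""
    req = request.lower()
    if any(k in req for k in ("backfill", "list weeks", "catch up")):
        return "backfill"
    if any(k in req for k in ("trend", "progress", "history")):
        return "trend-only"
    return "review"  # default flow
```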
## Step 1: Gather Parameters

Ask the user with AskUserQuestion:
**Provider** (determines extraction method)
- Claude Code (auto-extract from ~/.claude sessions)
- Codex (auto-extract from ~/.codex sessions)
- OpenCode (auto-extract from ~/.local/state/opencode — single batch, no timestamps)
- AMP (manual — score current/pasted conversation)
- Other (manual)
**Time range** (default: today; auto-extract providers only)
- Today (recommended)
- Past week
- Past month
- All time
**Source** (when the provider is auto-extractable)
- All three (Claude Code + Codex + OpenCode)
- Both Claude Code and Codex (recommended)
- Claude Code only
- Codex only
- OpenCode only
**Model** (optional, for trend metadata)
- Ask what model they were using, or infer from context
- Examples: opus, sonnet, gpt-4o, o3, gemini-2.5-pro
**Scope** (Claude Code only)
- Current project only
- All projects
## Step 2: Extract Sessions

For Claude Code / Codex / OpenCode (auto-extract):

```bash
python3 {skill_dir}/scripts/extract_sessions.py \
  --source {claude|codex|opencode|both|all} \
  --since {date} \
  --project {project_path_if_filtered} \
  --limit 20
```

Source options: `both` = claude + codex; `all` = claude + codex + opencode
**OpenCode limitation:** OpenCode stores prompts without timestamps or session boundaries. All prompts are returned as a single batch using the file's mtime for date filtering. Backfill-by-week is not meaningful for OpenCode.
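The mtime-based filtering described above works roughly like this. A sketch only: the `*.json` layout under the OpenCode state directory is an assumption for illustration, not the script's actual glob:

```python
from datetime import datetime
from pathlib import Path

def opencode_files_since(root: Path, since: datetime):
    """Approximate date filtering by file mtime, since OpenCode prompts
    carry no per-prompt timestamps. Returns matching files, sorted."""
    cutoff = since.timestamp()
    return sorted(p for p in root.rglob("*.json") if p.stat().st_mtime >= cutoff)
```

Because the filter is per file, not per prompt, an old file touched recently will slip through, which is why week-level backfill is unreliable for this provider.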
For AMP / Other (manual):
No extraction script needed. Instead:
- Ask the user to paste conversation excerpts, share a session file, or point to a log
- If the user says "review this session" or "review my last conversation", score the prompts visible in the current conversation context
- Score whatever user prompts are available — the rubric works the same regardless of source
## Step 3: Score Prompts

Read references/scoring-rubric.md for detailed criteria.
Score each session on 9 axes:
| Axis | Max | What to look for |
|---|---|---|
| Clarity | 3 | Named files, explicit goals, validation steps |
| Context & Inputs | 3 | Pasted errors, file paths, reproduction steps |
| Autonomy & Scope | 2 | "Only touch X", "Don't modify Y", stop conditions |
| Constraint Handling | 2 | "Run tests first", "Ask before deploying" |
| Iterative Checkpoints | 2 | "Pause after X", "Show me before continuing" |
| Follow-up Efficiency | 3 | New context vs bare "continue" or "yes" |
| Collaboration Traits | 3 | Constructive feedback, respectful redirects |
| Adaptability | 2 | Pivots based on discoveries |
| Outcome Alignment | 3 | Clear conclusion, explicit handoff |
Composite = sum(scores) / 23
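The composite calculation can be written out explicitly. The axis keys below are illustrative short names, not necessarily the rubric's exact identifiers:

```python
# Per-axis maxima from the rubric table; they sum to 23.
AXIS_MAX = {
    "clarity": 3, "context": 3, "autonomy": 2, "constraints": 2,
    "checkpoints": 2, "followup": 3, "collaboration": 3,
    "adaptability": 2, "outcome": 3,
}

def composite(scores: dict) -> float:
    """Composite = sum of axis scores / 23, giving a 0.00-1.00 value."""
    assert set(scores) == set(AXIS_MAX), "score every axis exactly once"
    return round(sum(scores.values()) / sum(AXIS_MAX.values()), 2)
```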
## Step 4: Output

Quick summary first → then offer the full scorecard.

### Quick Summary Template (default)
## Prompt Review - Quick Score
**Composite: X.XX / 1.00**
Sessions: N | Prompts: M
### Top 3 Improvements
1. **[Axis]** (X/max): [Coaching tip]
> Example: "[quoted prompt]"
> Try instead: "[improved version]"
2. ...
3. ...
### What's Working Well
- **[Axis]** (X/max): [Positive observation]
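Filling the template above is straightforward string assembly. A sketch assuming improvement/strength dicts shaped like the `--improvements`/`--strengths` JSON in Step 5, plus a hypothetical `max` field per entry:

```python
def quick_summary(composite, sessions, prompts, improvements, strengths):
    """Render the quick-summary markdown from scored results."""
    lines = [
        "## Prompt Review - Quick Score",
        f"**Composite: {composite:.2f} / 1.00**",
        f"Sessions: {sessions} | Prompts: {prompts}",
        "### Top 3 Improvements",
    ]
    for i, imp in enumerate(improvements[:3], 1):
        lines.append(f'{i}. **{imp["axis"]}** ({imp["score"]}/{imp["max"]}): {imp["tip"]}')
        lines.append(f'   > Example: "{imp["example"]}"')
    lines.append("### What's Working Well")
    for s in strengths:
        lines.append(f'- **{s["axis"]}** ({s["score"]}/{s["max"]}): {s["observation"]}')
    return "\n".join(lines)
```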
### Full Scorecard Template (on request)
## Prompt Review - Full Scorecard
**Composite: X.XX / 1.00**
Sessions: N | Prompts: M | Source: Claude/Codex/Both
### Axis Breakdown
| Axis | Score | Evidence |
|------|-------|----------|
| Clarity | X/3 | "[quoted prompt]" |
| Context & Inputs | X/3 | "[quoted prompt]" |
| Autonomy & Scope | X/2 | "[quoted prompt]" |
| Constraint Handling | X/2 | "[quoted prompt]" |
| Iterative Checkpoints | X/2 | "[quoted prompt]" |
| Follow-up Efficiency | X/3 | "[quoted prompt]" |
| Collaboration Traits | X/3 | "[quoted prompt]" |
| Adaptability | X/2 | "[quoted prompt]" |
| Outcome Alignment | X/3 | "[quoted prompt]" |
### Superlatives
- **Prompt MVP**: "[best prompt]" - [why it worked]
- **Best Save**: "[course correction]" - [what it prevented]
- **Facepalm**: "[worst follow-up]" - [what to do instead]
- **Lowest Hanging Fruit**: [easiest fix with biggest impact]
### Coaching Plan
1. **[Priority improvement]**
> Before: "[current pattern]"
> After: "[improved pattern]"
2. ...
## Step 5: Save Scores to History

After outputting the review, persist scores AND qualitative insights:
```bash
python3 {skill_dir}/scripts/save_review.py \
  --composite {composite} --sessions {N} --prompts {M} \
  --clarity {score} --context {score} --autonomy {score} \
  --constraints {score} --checkpoints {score} --followup {score} \
  --collaboration {score} --adaptability {score} --outcome {score} \
  --source {source} --provider {provider} [--model {model}] [--project {path}] \
  [--week YYYY-WNN] \
  --improvements '[{"axis":"...","score":X.X,"tip":"...","example":"..."},...]' \
  --strengths '[{"axis":"...","score":X.X,"observation":"..."},...]'
```
**Improvements (Top 3):** Each entry has axis, score, tip (coaching advice), and example (quoted prompt).
**Strengths (What's Working Well):** Each entry has axis, score, and observation.
**--week (for backfills):** Override the recorded week. Without it, saves use today's week.
Always run this after scoring. Scores accumulate in `~/.claude/prompt-review-history.jsonl`.
Provider values: `claude`, `codex`, `amp`, `opencode`, `other`
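What save_review.py appends is one JSON line per review. The field names below are illustrative, not the script's exact schema; the ISO-week default matches the `--week YYYY-WNN` format:

```python
import json
from datetime import date
from pathlib import Path

def save_record(history: Path, composite: float, provider: str, week=None, **axes):
    """Append one review record to the JSONL history file."""
    if week is None:
        y, w, _ = date.today().isocalendar()
        week = f"{y}-W{w:02d}"  # default: today's ISO week, e.g. 2025-W36
    record = {"week": week, "composite": composite, "provider": provider, **axes}
    with history.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Append-only JSONL keeps every review independently parseable, so trend tooling can stream the file line by line.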
## Step 6: Show Trend

After saving, if history has 2+ reviews, show the trend:

```bash
python3 {skill_dir}/scripts/show_trend.py --weeks 8
```

Filter to a single provider:

```bash
python3 {skill_dir}/scripts/show_trend.py --weeks 8 --provider codex
```

CSV export (for spreadsheet charting):

```bash
python3 {skill_dir}/scripts/show_trend.py --csv --weeks 12
```
### Trend Output Template
## Prompt Score Trend (2025-W01 to 2025-W04)
**Composite:** 0.78 (+0.09) ▃▄▅▆
| Week | Composite | Sessions | Prompts | Trend |
|------|-----------|----------|---------|-------|
| 2025-W01 | 0.69 | 8 | 42 | -- |
| 2025-W02 | 0.72 | 6 | 35 | +0.03 |
| 2025-W03 | 0.74 | 10 | 51 | +0.02 |
| 2025-W04 | 0.78 | 7 | 38 | +0.04 |
### Per-Axis Trends
| Axis | Max | Current | Trend | Spark |
|------|-----|---------|-------|-------|
| Clarity | 3 | 2.5 | +0.3 | ▄▅▅▆ |
| Context | 3 | 2.0 | +0.2 | ▃▃▄▅ |
| ... | ... | ... | ... | ... |
### Movers
- **Most improved:** Clarity (+0.3/3)
- **Needs attention:** Constraints (-0.2/2)
### By Provider
| Provider | Reviews | Avg Score | Spark |
|----------|---------|-----------|-------|
| claude | 12 | 0.76 | ▅▅▆▆ |
| codex | 5 | 0.68 | ▄▅▅ |
| amp | 3 | 0.72 | ▅▅▆ |
The "By Provider" section appears automatically when history contains multiple providers.
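The spark columns in the tables above map scores onto eight bar glyphs. A sketch with a fixed 0-1 scale; the real script may scale relative to the series instead:

```python
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values, lo=0.0, hi=1.0):
    """Map scores in [lo, hi] onto eight bar glyphs, e.g. 0.78 -> ▆."""
    out = []
    for v in values:
        idx = int((v - lo) / (hi - lo) * (len(BARS) - 1))
        out.append(BARS[max(0, min(idx, len(BARS) - 1))])  # clamp out-of-range values
    return "".join(out)
```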
## Backfill Workflow

To build trend history from past sessions, use the backfill flow.

### Quick Backfill (recommended)

When the user says "backfill next codex" or "backfill next claude":
- Run `list_weeks.py` to find the next unreviewed week for that provider
- Immediately run extract_sessions.py for that week
- Score all prompts, output the review, and save with the `--week` override
- Ask about purging
- Offer: "Continue with next week?"
No questions needed — just do the next week.
### Manual Backfill

If the user wants to see status first or pick a specific week:

#### Step 1: List Available Weeks

```bash
python3 {skill_dir}/scripts/list_weeks.py
```

Or get the full backfill prompt directly:

```bash
python3 {skill_dir}/scripts/list_weeks.py --provider codex --prompt
```
Output shows all weeks with session data, which providers have data, and which have been reviewed:
## Session Weeks Available
| Week | Claude | Codex | OpenCode | Reviewed |
|------|--------|-------|----------|----------|
| 2025-W36 | — | 10 | — | — |
| 2025-W37 | — | 173 | — | — |
| 2025-W38 | — | 131 | — | codex |
| 2026-W05 | 656 | 3 | — | claude, codex |
### Next to Backfill
- **codex**: 2025-W36 (10 sessions)
### Backfill Command
```bash
python3 ~/.claude/skills/prompt-reviewer/scripts/extract_sessions.py \
  --source codex \
  --since 2025-09-01 \
  --until 2025-09-07 \
  --limit 100
```
#### Step 2: Review the Week

Run the suggested extract command, then score ALL prompts (not a sample). Use the standard review flow:
- Extract sessions for the week
- Score every user prompt on all 9 axes
- Output the review with a week label (e.g., "2025-W36 Backfill")
- Save scores with `save_review.py`
#### Step 3: Repeat

Run `list_weeks.py` again to see updated status and get the next week to review.
### Backfill Prompt Template

For handing off a single week to another agent:
You are doing a prompt quality backfill review for week {WEEK} ({PROVIDER} sessions).
## Step 1: Extract sessions
Run:

```bash
python3 ~/.claude/skills/prompt-reviewer/scripts/extract_sessions.py \
  --source {provider} \
  --since {start_date} \
  --until {end_date} \
  --limit 100
```
## Step 2: Score all user prompts
Read ~/.claude/skills/prompt-reviewer/references/scoring-rubric.md.
Score EVERY user prompt (not a sample) on all 9 axes. Compute averages.
Composite = sum(axis averages) / 23
## Step 3: Output the review
## Prompt Review - {WEEK} ({PROVIDER} Backfill)
**Composite: X.XX / 1.00**
Sessions: N | Prompts: M | Provider: {provider}
### Top 3 Improvements
1. **[Axis]** (X.X/max): [Coaching tip]
> Example: "[quoted prompt]"
2. ...
3. ...
### What's Working Well
- **[Axis]** (X.X/max): [Positive observation]
## Step 4: Save scores (include qualitative insights!)
```bash
python3 ~/.claude/skills/prompt-reviewer/scripts/save_review.py \
  --composite {composite} --sessions {N} --prompts {M} \
  --clarity {avg} --context {avg} --autonomy {avg} \
  --constraints {avg} --checkpoints {avg} --followup {avg} \
  --collaboration {avg} --adaptability {avg} --outcome {avg} \
  --source {provider} --provider {provider} --week {WEEK} \
  --improvements '[{"axis":"...","score":X.X,"tip":"...","example":"..."},...]' \
  --strengths '[{"axis":"...","score":X.X,"observation":"..."},...]'
```
IMPORTANT: Use --week {WEEK} to record the original week, not today's date.
## Step 5: Ask about purging old sessions (optional)
After saving, ask the user if they want to delete the reviewed session files to free disk space.
First preview (always use `--dry-run` first):

```bash
python3 {skill_dir}/scripts/purge_sessions.py \
  --provider {provider} --week {WEEK} --dry-run
```

If the user confirms, run without `--dry-run`:

```bash
python3 {skill_dir}/scripts/purge_sessions.py \
  --provider {provider} --week {WEEK}
```
NEVER delete without asking first.
## Scoring Examples
### Clarity
**3/3** - Goal + file + validation:
> "Open `src/auth/login.ts`, confirm the OAuth callback matches the README spec, and stop before making changes."
**1/3** - Vague request:
> "make the transition look better"
**0/3** - No actionable request:
> "continue"
**Coaching**: Instead of "make it better", try "fade the transition over 0.5s with ease-out, and show me before applying"
### Context & Inputs
**3/3** - Error + file + steps:
> "Getting `TypeError: undefined is not a function` at line 42 of `utils.ts` when I run `npm test`. Here's the failing test output: [pasted]"
**1/3** - Missing details:
> "there's a bug in the auth flow"
**Coaching**: Paste the error message, specify which file, include reproduction steps
### Follow-up Efficiency
**3/3** - Adds new context:
> "Good, but the animation is too fast. Slow it to 300ms and add a subtle bounce at the end."
**0/3** - Filler:
> "yes"
**Coaching**: Instead of bare "yes", try "yes, and also check that it works on mobile"
## Scoring Philosophy
**Be constructive, not punitive.** The goal is improvement.
- Quote specific prompts as evidence
- Highlight what's working, not just problems
- Make coaching actionable ("Instead of X, try Y")
- Acknowledge context (quick questions don't need full briefs)