retrospective
Retrospective — Developer-AI Workflow Analysis
Analyze Claude Code session logs to understand how the developer-AI collaboration is working — what went well, what didn't, and what can be improved. Produce actionable suggestions for new skills, subagents, slash commands, hooks, and workflow changes based on actual usage patterns.
This is a true retrospective in the agile sense: looking back at past sessions to improve future ones. The analysis examines real session logs, not hypothetical best practices. Each retrospective builds on previous ones — tracking whether past recommendations were acted on and whether the collaboration is actually improving.
Dimensions
Five dimensions plus a session inventory, each targeting a different axis of workflow improvement:
| Dimension | Question It Answers | Reference |
|---|---|---|
| What Went Well | Which interactions were efficient, successful, and worth repeating? What strengths can be applied to recurring problems? | references/success-patterns.md |
| What Didn't Go Well | Where did the collaboration break down, waste time, or produce poor results? What is the root cause? | references/failure-patterns.md |
| Skill Opportunities | What repeated requests or workflows should become reusable skills or slash commands? | references/skill-opportunities.md |
| Workflow Optimization | How can subagents, hooks, and automation reduce manual effort? | references/workflow-optimization.md |
| Collaboration Antipatterns | What common developer-AI pitfalls are showing up in these sessions? | references/collaboration-antipatterns.md |
| Session Inventory | What happened in each session? Which sessions need attention? | references/session-inventory.md |
Subcommands
| Command Pattern | Scope | Reference |
|---|---|---|
| retrospective / retrospective all | All five dimensions | All references |
| retrospective wins / retrospective good | What Went Well | references/success-patterns.md |
| retrospective problems / retrospective bad | What Didn't Go Well | references/failure-patterns.md |
| retrospective skills | Skill & Slash Command Opportunities | references/skill-opportunities.md |
| retrospective workflow / retrospective automation | Workflow Optimization | references/workflow-optimization.md |
| retrospective antipatterns | Collaboration Antipatterns | references/collaboration-antipatterns.md |
| retrospective inventory / retrospective sessions | Session catalog only | references/session-inventory.md |
When no subcommand is specified, default to all dimensions plus session inventory. When a dimension is mentioned by name (even without "retrospective"), match it.
Execution Principle: One Script Per Agent, Zero Raw Bash Calls
MANDATORY. Every agent (main context and every subagent) MUST use a dedicated shell script for ALL Bash operations. No agent may run raw Bash commands like grep, find, jq, cat, rg, or any data extraction directly. Every command goes into the script.
Each Bash tool call requires user approval. Running dozens of individual commands causes confirmation fatigue — the exact antipattern this skill detects. The fix: each agent creates one script and only ever runs that script. When the agent needs different data, it edits the script and re-runs it.
Session directory
Each retrospective run writes all scripts and intermediate output to a session-scoped directory to avoid write conflicts when multiple agents run in parallel:
/tmp/retro-$$ (where $$ is the shell PID of the main context)
The main context must create this directory (mkdir -p /tmp/retro-$$) before launching
any subagents, and pass the path to every subagent in its prompt.
How every agent must work
- Write a shell script using the Write tool (no approval needed): /tmp/retro-$$/<agent-name>.sh
- Run it: bash /tmp/retro-$$/<agent-name>.sh (one approval)
- Need different data? Edit the script with the Edit tool (no approval), then re-run bash /tmp/retro-$$/<agent-name>.sh (one approval). Never create a new Bash call.
Script assignments (one per agent, no sharing)
| Agent | Script path |
|---|---|
| Main context (step 1) | /tmp/retro-$$/analyze.sh |
| Inventory subagent (step 4) | /tmp/retro-$$/inventory.sh |
| Success patterns subagent (step 5) | /tmp/retro-$$/success-patterns.sh |
| Failure patterns subagent (step 5) | /tmp/retro-$$/failure-patterns.sh |
| Skill opportunities subagent (step 5) | /tmp/retro-$$/skill-opportunities.sh |
| Workflow optimization subagent (step 5) | /tmp/retro-$$/workflow-optimization.sh |
| Collaboration antipatterns subagent (step 5) | /tmp/retro-$$/collaboration-antipatterns.sh |
| Feedback loop subagent (step 8) | /tmp/retro-$$/feedback.sh |
What goes in the script
Everything. File discovery, JSONL parsing, grep/rg searches, jq queries, line counting, pattern extraction — all of it. The script outputs structured results to stdout. The agent reads the output, thinks about it, edits the script for the next query, re-runs.
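For concreteness, here is a minimal sketch of what one of these scripts might look like. The log location follows the path-encoding rule described in step 1 below, and the queries are placeholders that each agent replaces with its own:

```bash
#!/usr/bin/env bash
# Sketch of a per-agent analysis script: every query lives here, and results
# are printed as labeled sections on stdout for the agent to read.
set -uo pipefail

# Locate this project's session logs (encode cwd: "/" -> "-", strip leading "-").
ENCODED=$(pwd | tr '/' '-' | sed 's/^-//')
LOG_DIR="$HOME/.claude/projects/$ENCODED"

echo "== Sessions modified in the last 90 days =="
find "$LOG_DIR" -name '*.jsonl' -mtime -90 | sort

echo "== Placeholder query: user-message count per session =="
find "$LOG_DIR" -name '*.jsonl' -mtime -90 -print0 |
  while IFS= read -r -d '' f; do
    printf '%s: %s user messages\n' "$(basename "$f" .jsonl)" \
      "$(jq -n '[inputs | select(.type == "user")] | length' "$f")"
  done
```

When the agent needs a different cut of the data, it edits the query sections in this same file and re-runs it.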
Violations (NEVER do these)
- Running grep, rg, find, jq, cat, wc, sort, or awk as direct Bash calls
- Reading log files one at a time with individual Read or Bash calls
- Creating multiple different Bash commands instead of editing the script
- Asking the user to run commands manually ("run ls", "run wc -l", "run this grep")
- Any Bash tool call that is not executing the agent's own script
Workflow
1. Locate and Read Session Logs
Session logs are JSONL files stored at:
~/.claude/projects/<project-path-encoded>/<session-id>.jsonl
To find logs for the current project:
- Encode the current working directory path (replace / with -, strip leading -)
- Look in ~/.claude/projects/<encoded-path>/
- Each .jsonl file is one session
Read all session logs from the last 3 months. Each .jsonl file has a modification
timestamp — include every file modified within the last 90 days. Each line in a file
is a JSON object representing one event.
Use the single-script approach (see "Execution Principle" above). Write the
analysis script to /tmp/retro-$$/analyze.sh, execute it once, then edit and re-run it
as needed to drill into specific patterns. Do not read log files one at a time.
Key event types to extract:
"type": "user"— user messages (requests, corrections, interruptions, feedback)"type": "assistant"— Claude's responses (tool calls, text, thinking)"type": "progress"— hook events, subagent events- Tool use results — success/failure of each tool call
Key fields to examine:
- message.content — what the user asked or what Claude said
- Tool use name and input — which tools were called and how
- "is_error": true — tool calls that were rejected or failed
- User messages containing corrections ("no", "not that", "instead", "wrong")
- Session duration and turn count — session length and density
- "isSidechain": true — branched/abandoned conversation paths
Emotional signal detection: Pay attention to user frustration and satisfaction markers in messages:
- ALL CAPS or exclamation-heavy language indicating frustration
- Increasingly terse messages (detailed instructions degrading to one-word corrections)
- Explicit frustration ("this is frustrating", "why can't you", "I give up")
- Resignation signals (user does the task themselves, or says "never mind", "forget it")
- Sarcasm or exasperation ("sure, whatever", "fine")
- Rapid topic switches (user abandons a task without resolution)
- Positive signals: "perfect", "exactly", "nice", "that's great" — mark what worked
A user who silently gives up is a worse outcome than one who corrects Claude five times and gets the right result. Frustration and resignation are the highest-priority signals because they represent problems the user stopped trying to fix.
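One rough way to surface these markers for review is sketched below. The phrase lists mirror the bullets above and are deliberately incomplete, so treat matches as leads to read in context, not verdicts:

```bash
# Candidate frustration and satisfaction markers in the user messages of one session.
f="$HOME/.claude/projects/<encoded-path>/<session-id>.jsonl"
FRUSTRATION='this is frustrating|why can.t you|i give up|never mind|forget it|sure, whatever'
PRAISE='perfect|exactly|nice|that.s great'

jq -r 'select(.type == "user") | .message.content | tostring' "$f" \
  | grep -inE "$FRUSTRATION" || echo "no frustration markers matched"
jq -r 'select(.type == "user") | .message.content | tostring' "$f" \
  | grep -icE "$PRAISE" | sed 's/$/ praise-phrase hits/'
```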
2. Load Dimension References
Before analyzing, read the reference file(s) for the requested dimension(s):
- references/session-inventory.md — session cataloging methodology and output format
- references/success-patterns.md — patterns of effective collaboration
- references/failure-patterns.md — patterns of wasted effort and breakdowns
- references/skill-opportunities.md — detecting automatable patterns
- references/workflow-optimization.md — subagents, hooks, automation
- references/collaboration-antipatterns.md — known developer-AI pitfalls
For a full retrospective (retrospective all), read all six.
3. Load Previous Retrospective Reports
Check for previous retrospective reports in docs/retrospective/. Read the most recent
3 reports (or fewer if fewer exist). These are needed for the feedback loop in step 8.
If no previous reports exist, this is the first retrospective — skip step 8 and note this in the output.
4. Build Session Inventory
Launch a dedicated inventory subagent before the dimension subagents. This subagent catalogs every session from the last 3 months, producing the session inventory that serves as the foundation for the pattern analysis.
Read references/session-inventory.md for the complete cataloging methodology,
classification heuristics, and output format.
The inventory subagent must:
- Read references/session-inventory.md for methodology and output format.
- Write /tmp/retro-$$/inventory.sh using the Write tool. This script handles ALL data extraction: file discovery, JSONL parsing, metadata extraction, topic detection, status classification.
- Run bash /tmp/retro-$$/inventory.sh (one Bash approval). Analyze the output.
- Need different data? Edit the script with the Edit tool, re-run it. Never run raw grep/find/jq/cat as separate Bash calls — everything goes in the script.
- Write the structured inventory output to /tmp/retro-$$/inventory-output.md.
- Never ask the user to run shell commands manually. If this appears in output, discard and re-run the subagent.
This step must complete before step 5 launches. The inventory output is passed to each dimension subagent as input context so they can reference specific sessions in their findings.
Output: The session inventory (summary table + per-session detail blocks +
sessions requiring attention) as specified in references/session-inventory.md.
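As a rough illustration, the metadata pass inside inventory.sh could start from something like this (user-turn and event counts are simple proxies; the real classification heuristics come from the reference file):

```bash
# Per-session metadata sketch: id, user turns, total events, last-modified date.
LOG_DIR="$HOME/.claude/projects/<encoded-path>"   # placeholder, encoded as in step 1
find "$LOG_DIR" -name '*.jsonl' -mtime -90 -print0 |
  while IFS= read -r -d '' f; do
    id=$(basename "$f" .jsonl)
    turns=$(jq -n '[inputs | select(.type == "user")] | length' "$f")
    events=$(wc -l < "$f")
    modified=$(date -r "$f" '+%Y-%m-%d')
    printf '| %s | %s | %s | %s |\n' "$id" "$turns" "$events" "$modified"
  done
```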
5. Analyze Session Patterns — One Subagent Per Dimension
Launch one subagent per dimension. Do not split work by time period or session count — split by dimension. Each subagent analyzes ALL session logs from the last 3 months but focuses exclusively on its assigned dimension.
For a full retrospective, launch 5 subagents in parallel:
| Subagent | Dimension | Reference File | Script Path |
|---|---|---|---|
| 1 | What Went Well | references/success-patterns.md | /tmp/retro-$$/success-patterns.sh |
| 2 | What Didn't Go Well | references/failure-patterns.md | /tmp/retro-$$/failure-patterns.sh |
| 3 | Skill Opportunities | references/skill-opportunities.md | /tmp/retro-$$/skill-opportunities.sh |
| 4 | Workflow Optimization | references/workflow-optimization.md | /tmp/retro-$$/workflow-optimization.sh |
| 5 | Collaboration Antipatterns | references/collaboration-antipatterns.md | /tmp/retro-$$/collaboration-antipatterns.sh |
Each subagent must follow the script-only rule:
- Read its assigned reference file for detection heuristics and pattern catalogs.
- Read the session inventory from /tmp/retro-$$/inventory-output.md (produced in step 4) to understand the full session landscape and reference specific sessions by ID.
- Write its dedicated script (e.g., /tmp/retro-$$/success-patterns.sh) using the Write tool. The script must contain ALL grep, find, jq, cat, rg, awk, wc, and sort operations — no raw Bash calls allowed.
- Run bash /tmp/retro-$$/success-patterns.sh (one Bash approval). Analyze the output.
- Need different data? Edit the script with the Edit tool, re-run it. Repeat as needed. The agent may ONLY call Bash to run its own script — never for individual commands.
- Return all significant findings, each backed by paragraph-level evidence with quoted session content. A finding is significant if it appeared in 2+ sessions or had notable impact in a single session. Reference specific session IDs from the inventory when presenting evidence. Each finding must include session ID(s), quoted messages or tool calls, turn counts, and full reasoning — not one-liner summaries.
- If the subagent output asks the user to run commands, treat it as invalid and regenerate with stricter instructions.
What each subagent looks for:
Subagent 1 — What Went Well:
- Tasks completed efficiently (few turns, no corrections)
- Effective tool usage (right tool, right approach)
- Good delegation patterns (subagents used well)
- Successful first attempts (no retries needed)
- Positive emotional signals (user satisfaction, explicit praise)
Subagent 2 — What Didn't Go Well:
- Tasks requiring many corrections or restarts
- Misunderstandings that wasted multiple turns
- Tool call failures or rejections
- Abandoned sidechains (user said "no" and redirected)
- Context loss (user had to repeat information)
- Frustration signals (terse corrections, resignation, task abandonment)
Subagent 3 — Skill Opportunities:
- Same type of request made multiple times across sessions
- Multi-step manual workflows that follow a predictable pattern
- Complex prompts that could be templated
- Repeated setup or configuration instructions
Subagent 4 — Workflow Optimization:
- Tasks done in main context that should be delegated to subagents
- Manual checks that could be hooks
- Sequential work that could be parallelized
- Missing tools or plugins that would help
Subagent 5 — Collaboration Antipatterns:
- Under-specification leading to wrong approach
- Over-correction loops (fix → new problem → fix → ...)
- Premature implementation (coding before understanding)
- Context overload (too much in one session)
- Emotional escalation (frustration building across turns)
For a single-dimension retrospective (e.g., retrospective skills), launch only the
relevant subagent.
6. Generate Insights — Ask "Why"
This step is critical. Do not skip it.
For each pattern identified in step 5, go beyond observation and ask why the pattern exists. The goal is root cause analysis, not symptom listing. Without this step, the retrospective produces shallow observations that recycle from one retrospective to the next — the single most common retrospective failure mode.
For each finding, answer:
- Why does this pattern recur? Is it a missing skill, a bad habit, an architectural constraint, a tooling gap, or a communication mismatch?
- Is this systemic or a one-off? Systemic issues need structural fixes (skills, hooks, CLAUDE.md changes). One-offs need nothing.
- What strength can address this weakness? Connect success patterns to failure patterns. If the developer communicates effectively in domain X, can that same approach fix recurring miscommunication in domain Y? This solution-focused framing deploys existing strengths against current problems rather than inventing entirely new behaviors.
7. Score Each Dimension
For each dimension analyzed, assign a score from 1–5:
| Score | Label | Meaning |
|---|---|---|
| 1 | Poor | Major problems in this area, significant improvement needed. |
| 2 | Fair | Notable issues that regularly cause friction. |
| 3 | Okay | Some issues but generally functional. Room for improvement. |
| 4 | Good | Working well with only minor improvement opportunities. |
| 5 | Excellent | This area is highly effective. Keep doing what you're doing. |
8. Feedback Loop — Compare Against Previous Retrospectives
Launch a subagent to perform this comparison. The subagent MUST follow the
script-only rule: write /tmp/retro-$$/feedback.sh using the Write tool, run
bash /tmp/retro-$$/feedback.sh (one Bash approval), edit and re-run as needed. No raw
Bash calls — all grep/find/jq/cat operations go in the script. The subagent should:
- Read the current analysis (from steps 5-7).
- Read the previous retrospective reports (loaded in step 3).
- For each recommendation from previous reports, determine its status:
- Resolved: The issue no longer appears in the current analysis. The subagent must verify this by checking session logs for concrete evidence that the fix is in place (e.g., if a previous retro recommended creating a hook, check whether hook events appear in the session logs from the last 3 months; if it recommended a skill, check whether the skill is being invoked; if it recommended a CLAUDE.md change, check whether the CLAUDE.md contains the recommended addition).
- Recurring: The same or a similar issue appears again in the current analysis. This means the recommendation was not acted on, or the action taken was insufficient.
- Unknown: Cannot determine from current session data whether the recommendation was addressed. Needs manual verification.
- Produce the Action Tracking section (see step 9 output format).
- Never ask the user to run shell commands manually. If this appears, regenerate the feedback output.
If no previous retrospective reports exist, skip this step.
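A hedged sketch of the verification checks feedback.sh might contain is shown below. The identifiers are hypothetical placeholders standing in for whatever hook, skill, or CLAUDE.md text the previous report actually recommended:

```bash
# Placeholders: substitute the concrete names/text from the previous report.
LOG_DIR="$HOME/.claude/projects/<encoded-path>"                 # encoded as in step 1
RECOMMENDED_HOOK='lint-on-save'                                 # hypothetical hook name
RECOMMENDED_SKILL='changelog'                                   # hypothetical skill name
RECOMMENDED_CLAUDE_TEXT='Run the test suite before committing'  # hypothetical CLAUDE.md line

echo "== Sessions whose events mention the recommended hook =="
grep -l "$RECOMMENDED_HOOK" "$LOG_DIR"/*.jsonl || echo "none found"

echo "== Sessions that mention the recommended skill =="
grep -l "$RECOMMENDED_SKILL" "$LOG_DIR"/*.jsonl || echo "none found"

echo "== CLAUDE.md contains the recommended addition? =="
grep -qF "$RECOMMENDED_CLAUDE_TEXT" CLAUDE.md && echo "present" || echo "absent"
```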
9. Merge Results Into Report
CRITICAL: Keep the analysis methodology unchanged, but change the presentation. The final report must be recommendation-first, scannable in 30 seconds, and use fixed fields per block.
Do not dump raw subagent prose into the main body. Synthesize findings into a fixed structure, then move detailed evidence into a numbered appendix.
Before you draft any section of the report, build a canonical recommendation set from the merged findings. This is the source of truth for BOTH the saved markdown report and the immediate user-facing explanation after the file is written.
Each recommendation in this canonical set must include:
- Action
- Effort
- Impact
- Status (new or recurring)
- Problem addressed
- Why it matters
- Expected outcome
- Target artifact
- Exact content to apply
- Verification
- Evidence refs
Rules:
- Every recommendation that appears in the report must come from this canonical set.
- The user-facing explanation after writing the report must explain the same recommendations in the same priority order.
- Do not invent new recommendations in the post-write explanation.
- If the recommendations are clear in chat but hard to see in the report, the report structure is wrong and must be fixed before writing.
Required Output Order (fixed)
- Recommendations Table (FIRST section in the report)
- Recommendation Blocks (numbered, fixed fields)
- Concrete Implementation (copy-paste artifacts)
- Action Tracking (resolved/recurring/unknown from step 8)
- Dimension Scorecard (from step 7)
- Summary (short)
- Evidence Appendix (numbered, after recommendations)
Recommendations Table (FIRST)
Start the report with a table:
| # | Action | Effort | Impact | Status | Evidence Ref(s) |
Rules:
- Status must be new or recurring for every recommendation.
- Mark recurring when step 8 found a materially similar unresolved recommendation.
- Mark new when no prior equivalent recommendation exists.
- Sort by highest impact, then lowest effort.
Recommendation Blocks (fixed fields)
After the table, include one block per recommendation in strict field order:
Recommendation #
Action:
Effort:
Impact:
Status: (new or recurring)
Problem addressed: (1-2 sentences)
Why it matters: (1-3 sentences, quantify when possible)
Expected outcome:
Evidence refs: (E01, E07, etc.; no inline quote walls)
Concrete Implementation (copy-paste)
Provide an implementation block for every recommendation with exact text/config:
Recommendation #
Target artifact: (file path, hook config path, skill file, etc.)
Exact content to apply: (full markdown/json/yaml/code block ready to copy-paste)
Verification: (one concrete check command or acceptance condition)
This section must be immediately usable without interpretation.
Action Tracking
Use the feedback-loop output from step 8, but keep it concise and structured:
- Completion rate percentage
- Resolved items
- Recurring items
- Unknown items
Dimension Scorecard
Include the 1-5 scores from step 7 as a table. Keep to one compact table.
Evidence Appendix (numbered, after recommendations)
All detailed analysis lives here, after the recommendation sections.
Rules:
- Assign evidence IDs E01, E02, ... and reference these IDs from recommendations.
- Put session inventory details, quotes, long reasoning, and per-dimension findings in this appendix only.
- Do not repeat full evidence inline in recommendation blocks.
Each appendix entry must use fixed fields:
Evidence ID:
Dimension: (inventory, success, failure, skill, workflow, antipattern, feedback)
Sessions:
Observation:
Supporting evidence: (quoted snippets/tool traces)
Root cause:
Impact/cost:
Related recommendation(s): (#1, #3, etc.)
Summary
One short paragraph: biggest theme, highest-leverage change, and what to preserve.
10. Write Report to Persistence Store
Save the merged report to docs/retrospective/YYYY-MM-DD-v1.md. Increment the
version number if a file for today already exists. This is the authoritative record
for the feedback loop — future retrospectives read these files.
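The versioning logic, sketched in shell for clarity (the report itself is written with the Write tool; this only shows how the next free filename is chosen):

```bash
# Pick docs/retrospective/YYYY-MM-DD-vN.md, bumping N past any existing report for today.
DIR=docs/retrospective
DATE=$(date '+%Y-%m-%d')
v=1
while [ -e "$DIR/$DATE-v$v.md" ]; do v=$((v + 1)); done
echo "Report path: $DIR/$DATE-v$v.md"
```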
11. Explain Recommendations to the User
After writing the report, immediately explain the recommendations to the user using the same canonical recommendation set from step 9. This explanation is mandatory. Do not merely say "report written" and do not defer explanation to the markdown file.
Required structure for the user-facing explanation:
- A very short headline summary (2-4 bullets max): completion rate, overall score, wasted-turn estimate, number of recommendations.
- A numbered recommendation walkthrough in priority order.
- Only after the walkthrough, offer implementation.
For each recommendation in the walkthrough, include:
- the recommendation name/action
- what problem it addresses
- why it matters (with quantified impact/cost when available)
- how the proposed fix changes the workflow
Rules:
- Explain every recommendation unless the user explicitly asked for only the top N.
- Do not tell the user to open or read the file instead of explaining it.
- Do not reduce the explanation to only the top 3 when 10 recommendations were produced.
- Do not ask "Want me to implement any of these now?" until after the full walkthrough.
- The wording may be simpler than the report, but the recommendation ordering and core claims must match the canonical recommendation set.
Pragmatism Guidelines
These are guidelines, not laws. Apply judgment:
- Evidence over theory. Every finding should reference specific session evidence. Don't suggest improvements based on hypothetical problems.
- Quantity matters. A pattern seen once is an anecdote. A pattern seen in 2+ sessions or with notable impact in a single session is a real finding.
- Context matters. Some "inefficiencies" are exploratory work that shouldn't be optimized away. Learning and experimentation are valuable.
- User corrections are gold. Every time the user corrected Claude, there's a workflow improvement hiding. These are the highest-signal data points.
- Emotional signals are platinum. Frustration, resignation, and silent abandonment are even higher-signal than explicit corrections — they indicate problems the user stopped trying to fix.
- Always ask why. An observation without a root cause is incomplete. The "Generate Insights" step (step 6) is what separates useful retrospectives from recycled complaint lists.
- Connect strengths to weaknesses. The most actionable fix is often applying something that already works well to a problem area. This solution-focused approach has higher follow-through than inventing new behaviors.
- Prefer structural fixes over behavioral ones. "Remember to be more specific" fails. A CLAUDE.md rule or a skill that enforces specificity succeeds.
- Don't over-automate. Not every repeated task should become a skill. Some tasks benefit from human judgment each time.
- Suggest concrete artifacts. Don't say "create a skill for X" — sketch the skill's YAML frontmatter and purpose (see the sketch after this list). Don't say "add a hook" — show the hook config.
- Keep the report and the explanation in sync. The markdown file is the durable artifact, but the immediate user reply is the first thing the user reads. They must tell the same recommendation story in the same order.
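For instance, a skill suggestion might be sketched like this. The name, description, and purpose line are hypothetical examples, and the frontmatter fields should follow whatever the local skill format actually expects:

```yaml
---
name: api-error-triage   # hypothetical skill distilled from recurring requests
description: >
  Summarize new API error logs, group failures by endpoint, and propose a likely
  root cause per group. Suggested because this request recurred across sessions.
---
```

Purpose: replaces a multi-step manual request observed in several sessions with a single invocation.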
Example Interaction
User: retrospective
Claude:
- Reads all six reference files and loads previous retrospective reports
- Launches inventory subagent — catalogs every session, classifies completion status, produces per-session detail blocks with evidence paragraphs
- Launches 5 dimension subagents in parallel — each produces comprehensive findings with paragraph-level evidence and quoted session content
- For each pattern, asks "why" — root causes, connects strengths to weaknesses
- Launches feedback subagent to compare against previous retrospective
- Builds a canonical recommendation set, then builds a recommendation-first report with fixed blocks — recommendations and copy-paste implementation first, numbered evidence appendix last
- Writes the report to docs/retrospective/YYYY-MM-DD-v1.md
- Explains the same recommendations back to the user in a numbered walkthrough, so the report and the user-facing summary stay aligned