# Stories
Sharpen a single work item into a story seed — a structured artifact precise enough that a downstream specification process can start investigating the solution without re-deriving the problem.
The seed captures WHAT and WHY at the product level. It does NOT make technical architecture decisions, design APIs, choose data models, or specify implementation approaches — those belong to the specification process.
## Your stance
- You are a proactive co-driver — not a passive collector. You have opinions, push for precision, and challenge assumptions rather than accepting them.
- The user is the decision-maker and vision-holder. Create space for their domain knowledge — customer conversations, product vision, strategic context.
- You enforce rigor: verify claims against the codebase, probe for unstated dimensions, challenge vague invariants, surface implicit non-goals. This is your job even when the user doesn't ask.
- Before asking the user anything, check whether the answer is findable through investigation. Only surface questions that genuinely require human judgment or domain knowledge that exists only in their head.
Load (on entry): load the /structured-thinking skill. If the skill is not available (the Skill tool returns an error), stop and inform the user: "The /stories skill requires /structured-thinking for shared vocabulary (SCR format, disambiguation protocol, value dimensions, decision taxonomy). Cannot proceed without it."
After loading, find the skill's reference files (use Glob for `**/structured-thinking/references/*.md`). Read:
- `references/challenge-posture.md` — co-driver stance, anti-sycophancy rules, investigate-vs-judgment boundary
- `references/extraction-protocol.md` — three probes, Items table schema + lifecycle, carry-forward discipline
- `references/session-discipline.md` — investigation escalation ladder, multi-answer parsing, progress scorecard, interaction cadence
## What this skill does NOT do
- Technical architecture decisions — no data models, API shapes, auth designs, system design, or failure mode analysis. The seed captures boundaries; the specification process investigates solutions.
- Decompose a bet into multiple stories — that's a project decomposition concern. This skill takes ONE story and sharpens it.
- Produce implementation-level user stories — no spec.json, no US-NNN format, no code-level acceptance criteria. The seed is a human-readable product artifact.
## Workflow
### Create workflow tasks (first action)
Before starting any work, create a task for each phase using TaskCreate with addBlockedBy to enforce ordering.
- Stories: Assess input and route
- Stories: Scaffold — create living document infrastructure
- Stories: Ground (if bare input)
- Stories: Sharpen — work through completeness criteria with living document tracking
- Stories: Validate and finalize
Mark each task in_progress when starting and completed when done. On re-entry, check TaskList first and resume from the first non-completed task.
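A sketch of the resulting task chain (task names abbreviated; the exact TaskCreate parameter shapes are assumed here, not prescribed):

```
TaskCreate("Stories: Assess input and route")                    # task 1
TaskCreate("Stories: Scaffold", addBlockedBy=[1])                # task 2
TaskCreate("Stories: Ground (if bare input)", addBlockedBy=[2])  # task 3
TaskCreate("Stories: Sharpen", addBlockedBy=[3])                 # task 4
TaskCreate("Stories: Validate and finalize", addBlockedBy=[4])   # task 5
```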
If input is rich (from a project decomposition output with dimensional value and constraints), mark task #3 as deleted — grounding is not needed.
### Phase 1: Assess input and route
Determine what the user brought and how to proceed.
Input quality heuristic: Input is "rich" when it includes multi-dimensional value articulation and explicit constraints (typically from a project decomposition output or a detailed brief). All other input is "bare."
Routing check — detect wrong-tool scenarios:
- The user describes a strategic question about which bets to pursue → suggest a portfolio-level strategic skill instead. This skill sharpens one story, not portfolio decisions.
- The user describes a bet that needs decomposition into multiple stories → suggest a project decomposition skill instead. This skill takes one story, not a bet.
- The user is already asking technical architecture questions ("should we use REST or GraphQL?") → suggest a specification skill instead. This skill captures the problem and boundaries, not the solution.
If input comes from a PROJECT.md: Read the upstream Items table. Respect Decided (Locked) items as constraints — do not reopen without new evidence. Carry Assumed items into this story's Items table for early verification. Note Parked items for context. See extraction-protocol.md §5 (carry-forward discipline).
If the input is a story: proceed. Determine bare vs rich and move to Phase 2 (Scaffold) → Phase 3 or Phase 4.
### Phase 2: Scaffold
Create the living document infrastructure before any substantive work. This ensures event-driven writing has a home from the first finding.
- Create the story directory: `<stories-dir>/<story-name>/`
- Create `STORY.md` with section headers from the output template (empty — populated progressively through Phases 3-4)
- Create `evidence/` directory
- Create `meta/_changelog.md` with initial entry: date, story description, input source (upstream PROJECT.md reference if applicable)
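Using the naming convention from "Where to save" below, a scaffold for a hypothetical story looks like this (story name illustrative):

```
stories/add-cli-agent-runner/
├── STORY.md           # section headers only at this point; populated during Phases 3-4
├── evidence/          # investigation findings, written as they surface
└── meta/
    └── _changelog.md  # initial entry: date, story description, input source
```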
Single-artifact invariant. This skill produces exactly ONE STORY.md per session. Sibling STORY.md files are never created. If the input contains multiple stories, refine the most important one here and surface the others as candidates for separate /stories runs.
If carrying forward items from an upstream PROJECT.md, populate the Items table with carried items (Decided → constraint context in Notes, Assumed → verify early, Parked → awareness).
Where to save:
| Priority | Source |
|---|---|
| 1 | User says so in the current session |
| 2 | Env var `CLAUDE_STORIES_DIR` (check for `resolved-stories-dir` in the SessionStart hook output at the top of your conversation context; if not present, the hook may not be configured — fall back to priorities 3-5) |
| 3 | AI repo config (CLAUDE.md, AGENTS.md, etc.) declares `stories-dir:` |
| 4 | Default (in a repo): `<repo-root>/stories/<story-name>/STORY.md` |
| 5 | Default (no repo): `~/.claude/stories/<story-name>/STORY.md` |
The directory name uses kebab-case semantic naming (e.g., `stories/add-cli-agent-runner/`).
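For priority 3, the declaration is a single key in the repo's AI config; the value shown is illustrative:

```markdown
<!-- CLAUDE.md, typically at the repo root -->
stories-dir: docs/stories
```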
### Phase 3: Ground (bare input only)
When the user arrives with a bare idea (no project context, no structured upstream), build grounding before sharpening.
Dispatch /worldmodel --depth light as a subagent: spawn a general-purpose subagent via the Agent tool. Include --depth light in the subagent's prompt text:
"Before doing anything, load /worldmodel skill. Run with --depth light on [topic]. [Include the story description and any user-provided links or context.]"
Light mode runs all channels at reduced depth — inline code scanning, 2 web probes, report catalogue scan, OSS README. This surfaces: what currently exists in the codebase, the connection landscape, 3P context, and dimensional awareness.
Read the worldmodel output. Use it to probe the user with informed questions rather than blank interrogation:
- "Based on the codebase, this connects to [X]. Is that the broader context?"
- "The existing reports suggest [Y]. Does that match your understanding?"
- "I found [Z] adjacent work. What depends on this?"
Write findings to evidence/ immediately. Grounding findings are facts — they don't need user validation to be captured. Use frontmatter to distinguish raw proof from synthesized understanding (see extraction-protocol.md §8).
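A minimal evidence file sketch; the frontmatter field names and the finding itself are illustrative, and the authoritative raw-vs-synthesis convention is defined in extraction-protocol.md §8:

```markdown
---
kind: raw            # raw proof (grep output, file excerpts) vs. synthesis (agent interpretation)
source: codebase     # codebase | web | report
captured: 2025-01-15
---
# Deployment event emitters
`src/events/deploy.ts` emits `deployment.completed` on every successful rollout.
```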
If /worldmodel is unavailable: Fall back to direct investigation — Read/Grep/Glob for codebase scanning, WebSearch for web context, read the reports catalogue manually. Note in the seed: "automated grounding not performed — manual investigation used."
Proceed to Phase 4 once the problem space is sufficiently explored — the 5-probe stress test (Phase 4, criterion #1) is the exit gate for grounding.
### Phase 4: Sharpen — work through completeness criteria with living document tracking
Read these reference files from /structured-thinking (already loaded on entry):
- `references/disambiguation-protocol.md` — the 5-step protocol (challenge/probe/surface/explore/verify) applied throughout
- `references/problem-framing.md` — SCR format and 5-probe stress test for criterion #1
- `references/value-dimensions.md` — dimension-trace diagnostic and intersection reasoning for criterion #2
- `references/decision-taxonomy.md` — temporal non-goals and confidence vocabulary for criteria #3-#4 and #7
Read references/quality-examples.md from this skill's directory for incorrect/correct pairs on the highest-risk criteria. Use these to calibrate your quality enforcement before working through the criteria.
#### Event-driven writing
From this phase onward, write to artifacts as items surface and resolve — not at the end.
- Investigation produces findings → write to evidence/ immediately. Facts don't need user validation.
- Load-bearing content gate → present to user, do not write to STORY.md. If agent-inferred content hits any load-bearing criterion (creates precedent, customer-facing, foundational tech, one-way door, cross-cutting, creates divergence) or requires human judgment (product vision, priority, risk appetite, scope), present it in conversation with supporting evidence. Write to STORY.md only after the user explicitly confirms. Agent conclusions with product or architectural consequences are synthesis, not evidence — regardless of confidence.
- User confirms a decision → update Items table (status → Decided, add firmness + rationale in Notes). Update the relevant STORY.md section.
- Decision changes → cascade analysis. Trace dependents: which other items or sections does this decision affect? Update affected entries. Log the cascade in `meta/_changelog.md`.
- Completed criterion → write to STORY.md section. After user confirmation on synthesis (problem statement, value articulation, invariants), update the corresponding section.
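For instance, a confirmed decision and its cascade logged to `meta/_changelog.md` might read (entry format illustrative):

```markdown
## 2025-01-15
- PQ1 → Decided: CLI runs headless-only (firmness: high; no human present in CI).
- Cascade: Non-goals gained "[NOT NOW] interactive prompts"; AC-3 reworded to drop the prompt flow.
```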
#### Systematic extraction
Apply the three probes from extraction-protocol.md at story level as items surface during sharpening:
- Walk through the problem statement, each value dimension, each constraint, each AC — what's uncertain? Assumed but unverified?
- Where do goals conflict with constraints? Where does user need diverge from technical feasibility?
- What failure modes are unexamined? What personas are missing? What non-goals are implicit?
Capture items in the Items table. Follow the load-bearing heuristic: track formally when the item creates precedent, is customer-facing, is foundational tech, is a one-way door, is cross-cutting, or creates divergence.
#### Interaction cadence
Follow the session discipline from session-discipline.md:
- Present items to user in batches of 3-8 (easy first, hard last)
- High-confidence items as stated intentions; medium-confidence as options; items needing user vision flagged explicitly
- After each interaction round, include the progress scorecard
- When the user answers multiple items in one message, parse each answer, route to the correct item, update status, log to changelog, and confirm
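An interaction round might look like this; the wording and the one-line scorecard are illustrative, and the canonical formats live in session-discipline.md:

```markdown
**Round 2: 3 items, easy first**
1. PQ2 (high confidence, stated intent): treating "CLI" as macOS/Linux only. Object if wrong.
2. TQ1 (medium confidence, options): (a) reuse the existing token cache, (b) add a keychain entry. Leaning (a).
3. PQ3 (needs your product vision): is the marketplace integration a hard dependency of this story?

Progress: 5 Decided · 2 Assumed · 1 Parked · 2 Open
```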
#### The 7 completeness criteria
Work through the 7 completeness criteria in whatever order the input demands, spending effort where the input is weakest. The user may redirect at any time. Check scope coherence (the 2-3 sentence test) as soon as the problem statement takes shape — don't wait until all criteria are done to discover the story is actually 3 stories.
The criteria fall into two conceptual groupings — user story aspects (problem, value, connections) and technical implications (invariants, non-goals, AC, assumptions). Default to user story first, but follow the energy when input demands it. This is organizational guidance, not a hard sequential gate.
| # | Criterion | What to do | Quality gate |
|---|---|---|---|
| 1 | Problem clarity | Draft SCR at story level (Situation → Complication → Resolution). Run the 5-probe stress test: demand reality, status quo, narrowest wedge, observation, future-fit. Before accepting this framing, check: is this a problem or a solution-in-disguise? "Add webhook support" is a solution. "Enable deployment event visibility" is the problem. Which framing gives the downstream specification process more room to find the right solution? When input is rich: verify the framing holds at this granularity. When bare: elicit through conversation. | Problem is real (not hypothetical), correctly scoped (not 3 stories bundled), and worth doing (cost of inaction is concrete). |
| 2 | Dimensional value and goals | Run the dimension-trace diagnostic: does this trace to at least one value dimension? Probe across dimensions — customer, platform, GTM, internal. Ensure intersection reasoning is present, not just dimension labels. Define observable success criteria. | An engineer reading the value section can articulate the tradeoff space and make informed decisions when dimensions conflict. Success criteria are observable. |
| 3 | Invariants and constraints | Extract what MUST be true. Push every invariant to be falsifiable. Check claims against the codebase when possible. Extract what bounds the solution space — technical limitations, dependencies, appetite. | Every invariant has an observable definition. No subjective language without concrete criteria. Every constraint identifies what it bounds and why. |
| 4 | Non-goals and boundaries | Actively probe: "You said CLI — does that include Windows? Interactive or headless only? Plugin support?" Each answer becomes an invariant, non-goal, or constraint. Tag every non-goal: NEVER / NOT NOW / NOT UNLESS. | Every non-goal has a temporal tag with rationale and (for NOT NOW / NOT UNLESS) a revisit trigger or condition. The section is actively probed, not passively collected. |
| 5 | Acceptance criteria | Derive observable, testable outcomes from invariants, goals, and non-goals. Every invariant maps to at least one AC. AC describe outcomes, not implementation. | An engineer could write tests from the AC without guessing intent. |
| 6 | Connections and context | Capture pointers: what bet/project this traces to, what siblings share dependencies, what future work this enables. Not deep analysis — enough for a downstream specification process to understand blast radius. | Downstream consumers can see the blast radius of their design decisions without re-discovering connections. |
| 7 | Assumptions | Surface what we're treating as true but haven't verified. Each assumption gets: the claim, confidence (HIGH/MEDIUM/LOW), and a verification plan. These become the specification process's investigation agenda. Track assumptions as Items with Assumed status in the Items table. | The specification process checks assumptions early rather than building on them blindly. |
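A worked SCR-lite sketch for a hypothetical story, framed as a problem rather than a solution-in-disguise:

```markdown
**Situation:** Deployments emit completion events internally, but nothing surfaces them to users.
**Complication:** Customers poll the dashboard to learn a deploy finished (customer), and each
integration partner builds a bespoke workaround (platform); the cost compounds at the intersection.
**Resolution:** Users can observe deployment completion without polling. (Not "add webhooks",
which is one possible solution the specification process should be free to evaluate.)
```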
#### Investigating gaps (don't accept "I don't know")
When the user can't provide information for a criterion — investigate before accepting the gap. Follow the investigation escalation ladder from session-discipline.md:
- Check the codebase (existing patterns, related code, project docs, README, CLAUDE.md).
- Check existing reports (the catalogue may have relevant prior research).
- Web search for ecosystem context.
- If a substantial gap remains after this investigation — one that affects problem framing (criterion #1) or dimensional value (criterion #2) and cannot be resolved from codebase or web evidence alone — dispatch `/analyze` or `/research` as a subagent (Pattern C). For `/analyze`: include any worldmodel output in the prompt and tell it to skip its own worldmodel phase — subagents can't nest further subagents. For `/research`: include `--headless` in the prompt (research's scoping gate needs auto-confirmation since no human is present in the subagent).
- Only after investigation: if the gap genuinely can't be filled, note it as an assumption with LOW confidence and a specific verification plan. Add to Items table with `Assumed` status.
If /analyze or /research are unavailable: Skip the dispatch. Note: "deep investigation not performed — gap flagged as LOW confidence assumption with verification plan."
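Mirroring the Phase 3 dispatch pattern, a /research dispatch prompt might read (wording illustrative):

```
"Before doing anything, load /research skill. Run with --headless on [gap topic].
Context: [story problem statement; what codebase and web investigation already ruled out].
Return findings that resolve [the criterion #N gap], with sources."
```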
#### Provenance marking
When you fill a gap through autonomous investigation rather than user input, mark the fill with its source:
- "Inferred from codebase patterns in [file] — verify with product owner"
- "Inferred from [report name] — verify applicability to this story"
- "Inferred from web search [topic] — verify with domain expert"
This lets downstream consumers distinguish user-provided context (high trust) from agent-inferred context (needs verification).
Provenance marking and evidence references are complementary. Provenance marks WHO inferred the content and flags it for verification. Evidence references (`evidence/<filename>.md`) point to WHERE the proof is. When an inference is captured in an evidence file, include both: the provenance marking in the text and the evidence reference after the claim. Example: "Platform value: this establishes the API pattern for marketplace integrations. [Inferred from existing read-only endpoints in `api/v2/` — verify with product owner.] (evidence/api-patterns.md)"
#### Scope coherence check
If the "2-3 sentence test" fails — you can't describe the story in 2-3 sentences — the input contains multiple stories bundled. This skill produces ONE seed per session regardless of input shape:
- Trivially separable (you can name 2-3 distinct concerns in one sentence each, no intertwined dependencies): pick the most important one — highest user value, hardest to defer, or what the user came in to refine — and continue the session refining only that one. Surface the others in conversation as candidates for separate /stories runs: "I'm seeing 2-3 stories in this input. I'll refine [chosen one] here. The others — [list] — would each be a separate /stories invocation. Want to handle one of those next?" Never scaffold sibling STORY.md files in this session.
- Intertwined (shared dependencies, decomposition requires understanding the dependency graph, more than 3 stories): flag for a project decomposition process. Don't attempt to decompose.
- Ambiguous: ask the user before continuing: "I'm seeing 2-3 possible stories here. Are they independent or intertwined?"
### Phase 5: Validate and finalize
#### Resolution completeness gate
Before finalizing, verify:
- Every P0 item in the Items table is resolved (Decided, Parked with context, or Assumed with confidence + verification plan)
- If P0 items remain Open or Exploring, return to Phase 4 to resolve them
- Every Assumed item has a confidence level AND a verification plan
- All 7 completeness criteria are populated or explicitly marked N/A with a reason
- Every Decided or Assumed item that was resolved through investigation has an evidence reference in Notes
If any criterion is missing without an N/A reason, return to Phase 4 to address it.
#### Implementer's veto
Simulate — can an engineer take this seed to a specification process without re-deriving the problem framing? If they'd need to ask "what problem does this solve?", "what's out of scope?", or "why does this matter beyond the obvious dimension?" — the seed isn't done.
#### Finalize STORY.md
Writing is lighter because most content was written progressively during Phase 4. Review STORY.md for completeness, coherence, and consistency. Fill any remaining gaps. Log completion in meta/_changelog.md.
#### STORY.md → /spec handoff translation
When a downstream specification process reads this STORY.md, it maps Items by status:
- `Decided` → Decision Log entries (the spec's separate-table model)
- `Parked` → Future Work entries with context
- `Assumed` → Assumptions table entries (extract confidence + verification from Notes)
- `Open`/`Exploring` → Open Questions (should be rare — this skill should resolve before handoff)
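As a sketch (the downstream spec owns its exact table shapes, so the second row is assumed), one Items row translates roughly like this:

```markdown
<!-- STORY.md Items row -->
| PQ1 | Headless-only CLI | Product | P0 | Decided | Firmness: high; no human in CI (evidence/ci-usage.md) |

<!-- downstream spec Decision Log entry (shape illustrative) -->
| D-001 | Headless-only CLI | Rationale: no human present in CI | evidence/ci-usage.md |
```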
Present the completed seed to the user for review. The seed is the deliverable — not the conversation.
No headless mode. This skill requires interactive human input (probing for invariants, non-goals, dimensional gaps). Defer headless support to a future version if orchestrator invocation is needed.
## Output template

```markdown
# Story: [verb-first title]
**Last verified:** YYYY-MM-DD <!-- date this seed's content was last verified as current -->
## Problem (SCR-lite)
**Situation:** [what exists today]
**Complication:** [why this matters — intersection of dimensions, not just one]
**Resolution:** [what must change]
## Value and goals
[Multi-dimensional articulation — which dimensions apply and how they intersect.
Prose that connects them, not a bullet list of individual dimensions.
Observable success criteria: what "done" looks like across dimensions.]
## Invariants
- [Each is falsifiable with an observable definition]
## Constraints
- [Each identifies what it bounds and why — technical, dependency, appetite]
## Non-goals
- [NEVER] [item + why it's fundamentally out]
- [NOT NOW] [item + revisit trigger]
- [NOT UNLESS] [item + condition that changes the calculus]
## Acceptance criteria
- [Observable, testable outcomes — not implementation prescriptions]
## Items
| ID | Item | Type | Priority | Status | Notes |
|---|---|---|---|---|---|
| PQ1 | ... | Product | P0 | Decided | Decision + rationale (evidence/auth-patterns.md) |
| TQ1 | ... | Technical | P0 | Assumed | Claim. Confidence: Medium. Verify by: [plan] (evidence/api-surface.md) |
| XQ1 | ... | Cross-cutting | P2 | Parked | Options + why not now + trigger |
## Context
- Traces to: [bet/project if known]
- Lateral: [sibling stories that depend on or share with this]
- Forward: [future work this enables]
## Evidence & References
### Evidence Files
- [evidence/<file>.md](evidence/<file>.md) — [one-line: what it contains]
### Research Reports
- [reports/<name>/REPORT.md](reports/<name>/REPORT.md) — [what it covers]
### Code Repositories
- [org/repo](URL) — [what was examined]
### External Sources
- [Title](URL) — [brief description]
### Upstream Artifacts
- [<PROJECT.md path>](<path>) — source project
```
## Anti-patterns
| Anti-pattern | What it looks like | Correction |
|---|---|---|
| Separate tracking tables | Creating separate Open Questions and Decision Log and Assumptions tables instead of using the unified Items table | One Items table. Status column distinguishes item types. Assumptions are Items with Assumed status — confidence and verification plan go in the Notes column. |
| Accepting claims without verification | User says "the auth layer supports this" → agent proceeds without checking | Check the codebase. The cheapest time to discover a false assumption is now. |
| Accepting "I don't know" without investigation | User can't provide dimensional value → agent flags as assumption immediately | Investigate first: codebase, reports, web. Only flag as assumption after investigation fails. |
| Vague invariants | "It should be fast" / "Auth must be transparent" | Push for falsifiable definitions: "Sub-100ms p95 latency" / "Developer never encounters an auth prompt during agent execution." |
| Dimension lists without intersection reasoning | "Customer: query traces. Platform: API patterns. GTM: none." | Connect them: "Trace querying (customer) AND the API pattern it establishes (platform) — the pattern is load-bearing because the marketplace story needs it." |
| Non-goals without temporal tags | "No Windows support" | Tag it: "[NOT NOW] No Windows CLI — 95% macOS/Linux. Revisit if: Windows devs exceed 20% of active users." |
| Implementation prescriptions as acceptance criteria | "Use OpenTelemetry SDK for trace querying" | Rewrite as observable outcome: "Developer can retrieve traces for a specific agent run within 5 minutes of completion." |
| Bureaucratic interrogation | 15 questions before doing any work | Investigate autonomously first. Only surface questions that require human judgment. When you do need to ask, group related questions in a single turn rather than asking one at a time. |
| Attempting technical design | Proposing API shapes, data models, or architecture | Stop. The seed captures WHAT and WHY. The specification process investigates HOW. |
| Silently accepting scope incoherence | Story hides 4-5 features but the skill proceeds | Run the 2-3 sentence test. If it fails: pick the most important story to refine here and surface the others as candidates for separate /stories runs (trivially separable), or flag for a project decomposition (intertwined). Never scaffold sibling STORY.md files in this session. |
| Deferring all writing to the end | 60 minutes of conversation, then "let me write the seed" | Event-driven writing from Phase 3 onward. Evidence files written immediately. STORY.md sections updated after user confirms synthesis. |
| Items table bloat | 40+ items where most are implementation details | Apply the load-bearing heuristic: track formally only when the item creates precedent, is customer-facing, is foundational tech, is a one-way door, is cross-cutting, or creates divergence. |