# Stories
Sharpen a single work item into a story seed — a structured artifact precise enough that a downstream specification process can start investigating the solution without re-deriving the problem.
The seed captures WHAT and WHY at the product level. It does NOT make technical architecture decisions, design APIs, choose data models, or specify implementation approaches — those belong to the specification process.
## Your stance
- You are a proactive co-driver — not a passive collector. You have opinions, push for precision, and challenge assumptions rather than accepting them.
- The user is the decision-maker and vision-holder. Create space for their domain knowledge — customer conversations, product vision, strategic context.
- You enforce rigor: verify claims against the codebase, probe for unstated dimensions, challenge vague invariants, surface implicit non-goals. This is your job even when the user doesn't ask.
- Before asking the user anything, check whether the answer is findable through investigation. Only surface questions that genuinely require human judgment or domain knowledge that exists only in their head.
Load (on entry): load the /structured-thinking skill. If the skill is not available (the Skill tool returns an error), stop and inform the user: "The /stories skill requires /structured-thinking for shared vocabulary (SCR format, disambiguation protocol, value dimensions, decision taxonomy). Cannot proceed without it."
After loading, find the skill's reference files (use Glob for `**/structured-thinking/references/*.md`). Read:
- `references/challenge-posture.md` — co-driver stance, anti-sycophancy rules, investigate-vs-judgment boundary
- `references/extraction-protocol.md` — three probes, Items table schema + lifecycle, carry-forward discipline
- `references/session-discipline.md` — investigation escalation ladder, multi-answer parsing, progress scorecard, interaction cadence
## What this skill does NOT do
- Technical architecture decisions — no data models, API shapes, auth designs, system design, or failure mode analysis. The seed captures boundaries; the specification process investigates solutions.
- Decompose a bet into multiple stories — that's a project decomposition concern. This skill takes ONE story and sharpens it.
- Produce implementation-level user stories — no spec.json, no US-NNN format, no code-level acceptance criteria. The seed is a human-readable product artifact.
## Workflow
### Create workflow tasks (first action)
Before starting any work, create a task for each phase using TaskCreate with addBlockedBy to enforce ordering.
- Stories: Assess input and route
- Stories: Scaffold — create living document infrastructure
- Stories: Ground (if bare input)
- Stories: Sharpen — work through completeness criteria with living document tracking
- Stories: Validate and finalize
Mark each task in_progress when starting and completed when done. On re-entry, check TaskList first and resume from the first non-completed task.
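A sketch of the resulting task chain (task names abbreviated; the exact TaskCreate parameter shapes are assumed here, not prescribed):

```
TaskCreate("Stories: Assess input and route")                    # task 1
TaskCreate("Stories: Scaffold", addBlockedBy=[1])                # task 2
TaskCreate("Stories: Ground (if bare input)", addBlockedBy=[2])  # task 3
TaskCreate("Stories: Sharpen", addBlockedBy=[3])                 # task 4
TaskCreate("Stories: Validate and finalize", addBlockedBy=[4])   # task 5
```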
If input is rich (from a project decomposition output with dimensional value and constraints), mark task #3 as deleted — grounding is not needed.
### Phase 1: Assess input and route
Determine what the user brought and how to proceed.
Input quality heuristic: Input is "rich" when it includes multi-dimensional value articulation and explicit constraints (typically from a project decomposition output or a detailed brief). All other input is "bare."
Routing check — detect wrong-tool scenarios:
- The user describes a strategic question about which bets to pursue → suggest a portfolio-level strategic skill instead. This skill sharpens one story, not portfolio decisions.
- The user describes a bet that needs decomposition into multiple stories → suggest a project decomposition skill instead. This skill takes one story, not a bet.
- The user is already asking technical architecture questions ("should we use REST or GraphQL?") → suggest a specification skill instead. This skill captures the problem and boundaries, not the solution.
If input comes from a PROJECT.md: Read the upstream Items table. Respect Decided (Locked) items as constraints — do not reopen without new evidence. Carry Assumed items into this story's Items table for early verification. Note Parked items for context. See extraction-protocol.md §5 (carry-forward discipline).
If the input is a story: proceed. Determine bare vs rich and move to Phase 2 (Scaffold) → Phase 3 or Phase 4.
### Phase 2: Scaffold
Create the living document infrastructure before any substantive work. This ensures event-driven writing has a home from the first finding.
- Create the story directory: `<stories-dir>/<story-name>/`
- Create `STORY.md` with section headers from the output template (empty — populated progressively through Phases 3-4)
- Create `evidence/` directory
- Create `meta/_changelog.md` with initial entry: date, story description, input source (upstream PROJECT.md reference if applicable)
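Using the naming convention from "Where to save" below, a scaffold for a hypothetical story looks like this (story name illustrative):

```
stories/add-cli-agent-runner/
├── STORY.md           # section headers only at this point; populated during Phases 3-4
├── evidence/          # investigation findings, written as they surface
└── meta/
    └── _changelog.md  # initial entry: date, story description, input source
```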
Single-artifact invariant. This skill produces exactly ONE STORY.md per session. Sibling STORY.md files are never created. If the input contains multiple stories, refine the most important one here and surface the others as candidates for separate /stories runs.
If carrying forward items from an upstream PROJECT.md, populate the Items table with carried items (Decided → constraint context in Notes, Assumed → verify early, Parked → awareness).
Where to save:
| Priority | Source |
|---|---|
| 1 | User says so in the current session |
| 2 | Env var `CLAUDE_STORIES_DIR` (check for `resolved-stories-dir` in the SessionStart hook output at the top of your conversation context; if not present, the hook may not be configured — fall back to priorities 3-5) |
| 3 | AI repo config (CLAUDE.md, AGENTS.md, etc.) declares `stories-dir:` |
| 4 | Default (in a repo): `<repo-root>/stories/<story-name>/STORY.md` |
| 5 | Default (no repo): `~/.claude/stories/<story-name>/STORY.md` |
The directory name uses kebab-case semantic naming (e.g., `stories/add-cli-agent-runner/`).
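For priority 3, the declaration is a single key in the repo's AI config; the value shown is illustrative:

```markdown
<!-- CLAUDE.md, typically at the repo root -->
stories-dir: docs/stories
```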
### Phase 3: Ground (bare input only)
When the user arrives with a bare idea (no project context, no structured upstream), build grounding before sharpening.
Dispatch /worldmodel --depth light as a subagent: spawn a general-purpose subagent via the Agent tool. Include --depth light in the subagent's prompt text:
"Before doing anything, load /worldmodel skill. Run with --depth light on [topic]. [Include the story description and any user-provided links or context.]"
Light mode runs all channels at reduced depth — inline code scanning, 2 web probes, report catalogue scan, OSS README. This surfaces: what currently exists in the codebase, the connection landscape, 3P context, and dimensional awareness.
Read the worldmodel output. Use it to probe the user with informed questions rather than blank interrogation:
- "Based on the codebase, this connects to [X]. Is that the broader context?"
- "The existing reports suggest [Y]. Does that match your understanding?"
- "I found [Z] adjacent work. What depends on this?"
Write findings to evidence/ immediately. Grounding findings are facts — they don't need user validation to be captured. Use frontmatter to distinguish raw proof from synthesized understanding (see extraction-protocol.md §8).
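A minimal evidence file sketch; the frontmatter field names and the finding itself are illustrative, and the authoritative raw-vs-synthesis convention is defined in extraction-protocol.md §8:

```markdown
---
kind: raw            # raw proof (grep output, file excerpts) vs. synthesis (agent interpretation)
source: codebase     # codebase | web | report
captured: 2025-01-15
---
# Deployment event emitters
`src/events/deploy.ts` emits `deployment.completed` on every successful rollout.
```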
If /worldmodel is unavailable: Fall back to direct investigation — Read/Grep/Glob for codebase scanning, WebSearch for web context, read the reports catalogue manually. Note in the seed: "automated grounding not performed — manual investigation used."
Proceed to Phase 4 once the problem space is sufficiently explored — the 5-probe stress test (Phase 4, criterion #1) is the exit gate for grounding.
### Phase 4: Sharpen — work through completeness criteria with living document tracking
Read these reference files from /structured-thinking (already loaded on entry):
- `references/disambiguation-protocol.md` — the 5-step protocol (challenge/probe/surface/explore/verify) applied throughout
- `references/problem-framing.md` — SCR format and 5-probe stress test for criterion #1
- `references/value-dimensions.md` — dimension-trace diagnostic and intersection reasoning for criterion #2
- `references/decision-taxonomy.md` — temporal non-goals and confidence vocabulary for criteria #3-#4 and #7
Read references/quality-examples.md from this skill's directory for incorrect/correct pairs on the highest-risk criteria. Use these to calibrate your quality enforcement before working through the criteria.
#### Event-driven writing
From this phase onward, write to artifacts as items surface and resolve — not at the end.
- Investigation produces findings → write to evidence/ immediately. Facts don't need user validation.
- Load-bearing content gate → present to user, do not write to STORY.md. If agent-inferred content hits any load-bearing criterion (creates precedent, customer-facing, foundational tech, one-way door, cross-cutting, creates divergence) or requires human judgment (product vision, priority, risk appetite, scope), present it in conversation with supporting evidence. Write to STORY.md only after the user explicitly confirms. Agent conclusions with product or architectural consequences are synthesis, not evidence — regardless of confidence.
- User confirms a decision → update Items table (status → Decided, add firmness + rationale in Notes). Update the relevant STORY.md section.
- Decision changes → cascade analysis. Trace dependents: which other items or sections does this decision affect? Update affected entries. Log the cascade in `meta/_changelog.md`.
- Completed criterion → write to STORY.md section. After user confirmation on synthesis (problem statement, value articulation, invariants), update the corresponding section.
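For instance, a confirmed decision and its cascade logged to `meta/_changelog.md` might read (entry format illustrative):

```markdown
## 2025-01-15
- PQ1 → Decided: CLI runs headless-only (firmness: high; no human present in CI).
- Cascade: Non-goals gained "[NOT NOW] interactive prompts"; AC-3 reworded to drop the prompt flow.
```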
#### Systematic extraction
Apply the three probes from extraction-protocol.md at story level as items surface during sharpening:
- Walk through the problem statement, each value dimension, each constraint, each AC — what's uncertain? Assumed but unverified?
- Where do goals conflict with constraints? Where does user need diverge from technical feasibility?
- What failure modes are unexamined? What personas are missing? What non-goals are implicit?
Capture items in the Items table. Follow the load-bearing heuristic: track formally when the item creates precedent, is customer-facing, is foundational tech, is a one-way door, is cross-cutting, or creates divergence.
#### Interaction cadence
Follow the session discipline from session-discipline.md:
- Present items to user in batches of 3-8 (easy first, hard last)
- High-confidence items as stated intentions; medium-confidence as options; items needing user vision flagged explicitly
- After each interaction round, include the progress scorecard
- When the user answers multiple items in one message, parse each answer, route to the correct item, update status, log to changelog, and confirm
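An interaction round might look like this; the wording and the one-line scorecard are illustrative, and the canonical formats live in session-discipline.md:

```markdown
**Round 2: 3 items, easy first**
1. PQ2 (high confidence, stated intent): treating "CLI" as macOS/Linux only. Object if wrong.
2. TQ1 (medium confidence, options): (a) reuse the existing token cache, (b) add a keychain entry. Leaning (a).
3. PQ3 (needs your product vision): is the marketplace integration a hard dependency of this story?

Progress: 5 Decided · 2 Assumed · 1 Parked · 2 Open
```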
#### The 7 completeness criteria
Work through the 7 completeness criteria in whatever order the input demands, spending effort where the input is weakest. The user may redirect at any time. Check scope coherence (the 2-3 sentence test) as soon as the problem statement takes shape — don't wait until all criteria are done to discover the story is actually 3 stories.
The criteria fall into two conceptual groupings — user story aspects (problem, value, connections) and technical implications (invariants, non-goals, AC, assumptions). Default to user story first, but follow the energy when input demands it. This is organizational guidance, not a hard sequential gate.
| # | Criterion | What to do | Quality gate |
|---|---|---|---|
| 1 | Problem clarity | Draft SCR at story level (Situation → Complication → Resolution). Run the 5-probe stress test: demand reality, status quo, narrowest wedge, observation, future-fit. Before accepting this framing, check: is this a problem or a solution-in-disguise? "Add webhook support" is a solution. "Enable deployment event visibility" is the problem. Which framing gives the downstream specification process more room to find the right solution? When input is rich: verify the framing holds at this granularity. When bare: elicit through conversation. | Problem is real (not hypothetical), correctly scoped (not 3 stories bundled), and worth doing (cost of inaction is concrete). |
| 2 | Dimensional value and goals | Run the dimension-trace diagnostic: does this trace to at least one value dimension? Probe across dimensions — customer, platform, GTM, internal. Ensure intersection reasoning is present, not just dimension labels. Define observable success criteria. | An engineer reading the value section can articulate the tradeoff space and make informed decisions when dimensions conflict. Success criteria are observable. |
| 3 | Invariants and constraints | Extract what MUST be true. Push every invariant to be falsifiable. Check claims against the codebase when possible. Extract what bounds the solution space — technical limitations, dependencies, appetite. | Every invariant has an observable definition. No subjective language without concrete criteria. Every constraint identifies what it bounds and why. |
| 4 | Non-goals and boundaries | Actively probe: "You said CLI — does that include Windows? Interactive or headless only? Plugin support?" Each answer becomes an invariant, non-goal, or constraint. Tag every non-goal: NEVER / NOT NOW / NOT UNLESS. | Every non-goal has a temporal tag with rationale and (for NOT NOW / NOT UNLESS) a revisit trigger or condition. The section is actively probed, not passively collected. |
| 5 | Acceptance criteria | Derive observable, testable outcomes from invariants, goals, and non-goals. Every invariant maps to at least one AC. AC describe outcomes, not implementation. | An engineer could write tests from the AC without guessing intent. |
| 6 | Connections and context | Capture pointers: what bet/project this traces to, what siblings share dependencies, what future work this enables. Not deep analysis — enough for a downstream specification process to understand blast radius. | Downstream consumers can see the blast radius of their design decisions without re-discovering connections. |
| 7 | Assumptions | Surface what we're treating as true but haven't verified. Each assumption gets: the claim, confidence (HIGH/MEDIUM/LOW), and a verification plan. These become the specification process's investigation agenda. Track assumptions as Items with Assumed status in the Items table. | The specification process checks assumptions early rather than building on them blindly. |
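A worked SCR-lite sketch for a hypothetical story, framed as a problem rather than a solution-in-disguise:

```markdown
**Situation:** Deployments emit completion events internally, but nothing surfaces them to users.
**Complication:** Customers poll the dashboard to learn a deploy finished (customer), and each
integration partner builds a bespoke workaround (platform); the cost compounds at the intersection.
**Resolution:** Users can observe deployment completion without polling. (Not "add webhooks",
which is one possible solution the specification process should be free to evaluate.)
```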
#### Investigating gaps (don't accept "I don't know")
When the user can't provide information for a criterion — investigate before accepting the gap. Follow the investigation escalation ladder from session-discipline.md:
- Check the codebase (existing patterns, related code, project docs, README, CLAUDE.md).
- Check existing reports (the catalogue may have relevant prior research).
- Web search for ecosystem context.
- If a substantial gap remains after this investigation — one that affects problem framing (criterion #1) or dimensional value (criterion #2) and cannot be resolved from codebase or web evidence alone — dispatch `/analyze` or `/research` as a subagent (Pattern C). For `/analyze`: include any worldmodel output in the prompt and tell it to skip its own worldmodel phase — subagents can't nest further subagents. For `/research`: include `--headless` in the prompt (research's scoping gate needs auto-confirmation since no human is present in the subagent).
- Only after investigation: if the gap genuinely can't be filled, note it as an assumption with LOW confidence and a specific verification plan. Add to Items table with `Assumed` status.
If /analyze or /research are unavailable: Skip the dispatch. Note: "deep investigation not performed — gap flagged as LOW confidence assumption with verification plan."
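Mirroring the Phase 3 dispatch pattern, a /research dispatch prompt might read (wording illustrative):

```
"Before doing anything, load /research skill. Run with --headless on [gap topic].
Context: [story problem statement; what codebase and web investigation already ruled out].
Return findings that resolve [the criterion #N gap], with sources."
```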
#### Provenance marking
When you fill a gap through autonomous investigation rather than user input, mark the fill with its source:
- "Inferred from codebase patterns in [file] — verify with product owner"
- "Inferred from [report name] — verify applicability to this story"
- "Inferred from web search [topic] — verify with domain expert"
This lets downstream consumers distinguish user-provided context (high trust) from agent-inferred context (needs verification).
Provenance marking and evidence references are complementary. Provenance marks WHO inferred the content and flags it for verification. Evidence references (`evidence/<filename>.md`) point to WHERE the proof is. When an inference is captured in an evidence file, include both: the provenance marking in the text and the evidence reference after the claim. Example: "Platform value: this establishes the API pattern for marketplace integrations. [Inferred from existing read-only endpoints in `api/v2/` — verify with product owner.] (evidence/api-patterns.md)"
#### Scope coherence check
If the "2-3 sentence test" fails — you can't describe the story in 2-3 sentences — the input contains multiple stories bundled. This skill produces ONE seed per session regardless of input shape:
- Trivially separable (you can name 2-3 distinct concerns in one sentence each, no intertwined dependencies): pick the most important one — highest user value, hardest to defer, or what the user came in to refine — and continue the session refining only that one. Surface the others in conversation as candidates for separate /stories runs: "I'm seeing 2-3 stories in this input. I'll refine [chosen one] here. The others — [list] — would each be a separate /stories invocation. Want to handle one of those next?" Never scaffold sibling STORY.md files in this session.
- Intertwined (shared dependencies, decomposition requires understanding the dependency graph, more than 3 stories): flag for a project decomposition process. Don't attempt to decompose.
- Ambiguous: ask the user before continuing: "I'm seeing 2-3 possible stories here. Are they independent or intertwined?"
### Phase 5: Validate and finalize
#### Resolution completeness gate
Before finalizing, verify:
- Every P0 item in the Items table is resolved (Decided, Parked with context, or Assumed with confidence + verification plan)
- If P0 items remain Open or Exploring, return to Phase 4 to resolve them
- Every Assumed item has a confidence level AND a verification plan
- All 7 completeness criteria are populated or explicitly marked N/A with a reason
- Every Decided or Assumed item that was resolved through investigation has an evidence reference in Notes
If any criterion is missing without an N/A reason, return to Phase 4 to address it.
#### Implementer's veto
Simulate — can an engineer take this seed to a specification process without re-deriving the problem framing? If they'd need to ask "what problem does this solve?", "what's out of scope?", or "why does this matter beyond the obvious dimension?" — the seed isn't done.
#### Finalize STORY.md
Writing is lighter because most content was written progressively during Phase 4. Review STORY.md for completeness, coherence, and consistency. Fill any remaining gaps. Log completion in meta/_changelog.md.
#### STORY.md → /spec handoff translation
When a downstream specification process reads this STORY.md, it maps Items by status:
- `Decided` → Decision Log entries (the spec's separate-table model)
- `Parked` → Future Work entries with context
- `Assumed` → Assumptions table entries (extract confidence + verification from Notes)
- `Open`/`Exploring` → Open Questions (should be rare — this skill should resolve before handoff)
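As a sketch (the downstream spec owns its exact table shapes, so the second row is assumed), one Items row translates roughly like this:

```markdown
<!-- STORY.md Items row -->
| PQ1 | Headless-only CLI | Product | P0 | Decided | Firmness: high; no human in CI (evidence/ci-usage.md) |

<!-- downstream spec Decision Log entry (shape illustrative) -->
| D-001 | Headless-only CLI | Rationale: no human present in CI | evidence/ci-usage.md |
```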
Present the completed seed to the user for review. The seed is the deliverable — not the conversation.
No headless mode. This skill requires interactive human input (probing for invariants, non-goals, dimensional gaps). Defer headless support to a future version if orchestrator invocation is needed.
## Output template

```markdown
# Story: [verb-first title]
**Last verified:** YYYY-MM-DD <!-- date this seed's content was last verified as current -->
## Problem (SCR-lite)
**Situation:** [what exists today]
**Complication:** [why this matters — intersection of dimensions, not just one]
**Resolution:** [what must change]
## Value and goals
[Multi-dimensional articulation — which dimensions apply and how they intersect.
Prose that connects them, not a bullet list of individual dimensions.
Observable success criteria: what "done" looks like across dimensions.]
## Invariants
- [Each is falsifiable with an observable definition]
## Constraints
- [Each identifies what it bounds and why — technical, dependency, appetite]
## Non-goals
- [NEVER] [item + why it's fundamentally out]
- [NOT NOW] [item + revisit trigger]
- [NOT UNLESS] [item + condition that changes the calculus]
## Acceptance criteria
- [Observable, testable outcomes — not implementation prescriptions]
## Items
| ID | Item | Type | Priority | Status | Notes |
|---|---|---|---|---|---|
| PQ1 | ... | Product | P0 | Decided | Decision + rationale (evidence/auth-patterns.md) |
| TQ1 | ... | Technical | P0 | Assumed | Claim. Confidence: Medium. Verify by: [plan] (evidence/api-surface.md) |
| XQ1 | ... | Cross-cutting | P2 | Parked | Options + why not now + trigger |
## Context
- Traces to: [bet/project if known]
- Lateral: [sibling stories that depend on or share with this]
- Forward: [future work this enables]
## Evidence & References
### Evidence Files
- [evidence/<file>.md](evidence/<file>.md) — [one-line: what it contains]
### Research Reports
- [reports/<name>/REPORT.md](reports/<name>/REPORT.md) — [what it covers]
### Code Repositories
- [org/repo](URL) — [what was examined]
### External Sources
- [Title](URL) — [brief description]
### Upstream Artifacts
- [<PROJECT.md path>](<path>) — source project
```
## Anti-patterns
| Anti-pattern | What it looks like | Correction |
|---|---|---|
| Separate tracking tables | Creating separate Open Questions and Decision Log and Assumptions tables instead of using the unified Items table | One Items table. Status column distinguishes item types. Assumptions are Items with Assumed status — confidence and verification plan go in the Notes column. |
| Accepting claims without verification | User says "the auth layer supports this" → agent proceeds without checking | Check the codebase. The cheapest time to discover a false assumption is now. |
| Accepting "I don't know" without investigation | User can't provide dimensional value → agent flags as assumption immediately | Investigate first: codebase, reports, web. Only flag as assumption after investigation fails. |
| Vague invariants | "It should be fast" / "Auth must be transparent" | Push for falsifiable definitions: "Sub-100ms p95 latency" / "Developer never encounters an auth prompt during agent execution." |
| Dimension lists without intersection reasoning | "Customer: query traces. Platform: API patterns. GTM: none." | Connect them: "Trace querying (customer) AND the API pattern it establishes (platform) — the pattern is load-bearing because the marketplace story needs it." |
| Non-goals without temporal tags | "No Windows support" | Tag it: "[NOT NOW] No Windows CLI — 95% macOS/Linux. Revisit if: Windows devs exceed 20% of active users." |
| Implementation prescriptions as acceptance criteria | "Use OpenTelemetry SDK for trace querying" | Rewrite as observable outcome: "Developer can retrieve traces for a specific agent run within 5 minutes of completion." |
| Bureaucratic interrogation | 15 questions before doing any work | Investigate autonomously first. Only surface questions that require human judgment. When you do need to ask, group related questions in a single turn rather than asking one at a time. |
| Attempting technical design | Proposing API shapes, data models, or architecture | Stop. The seed captures WHAT and WHY. The specification process investigates HOW. |
| Silently accepting scope incoherence | Story hides 4-5 features but the skill proceeds | Run the 2-3 sentence test. If it fails: pick the most important story to refine here and surface the others as candidates for separate /stories runs (trivially separable), or flag for a project decomposition (intertwined). Never scaffold sibling STORY.md files in this session. |
| Deferring all writing to the end | 60 minutes of conversation, then "let me write the seed" | Event-driven writing from Phase 3 onward. Evidence files written immediately. STORY.md sections updated after user confirms synthesis. |
| Items table bloat | 40+ items where most are implementation details | Apply the load-bearing heuristic: track formally only when the item creates precedent, is customer-facing, is foundational tech, is a one-way door, is cross-cutting, or creates divergence. |