agent-spec-authoring
Agent Spec Authoring
Version: 3.1.0 | Last Updated: 2026-03-08
You are an expert at writing agent-spec Task Contracts. Help users by:
- Creating specs: Scaffold new `.spec` files with correct structure
- Editing specs: Improve intent, constraints, boundaries, scenarios
- Writing scenarios: BDD-style with proper test selectors and step tables
- Debugging specs: Fix lint warnings, improve quality scores
- Self-hosting: Maintain specs for the agent-spec project itself
IMPORTANT: CLI Prerequisite Check
Before running any agent-spec command, Claude MUST check:
command -v agent-spec || cargo install agent-spec
If agent-spec is not installed, inform the user:
`agent-spec` CLI not found. Install with: `cargo install agent-spec`
Core Philosophy
A Contract is not a vague Issue — it's a precise specification that shifts the review point:
Traditional: Human reviews 500 lines of code diff (slow, error-prone)
agent-spec: Human writes 50-80 lines of Contract (fast, high-value)
Machine verifies code against Contract (deterministic)
Writing a Contract is the highest-value human activity in the agent-spec workflow. You're defining "what is correct" — the machine handles "is the code correct".
Quick Reference
| Section | Chinese Header | English Header | Purpose |
|---|---|---|---|
| Intent | `## 意图` | `## Intent` | What to do and why |
| Constraints | `## 约束` | `## Constraints` | Must / Must NOT rules |
| Decisions | `## 已定决策` / `## 决策` | `## Decisions` | Fixed technical choices |
| Boundaries | `## 边界` | `## Boundaries` | Allowed / Forbidden / Out-of-scope |
| Acceptance Criteria | `## 验收标准` / `## 完成条件` | `## Acceptance Criteria` / `## Completion Criteria` | BDD scenarios |
| Out of Scope | `## 排除范围` | `## Out of Scope` | Explicitly excluded items |
Hard Syntax Rules
- Use exactly one supported section header per line. Good: `## Intent` or `## 意图`. Bad: `## Intent / 意图`.
- Write scenarios as bare DSL lines under the acceptance section. Good: `Scenario:` / `场景:`. The parser accepts Markdown-heading forms like `### Scenario:` for compatibility, but authoring should avoid emitting them by default.
- Do not invent extra top-level sections such as `## Architecture`, `## Milestones`, or `## Quality` inside a task spec. Put that information into `Intent`, `Decisions`, `Boundaries`, or an external document.
- After drafting or editing a spec, always run `agent-spec parse <spec>` and then `agent-spec lint <spec> --min-score 0.7`.
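The header rules above can be pre-checked before running the CLI. The following is a minimal Python sketch, not the agent-spec parser: the function name `check_headers` and the message wording are hypothetical, and the supported-header set is copied from the Quick Reference table.

```python
# Hypothetical pre-check; agent-spec's own parser and linter remain authoritative.
SUPPORTED_HEADERS = {
    "意图", "Intent",
    "约束", "Constraints",
    "已定决策", "决策", "Decisions",
    "边界", "Boundaries",
    "验收标准", "完成条件", "Acceptance Criteria", "Completion Criteria",
    "排除范围", "Out of Scope",
}


def check_headers(spec_text: str) -> list[str]:
    """Return one message per top-level header that breaks the rules."""
    problems = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        if not line.startswith("## "):
            continue  # only top-level section headers are checked
        title = line[3:].strip()
        if title in SUPPORTED_HEADERS:
            continue
        if "/" in title:
            problems.append(f"line {number}: combined header '{title}'")
        else:
            problems.append(f"line {number}: unsupported section '{title}'")
    return problems


sample = "## Intent / 意图\n## Architecture\n## Decisions\n### Allowed Changes\n"
print(check_headers(sample))
```

Sub-headings such as `### Allowed Changes` are skipped on purpose, since the rule only concerns top-level sections.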
Documentation
Refer to the local files for authoring patterns and examples:
- `./references/patterns.md` - Complete authoring patterns with examples
IMPORTANT: Documentation Completeness Check
Before answering questions, Claude MUST:
- Read `./references/patterns.md` for authoring patterns
- If file read fails: Inform user "references/patterns.md is missing, answering from SKILL.md patterns"
- Still answer based on SKILL.md patterns + built-in knowledge
Required Self-Check
After writing or editing a spec:
agent-spec parse specs/task.spec
agent-spec lint specs/task.spec --min-score 0.7
Do not hand a spec to an agent if:
- `agent-spec parse` shows `Acceptance Criteria: 0 scenarios`
- lint reports missing explicit test selectors
- lint score is below threshold
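The gate above could be scripted roughly as follows. This is a sketch under assumptions: it assumes `agent-spec parse` writes its summary to stdout and that `agent-spec lint` exits non-zero when the score falls below `--min-score`; neither is confirmed here, and `has_scenarios` and `self_check` are hypothetical helper names.

```python
import subprocess


def has_scenarios(parse_output: str) -> bool:
    """False when the parse output reports an empty acceptance section."""
    return "Acceptance Criteria: 0 scenarios" not in parse_output


def self_check(spec_path: str, min_score: float = 0.7) -> bool:
    """Run parse, then lint; True only if both steps look clean."""
    parsed = subprocess.run(
        ["agent-spec", "parse", spec_path], capture_output=True, text=True
    )
    # Assumption: the scenario summary appears on stdout.
    if parsed.returncode != 0 or not has_scenarios(parsed.stdout):
        return False
    linted = subprocess.run(
        ["agent-spec", "lint", spec_path, "--min-score", str(min_score)]
    )
    # Assumption: lint exits non-zero when the score is below the threshold.
    return linted.returncode == 0
```

Calling `self_check("specs/task.spec")` would then mirror the two commands listed above.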
Behavior Surface Checklist
When authoring a contract for CLI tools, MCP servers, protocols, or parity rewrites, do not stop at the main happy path. Check these observable surfaces explicitly:
Observable Behavior
- stdout vs stderr behavior
- `--json` or machine-readable output
- `-o` / `--output` and file side effects
- local vs remote behavior
- warm cache vs cold start
- fallback / precedence order
- partial failure vs hard failure
- on-disk state changes and persisted files
Flag Combinations (lint: flag-combination-coverage)
- Multi-value parameters (multi-ID, batch) combined with output flags
- Single vs multiple entry behavior for `-o`, `--full`, `--json`
- If your command has 2+ output-affecting flags, add at least one scenario that tests a combination
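To see which combinations are candidates for a scenario, the output-affecting flags can be paired up. A small Python sketch, where `flag_pairs` is a hypothetical helper and the flag list is taken from the bullet above:

```python
from itertools import combinations


def flag_pairs(output_flags: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair of output-affecting flags."""
    return list(combinations(output_flags, 2))


pairs = flag_pairs(["-o", "--full", "--json"])
for first, second in pairs:
    print(f"candidate scenario: {first} combined with {second}")
```

The rule only asks for at least one combination scenario, so the list is a menu to choose from rather than a quota.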
Platform-Specific Decisions (lint: platform-decision-tag)
- When copying decisions from a reference implementation, tag platform-specific terms
- Use markers like `[JS-only]`, `[platform-specific]`, or `不适用` ("not applicable") to flag phantom requirements
- The linter flags untagged references to npm, pip, cargo install, dist/, bundled dist, etc.
Architectural Invariants
- If the reference implementation uses a specific processing pattern (e.g., "collect all results then output once"), state this as a decision — per-item vs batch output are architecturally different
- These invariants are invisible to per-feature tests but break on combinations
If the task is a rewrite, migration, or parity effort, treat this as mandatory. Do not hand the contract to an agent until these observable behaviors are either:
- covered by scenarios, or
- explicitly declared out of scope
For these tasks, prefer starting from the parity-aware scaffold instead of the generic task template:
agent-spec init --level task --template rewrite-parity --lang en --name "CLI Parity Contract"
Before Writing a Contract
Not every task needs a Contract. Ask yourself:
| Question | If No |
|---|---|
| Can I define what "done" looks like? | Vibe code first, write Contract later |
| Can I write at least one deterministic test? | Not Contract-ready yet |
| Is the scope bounded enough to list Allowed Changes? | Split into smaller tasks |
| Do I know the key technical decisions? | Do a spike/prototype first |
If all "yes" — proceed with authoring. If not, doing exploratory work first is the right call.
The Four Elements of a Contract
1. Intent — What and Why
One focused paragraph. Not a feature list — a clear statement of purpose.
## Intent
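Add a user registration endpoint to the existing authentication module. New users register
with email and password, and a verification email is sent after successful registration.
This is the first step of the user system; login and password reset will be built on it later.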
Rules:
- Focus on "what to do and why"
- Mention context (what already exists, where this fits)
- Keep it to 2-4 sentences
- Do not combine bilingual section labels on the same header line
2. Decisions — Fixed Technical Choices
Already-decided choices. Not aspirational. Not options to explore.
## Decisions
- Route: POST /api/v1/auth/register
- Password hashing: bcrypt, cost factor = 12
- Verification token: crypto.randomUUID(), stored in the database, expires after 24h
- Email: use the existing EmailService; do not create a new one
Rules:
- Only choices that are already fixed — not "we should consider..."
- Include specific technologies, versions, parameters
- Agent follows these without questioning — they're not open for debate
- Every decision should be covered by at least one scenario; lint warns if a decision has no matching scenario (checked by the `decision-coverage` linter via backtick identifiers and keywords)
- Avoid universal claims without proportional coverage: if a decision says "all entry points" or "every binary", lint (`universal-claim`) requires 2+ scenarios to verify each instance
3. Boundaries — What to Touch, What Not to Touch
Triple constraint: Allowed, Forbidden, Out-of-scope.
## Boundaries
### Allowed Changes
- crates/api/src/auth/**
- crates/api/tests/auth/**
- migrations/
### Forbidden
- Do not add new npm/cargo dependencies
- Do not modify the existing login endpoint
- Do not create a session during the registration flow
## Out of Scope
- Login
- Password reset
- OAuth third-party login
Rules:
- Path globs (`crates/auth/**`) are mechanically enforced by BoundariesVerifier
- Natural language prohibitions are checked by lint but not file-path enforced
- Out of Scope prevents scope creep: the Agent knows what NOT to attempt
- If Boundaries list 2+ entry points (e.g. `bin/cli.rs`, `bin/server.rs`), lint (`boundary-entry-point`) warns if scenarios don't reference each one, because shared logic across entry points needs separate verification
4. Completion Criteria — Deterministic Pass/Fail
BDD scenarios with explicit test bindings.
Critical principle: Exception scenarios >= happy path scenarios. Lint enforces the minimum form of this: the `error-path` linter warns if all scenarios are happy paths with no error/failure path.
## Completion Criteria
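Scenario: Registration succeeds ← 1 happy path
Test: test_register_returns_201
Given no user with email "alice@example.com" exists
When the client submits a registration request:
| field | value |
| email | alice@example.com |
| password | Str0ng!Pass#2026 |
Then the response status code is 201
And the response body contains "user_id"
Scenario: Duplicate email is rejected ← exception path 1
Test: test_register_rejects_duplicate_email
Given a user with email "alice@example.com" already exists
When the client submits a registration request with the same email
Then the response status code is 409
Scenario: Weak password is rejected ← exception path 2
Test: test_register_rejects_weak_password
Given no user with email "bob@example.com" exists
When the client submits a registration request with password "123"
Then the response status code is 400
Scenario: Required field is missing ← exception path 3
Test: test_register_rejects_missing_fields
When the client submits a registration request without the email field
Then the response status code is 400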
This forces you to think through edge cases before coding begins. The Agent can't skip error handling because each exception path has a bound test.
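The "exception >= happy path" principle is easy to tally by hand, or with a few lines of Python. This is an illustrative sketch only: the `is_exception` label is hypothetical (the spec DSL has no such field), and the entries mirror the four registration scenarios above by their test names.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    test: str
    is_exception: bool  # hypothetical label; not part of the spec DSL


def exception_balance(scenarios: list[Scenario]) -> tuple[int, int, bool]:
    """Return (happy, exception, ok) where ok means exception >= happy."""
    exception = sum(1 for s in scenarios if s.is_exception)
    happy = len(scenarios) - exception
    return happy, exception, exception >= happy


registration = [
    Scenario("registration succeeds", "test_register_returns_201", False),
    Scenario("duplicate email", "test_register_rejects_duplicate_email", True),
    Scenario("weak password", "test_register_rejects_weak_password", True),
    Scenario("missing field", "test_register_rejects_missing_fields", True),
]
print(exception_balance(registration))  # (1, 3, True)
```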
Rewrite / Parity Contracts
For rewrite, migration, and parity tasks, write a behavior matrix before writing scenarios. At minimum, ask whether the contract covers:
- command x output mode
- local x remote
- warm cache x cold start
- success x partial failure x hard failure
- CLI x MCP entry points, if both are user-visible
If these dimensions matter to the task, they should appear in scenarios, not only in Decisions.
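One way to draft the behavior matrix is to enumerate the cross product of the dimensions and then mark each cell as covered by a scenario or declared out of scope. A Python sketch, assuming three of the dimensions listed above; the dictionary keys and the `behavior_matrix` helper are hypothetical names:

```python
from itertools import product

# Dimension values copied from the list above; the keys are illustrative.
DIMENSIONS = {
    "location": ["local", "remote"],
    "cache": ["warm cache", "cold start"],
    "outcome": ["success", "partial failure", "hard failure"],
}


def behavior_matrix(dimensions: dict[str, list[str]]) -> list[dict[str, str]]:
    """One dict per cell of the cross product of all dimensions."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


cells = behavior_matrix(DIMENSIONS)
print(len(cells))  # 12
print(cells[0])
```

The full product is an upper bound. Cells that cannot occur or do not matter can be listed under Out of Scope instead of getting a scenario.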
Spec File Structure
Frontmatter (YAML)
spec: task # Level: org, project, task
name: "Task Name" # Human-readable name
inherits: project # Parent spec (optional)
tags: [feature, api] # Tags for filtering
---
Three-Layer Inheritance
org.spec → project.spec → task.spec
| Layer | Scope | Example Content |
|---|---|---|
| `org.spec` | Organization-wide | Coding standards, security rules, forbidden patterns |
| `project.spec` | Project-level | Tech stack decisions, API conventions, test requirements |
| `task.spec` | Single task | Intent, boundaries, specific acceptance criteria |
Constraints and decisions are inherited downward. Task specs inherit from project, which inherits from org.
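Downward inheritance can be pictured as concatenating each layer's rules from org to task. The sketch below is a conceptual model only, not agent-spec's resolver: `SpecLayer`, `effective_rules`, and the sample rules are hypothetical, and how conflicting rules are resolved is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class SpecLayer:
    name: str
    constraints: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)


def effective_rules(chain: list[SpecLayer]) -> SpecLayer:
    """Merge layers ordered from org down to task; parent rules come first."""
    merged = SpecLayer(name=chain[-1].name)
    for layer in chain:
        merged.constraints.extend(layer.constraints)
        merged.decisions.extend(layer.decisions)
    return merged


# Sample content is invented for illustration.
org = SpecLayer("org", constraints=["No secrets in source"])
project = SpecLayer("project", decisions=["REST API under /api/v1"])
task = SpecLayer("task", constraints=["Do not create a session"])

result = effective_rules([org, project, task])
print(result.constraints)
print(result.decisions)
```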
BDD Step Keywords
| English | Chinese | Usage |
|---|---|---|
| `Given` | `假设` | Precondition |
| `When` | `当` | Action |
| `Then` | `那么` | Expected result |
| `And` | `并且` | Additional step (same type as previous) |
| `But` | `但是` | Negative additional step |
Test Selector Patterns
Simple selector
Scenario: Happy path
Test: test_happy_path
Given precondition
When action
Then result
Structured selector (cross-crate)
Scenario: Cross-crate verification
Test:
Package: spec-gateway
Filter: test_contract_prompt_format
Given a task spec
When verified
Then passes
Chinese equivalents
场景: 正常路径
测试: test_happy_path
场景: 跨包验证
测试:
包: spec-gateway
过滤: test_contract_prompt_format
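Since every scenario needs an explicit selector, a quick scan for scenarios that lack one can catch omissions before lint does. A minimal Python sketch, assuming the selector sits on the line directly after the scenario line, as in the examples above; `scenarios_missing_test` is a hypothetical helper, not part of agent-spec.

```python
SCENARIO_PREFIXES = ("Scenario:", "场景:")
TEST_PREFIXES = ("Test:", "测试:")


def scenarios_missing_test(spec_text: str) -> list[str]:
    """Names of scenarios whose next non-blank line is not a test selector."""
    lines = [line.strip() for line in spec_text.splitlines() if line.strip()]
    missing = []
    for index, line in enumerate(lines):
        if not line.startswith(SCENARIO_PREFIXES):
            continue
        following = lines[index + 1] if index + 1 < len(lines) else ""
        if not following.startswith(TEST_PREFIXES):
            missing.append(line.split(":", 1)[1].strip())
    return missing


spec = """
Scenario: Happy path
Test: test_happy_path
Given precondition

场景: 跨包验证
测试:
包: spec-gateway
过滤: test_contract_prompt_format

Scenario: No selector
Given precondition
"""
print(scenarios_missing_test(spec))  # ['No selector']
```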
Step Tables
For structured inputs, use tables instead of inventing custom prose:
Scenario: Batch validation
Test: test_batch_validation
Given the following input records:
| name | email | valid |
| Alice | alice@test.com | true |
| Bob | invalid | false |
When the validator processes the batch
Then "1" record passes and "1" record fails
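On the test side, a step table maps naturally onto a list of records. The following Python sketch shows one way a test harness might read the rows; `parse_step_table` is a hypothetical helper and says nothing about how agent-spec itself parses tables.

```python
def parse_step_table(lines: list[str]) -> list[dict[str, str]]:
    """Turn pipe-delimited rows into dicts keyed by the header row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


table = [
    "| name  | email          | valid |",
    "| Alice | alice@test.com | true  |",
    "| Bob   | invalid        | false |",
]
records = parse_step_table(table)
passing = [r for r in records if r["valid"] == "true"]
print(len(passing), len(records) - len(passing))  # 1 1
```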
Boundary Patterns
Machine-enforced (path globs)
### Allowed Changes
- crates/spec-parser/**
- tests/parser_contract.rs
BoundariesVerifier checks actual changed files against these globs.
Natural language prohibitions
### Forbidden
- Do not break the existing JSON shape
- Do not introduce .unwrap()
Checked by lint, not mechanically enforced against file paths.
Use both when needed. Path globs for file-level control, natural language for behavioral prohibitions.
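To illustrate what path-glob enforcement means, the sketch below compares changed files against the Allowed Changes globs using Python's `fnmatch`. It is not BoundariesVerifier's implementation: `outside_boundaries` and the changed-file list are hypothetical, and `fnmatch` lets `*` cross directory separators, which may differ from the glob semantics agent-spec uses.

```python
from fnmatch import fnmatchcase

ALLOWED_CHANGES = ["crates/spec-parser/**", "tests/parser_contract.rs"]


def outside_boundaries(changed_files: list[str], allowed: list[str]) -> list[str]:
    """Changed paths that match none of the allowed globs."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed)
    ]


# Hypothetical change set for illustration.
changed = [
    "crates/spec-parser/src/lexer.rs",
    "tests/parser_contract.rs",
    "crates/spec-gateway/src/lib.rs",
]
violations = outside_boundaries(changed, ALLOWED_CHANGES)
print(violations)  # ['crates/spec-gateway/src/lib.rs']
```

A behavioral prohibition such as "Do not introduce .unwrap()" cannot be expressed this way, which is why it stays in natural language.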
Common Errors
| Lint Warning | Cause | Fix |
|---|---|---|
| `vague-verb` | "handle", "manage", "process", "处理" | Be specific: "validate email format" not "handle email" |
| `unquantified` | "fast", "efficient", "应该快速" | Add metrics: "respond within 200ms" not "respond quickly" |
| `testability` | Steps that can't be mechanically verified | Use observable assertions: "returns error code X" |
| `coverage` | Constraint with no covering scenario | Add a scenario that exercises the constraint |
| `determinism` | Non-deterministic step wording | Remove "should", "might"; use definitive assertions |
| `implicit-dep` | Missing `Test:` selector on scenario | Add `Test: test_name` or structured `Test:` block |
| `sycophancy` | Bug-finding bias language | Remove "find all bugs", "must find issues" |
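Two of these warnings come down to word choice, so a rough self-review pass is possible before running lint. The Python sketch below only approximates the idea: the word lists are the English trigger words quoted in the table, `wording_warnings` is a hypothetical helper, and the real linter's matching rules may differ.

```python
import re

VAGUE_VERBS = {"handle", "manage", "process"}
NON_DETERMINISTIC = {"should", "might"}


def wording_warnings(step: str) -> list[str]:
    """Lint-style warning names triggered by exact word matches in a step."""
    words = set(re.findall(r"[a-z]+", step.lower()))
    warnings = []
    if words & VAGUE_VERBS:
        warnings.append("vague-verb")
    if words & NON_DETERMINISTIC:
        warnings.append("determinism")
    return warnings


print(wording_warnings("The service should handle email"))
print(wording_warnings("The service returns error code 400"))
```

Only exact words are matched, so inflected forms such as "handles" pass through; a stricter check would need stemming.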
Authoring Checklist
Before handing a Contract to an Agent, verify:
| # | Check | Why |
|---|---|---|
| 1 | Intent is 2-4 focused sentences | Agent needs clear direction, not a novel |
| 2 | Decisions are specific (tech, version, params) | Agent shouldn't be choosing technology |
| 3 | Boundaries have path globs for Allowed Changes | Enables mechanical enforcement |
| 4 | Exception scenarios >= happy path scenarios | Forces edge-case thinking upfront |
| 5 | Every scenario has a `Test:` selector | Required for TestVerifier to run |
| 6 | Steps use deterministic wording | "returns 201" not "should return 201" |
| 7 | `agent-spec lint` score >= 0.7 | Quality gate before Agent starts |
Deprecated Patterns (Don't Use)
| Deprecated | Use Instead | Reason |
|---|---|---|
| Scenarios without `Test:` | Always add `Test:` selector | Required for mechanical verification |
| Vague boundaries like "be careful" | Specific path globs or prohibitions | Must be mechanically checkable |
| "should" / "might" in steps | Definitive "returns" / "is" / "becomes" | Non-deterministic wording fails lint |
| `brief` command to preview | `contract` command | `brief` is a legacy alias |
| Only happy path scenarios | Include exception paths (>= happy) | Edge cases are where bugs live |
Self-Hosting Rules
When authoring specs for the agent-spec project itself:
- Put task specs under `specs/`
- Roadmap specs go in `specs/roadmap/`, promote to `specs/` when active
- Update tests when DSL or verification behavior changes
- Preserve the four verdicts: `pass`, `fail`, `skip`, `uncertain`
- Do not let a task spec rely on implicit test-name matching
Escalation
Authoring → Implementation: Switch to agent-spec-tool-first after the Contract is drafted and passes agent-spec lint with score >= 0.7.
Implementation → Authoring: Switch back here if the Agent discovers during implementation that:
- A missing exception path needs to be added to Completion Criteria
- A Boundary is too restrictive and needs expanding
- A Decision was wrong and needs changing
Update the Contract first, re-lint, then resume implementation. The Contract is a living document until the task is stamped.