skill-writing
Skill Writing
Skills are bundled instruction modules that load into an agent's context when activated. They are the highest-leverage mechanism for controlling agent behavior because they deliver focused instructions at the moment of relevance.
The Iron Law
No skill without a failing test first. Before writing a skill, run a subagent on the target task WITHOUT the skill. Document what it gets wrong — the exact rationalizations, shortcuts, and skipped steps. Then write the skill addressing those specific failures. This is TDD applied to documentation.
| TDD Step | Skill Equivalent |
|---|---|
| RED | Run scenario without skill. Document what the agent does wrong. |
| GREEN | Write minimal skill addressing those specific failures. Verify agent now complies. |
| REFACTOR | Find new rationalizations the agent invents. Close loopholes. Re-test. |
When to Create a Skill
Create when: the technique wasn't obvious, you'd reference it across projects, others would benefit, or the agent repeatedly fails without it.
Don't create for: one-off solutions, standard practices documented elsewhere, project-specific conventions (use CLAUDE.md), or mechanical constraints (enforce with linters/hooks instead).
Core Principles
Instructions are a finite budget. Models reliably follow ~150 instructions. The system prompt burns ~50. Every instruction you add degrades compliance across ALL instructions. Be ruthless about what earns a slot.
Peripheral positions get more attention. First and last sections receive disproportionate attention. Bury a critical rule in the middle and it gets skipped.
Specificity drives compliance. "Consider running audits" gets ignored. "Call accessibility_audit before emitting RUN_COMPLETED" gets followed.
Unconditional beats conditional. "You MUST call X" works. "When appropriate, consider X" does not.
Description Optimization
The description field controls when the skill activates. The agent reads it to decide: "Should I load this right now?" Describe triggering conditions, not workflow.
Bad: description: A skill for TDD - write test first, watch it fail, write code
Good: description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently. Covers flaky tests, hanging tests, zombie processes.
Checklist for descriptions:
- Start with "Use when..." focusing on triggering conditions
- Include specific symptoms and error messages the user would mention
- Include synonyms (timeout/hang/freeze, cleanup/teardown/afterEach)
- Name tools, commands, file types that signal relevance
- Max 500 characters. Under 200 for frequently-loaded skills
- Never summarize the skill's process — this causes agents to follow the description instead of loading the full skill
Skill Structure
.agents/skills/<skill-name>/
SKILL.md # Required. Under 200 lines.
references/ # Optional. Heavy reference docs (100+ lines).
Frontmatter
---
name: kebab-case-name
description: Use when [triggering conditions]. Covers [symptoms, tools, keywords].
---
Use verb-first active naming: creating-skills not skill-creation, condition-based-waiting not async-helpers.
Writing Instructions That Get Followed
Lead with identity, then constraints
One-line role statement, then hard constraints immediately. Do not build up to them.
# Browser Test Executor
You are executing adversarial browser tests against code changes.
REQUIRED for every test run:
1. Call accessibility_audit before completing
2. Call performance_metrics before completing
3. Call close to flush the session
Use numbered checklists gated on completion
Numbered lists signal sequence and completeness. Tie them to whatever marks the task "done."
Before marking the task complete, you MUST:
1. Run the linter
2. Run the type checker
3. Run the test suite
Do not skip any step. A skipped step is a failed task.
The trailing "do not skip" is not redundant — it closes the escape hatch.
Name tools and commands exactly
Never "run the appropriate checks." Always call accessibility_audit or run pnpm typecheck.
Put mandatory behaviors at the end
The last section is the last thing read before acting. Put the non-negotiable checklist there.
One good example, one bad example
Examples are worth more than explanations.
**Bad:** `test the login page`
**Good:** `Submit login with empty fields, invalid email, wrong password, valid credentials. Verify errors and redirect.`
Bulletproofing Against Rationalization
Agents invent reasons to skip rules. Anticipate and close every loophole explicitly.
Close escape hatches: After every mandatory rule, add: "No exceptions for 'just this once', 'it's simple enough', or 'I'll do it next time'."
Build a rationalization table from your RED phase testing. Every excuse the agent made becomes an explicit counter in the skill:
| Agent Says | Skill Responds |
|---|---|
| "This change is too small to test" | Every change gets tested. Size is not an exemption. |
| "I already verified manually" | Manual verification is not a substitute. Run the tool. |
| "The user didn't ask for this" | The skill requires it. User silence is not opt-out. |
Create a red flags list: "If you catch yourself thinking any of these, stop and follow the checklist."
Anti-Patterns
| Pattern | Why It Fails |
|---|---|
| "Consider running X when appropriate" | Conditional + vague = ignored |
| Giant skill files (>200 lines) | Exceeds instruction budget, compliance drops |
| Duplicating CLAUDE.md rules | Burns budget twice for the same instruction |
| Listing every edge case | Show one example, not twenty — the agent infers the rest |
| "Be thorough" / "Be careful" | Vibes, not instructions. Say what "thorough" means concretely |
| Multi-paragraph justifications | The agent needs the rule, not the backstory. One sentence of "why" max |
| Critical rules in the middle | Middle-of-prompt blindspot. Move to top or bottom |
| Writing a skill without testing first | You don't know what failures to address. Delete it, start with RED. |
| Workflow-summarizing descriptions | Agent follows the summary instead of loading the full skill |
Checklist: Before Shipping
- Tested without the skill first (RED phase) and documented failures
- Skill addresses those specific failures, not hypothetical ones
- Description starts with "Use when..." and lists triggering conditions, not workflow
- Under 200 lines total
- Mandatory behaviors in a numbered checklist at top or bottom
- Every tool/command named exactly — no "appropriate" or "relevant"
- No hedging ("consider", "when appropriate", "if needed")
- At least one good/bad example pair
- No duplication with CLAUDE.md or other skills
- Escape hatches closed with explicit "no exceptions" language
- Re-tested with the skill (GREEN phase) and agent now complies
More from millionco/expect
expect
Use when editing .tsx/.jsx/.css/.html, React components, pages, routes, forms, styles, or layouts. Also when asked to test, verify, validate, QA, find bugs, check for issues, or fix expect-cli failures.
1.7Kdeslop
Simplify and refine recently modified code while preserving functionality. Use when asked to "deslop", "clean up code", "simplify code", or after making changes that could benefit from refinement.
61code-review
Perform code reviews. Use when reviewing pull requests, examining code changes, or providing feedback on code quality. Covers security, performance, testing, and design review.
35vercel-react-best-practices
React and Next.js performance optimization guidelines from Vercel Engineering. This skill should be used when writing, reviewing, or refactoring React/Next.js code to ensure optimal performance patterns. Triggers on tasks involving React components, Next.js pages, data fetching, bundle optimization, or performance improvements.
35effect-portable-patterns
Portable Effect patterns for robust promise execution. Use when wrapping async operations with timeouts, retries, tagged errors, caching, concurrency, pattern matching, or tracing - all designed to resolve to a plain Promise via Effect.runPromise.
34effect-best-practices
Enforces Effect-TS patterns for services, errors, layers, and atoms. Use when writing code with Effect.Service, Schema.TaggedError, Layer composition, or effect-atom React components.
31