Creating Claude Skills

Skills are reusable instruction sets that teach Claude how to perform specific tasks consistently and well. A skill is a folder with a SKILL.md file and optional supporting resources. When triggered, Claude reads the SKILL.md and follows its instructions.

Think of a skill as programming Claude's behavior using precise natural language — where the LLM is the interpreter and the skill is the source code.

When to Create a Skill

Create a skill when:

A task needs consistent, repeatable execution across many conversations
The task involves specific tools, sequences, or output formats that Claude wouldn't know by default
You've iterated on a workflow manually and want to capture what works
The instructions are too detailed to repeat every time but too important to leave to chance

Don't create a skill when:

A simple prompt gets the job done
The task is a one-off
The behavior is already built into Claude

Workflow

Capture intent → Research & interview → Draft SKILL.md →
Test with real prompts → Refine → Package & deliver

Step 1: Capture Intent

Start by understanding what the user wants. Answer these four questions:

What should the skill enable Claude to do? Get specific: "Create pixel-perfect Word documents with corporate letterhead" is better than "make docs."
When should it trigger? What phrases, file types, or contexts should activate this skill? Be specific — undertriggering is the most common failure mode.
What's the expected output? A file? A code snippet? A structured response? Define the deliverable clearly.
Are the outputs objectively verifiable? File transforms, data extraction, and code generation can be tested. Writing style and creative work usually can't. This determines whether to set up test cases.

If the current conversation already contains a workflow the user wants to capture (e.g., "turn this into a skill"), extract answers from the conversation history — the tools used, the sequence of steps, corrections made, input/output formats observed. Fill gaps with targeted questions.

Step 2: Skill Anatomy

Every skill follows this structure:

skill-name/
├── SKILL.md              ← Required. The instruction file.
│   ├── YAML frontmatter   (name + description — always in context)
│   └── Markdown body      (instructions — loaded when skill triggers)
│
└── Optional resources/
    ├── scripts/           Executable code for deterministic tasks
    ├── references/        Docs loaded into context as needed
    └── assets/            Templates, icons, fonts used in output

Do not include: README.md, INSTALLATION_GUIDE.md, CHANGELOG.md, or any documentation for humans. Skills are read by AI agents, not people.
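
The layout above can be scaffolded with a few lines of Python. This is a minimal sketch, not an official tool: `scaffold_skill` is a hypothetical helper, the `---` frontmatter delimiters are the usual convention rather than something this guide specifies, and the resource folders are assumed to sit directly under the skill folder (as in the `my-skill/` example later in this guide).

```python
from pathlib import Path
import tempfile


def scaffold_skill(parent: Path, name: str, description: str) -> Path:
    """Create a skill folder with a stub SKILL.md and the optional resource dirs."""
    skill_dir = parent / name
    for sub in ("scripts", "references", "assets"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)

    # Assumption: frontmatter is wrapped in '---' lines; the body is a placeholder.
    frontmatter = f"---\nname: {name}\ndescription: >\n  {description}\n---\n"
    body = f"\n# {name}\n\nDescribe the workflow here.\n"
    (skill_dir / "SKILL.md").write_text(frontmatter + body, encoding="utf-8")
    return skill_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_skill(Path(tmp), "deploy-aws", "Deploy services to AWS.")
        print(sorted(p.name for p in created.iterdir()))
```

Delete any resource folder the skill does not end up using; empty directories add nothing.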

Progressive Disclosure (Three-Level Loading)

Skills load information in layers to manage context efficiently:

Level	What	When loaded	Size guidance
1. Metadata	`name` + `description` in YAML frontmatter	Always in context	~100 words
2. SKILL.md body	The full markdown instructions	When skill triggers	<500 lines ideal
3. Bundled resources	Scripts, references, assets	On-demand via explicit read	Unlimited

The description is the most important part — it's the trigger mechanism. The body is the playbook. Resources are the deep reference material.
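
To see how a draft measures up against the size guidance in the table, something like the following could be used. It is a rough sketch under assumptions: the function names are hypothetical, frontmatter is assumed to be delimited by `---` lines, and the 100-word and 500-line limits are treated as hard thresholds even though the table presents them as guidance.

```python
def split_skill_md(text: str) -> tuple:
    """Split SKILL.md text into (frontmatter, body), assuming '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md does not start with a '---' frontmatter line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("frontmatter block is never closed")


def size_report(text: str) -> dict:
    """Compare the always-loaded metadata and the on-trigger body to the guidance."""
    frontmatter, body = split_skill_md(text)
    metadata_words = len(frontmatter.split())
    body_lines = len(body.splitlines())
    return {
        "metadata_words": metadata_words,
        "body_lines": body_lines,
        "metadata_ok": metadata_words <= 100,  # level 1: ~100 words
        "body_ok": body_lines < 500,           # level 2: <500 lines ideal
    }
```

Bundled resources are deliberately left out of the report, since the table places no size limit on them.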

Step 3: Writing the YAML Frontmatter

The `name` Field

A short identifier. Lowercase, hyphenated. Examples: docx, pdf, frontend-design, deploy-aws.
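
The naming rule can be checked mechanically. The sketch below is one possible reading of "lowercase, hyphenated"; allowing digits is an assumption, and `is_valid_skill_name` is a hypothetical helper.

```python
import re

# Assumption: digits are allowed alongside lowercase letters; the guide
# itself only says "lowercase, hyphenated".
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_skill_name(name: str) -> bool:
    """True when the name is lowercase words joined by single hyphens."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["docx", "frontend-design", "Deploy AWS", "deploy_aws"]:
    print(candidate, is_valid_skill_name(candidate))
```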

The `description` Field (Critical)

The description is the primary trigger mechanism. Claude decides whether to use a skill almost entirely based on the description matching the user's request. A bad description means the skill never fires.

Common failure: undertriggering. Claude tends to be conservative about activating skills. Combat this by making descriptions slightly "pushy" — explicitly list scenarios where the skill applies, even ones that seem obvious.

Template:

description: >
  [What the skill does — 1 sentence].
  Use this skill when [primary triggers].
  Also use when [secondary triggers that might be missed].
  Covers [key capabilities].
  Do NOT use for [explicit exclusions to prevent false triggers].

Good example:

description: >
  Use this skill whenever the user wants to create, read, edit, or
  manipulate Word documents (.docx files). Triggers include: any mention
  of "Word doc", "word document", ".docx", or requests to produce
  professional documents with formatting like tables of contents,
  headings, page numbers, or letterheads. Also use when extracting or
  reorganizing content from .docx files, inserting or replacing images
  in documents, performing find-and-replace in Word files, working with
  tracked changes or comments, or converting content into a polished
  Word document. If the user asks for a "report", "memo", "letter",
  "template", or similar deliverable as a Word or .docx file, use this
  skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general
  coding tasks unrelated to document generation.

Why it works:

Lists the primary trigger phrases explicitly
Includes secondary triggers that could be missed
Has exclusions to prevent false positives
Mentions the user's likely vocabulary, not just technical terms

Bad example:

description: Creates Word documents.

Why it fails: Too vague, no trigger phrases, no exclusions. Claude won't know when to activate it.

Description writing checklist:

States what the skill does in the first sentence
Lists explicit trigger phrases the user might say
Includes secondary/non-obvious trigger scenarios
Mentions relevant file types or extensions
Has exclusions to prevent false triggers
Is slightly "pushy" to counteract undertriggering
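
Parts of this checklist can be approximated with simple text heuristics. The sketch below is illustrative only: `lint_description` is a hypothetical helper, the patterns are guesses at what a well-formed description tends to contain, and it cannot judge whether a description is "pushy" enough or whether file types are relevant to the skill at all.

```python
import re


def lint_description(description: str) -> list:
    """Return heuristic warnings for a skill description (empty list = no warnings)."""
    text = " ".join(description.split())  # collapse folded YAML line breaks
    lower = text.lower()
    warnings = []

    # "use this skill when ..." or "use when ...", but not "do not use for ..."
    if not re.search(r"(?<!not )\buse (this skill )?(when|whenever|for)\b", lower):
        warnings.append("no explicit trigger sentence ('Use this skill when ...')")
    if "do not use" not in lower:
        warnings.append("no exclusions ('Do NOT use for ...')")
    if not re.search(r'"[^"]+"', text):
        warnings.append("no quoted trigger phrases the user might say")
    # A dot that starts a token, such as .docx, rather than a sentence-ending period
    if not re.search(r"(?<!\w)\.[a-z0-9]{2,5}\b", lower):
        warnings.append("no file type or extension mentioned")
    return warnings


print(lint_description("Creates Word documents."))
```

Treat the warnings as prompts to reread the description, not as a pass/fail gate.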

Step 4: Writing the SKILL.md Body

Structure Patterns

Choose the structure that fits the skill's purpose. Most skills combine patterns:

Pattern	Best for	Example structure
Workflow-based	Sequential processes	Overview → Decision Tree → Step 1 → Step 2...
Task-based	Tool collections with distinct operations	Overview → Quick Start → Task A → Task B...
Reference/Guidelines	Standards, specs, brand guidelines	Overview → Guidelines → Specifications → Usage
Capabilities-based	Integrated systems with related features	Overview → Core Capabilities → Feature 1 → Feature 2...

Writing Principles

Use the imperative form. Write instructions as commands, not descriptions.

# Good
Read the input file. Extract the header row. Validate each column type.

# Bad
The skill should read the input file and then it would extract the header row...

Explain the WHY, not just the WHAT. Today's LLMs are smart. When they understand the reasoning behind an instruction, they generalize better and handle edge cases that the instruction didn't explicitly cover.

# Good
Use pandoc for text extraction because it preserves tracked changes
and handles nested formatting that raw XML parsing misses.

# Bad
ALWAYS use pandoc for text extraction. NEVER use raw XML parsing.

Avoid heavy-handed MUSTs and NEVERs. If you find yourself writing ALWAYS or NEVER in all caps, that's a yellow flag. Reframe by explaining the reasoning so the model understands why it matters. Heavy-handed constraints make skills brittle and narrow.

Include few-shot examples. Show the model what good input→output looks like. Examples are one of the most powerful tools for shaping behavior.

## Commit message format

**Example 1:**
Input: Added user authentication with JWT tokens
Output: feat(auth): implement JWT-based authentication

**Example 2:**
Input: Fixed crash when uploading files over 10MB
Output: fix(upload): handle large file uploads without crash

Define output formats explicitly when the output structure matters:

## Report structure
Use this exact template:

# [Title]
## Executive summary
[2-3 sentence overview]
## Key findings
[Bulleted list of findings with evidence]
## Recommendations
[Numbered actionable recommendations]

Organizing Large Skills

If the SKILL.md is approaching 500 lines, split content into reference files:

my-skill/
├── SKILL.md                  (core workflow + pointers to references)
└── references/
    ├── aws-deployment.md      (read when deploying to AWS)
    ├── gcp-deployment.md      (read when deploying to GCP)
    └── troubleshooting.md     (read when errors occur)

In the SKILL.md, tell Claude when to read each reference:

## Deployment
Read the appropriate reference file based on the target platform:
- AWS: `references/aws-deployment.md`
- GCP: `references/gcp-deployment.md`

If you encounter errors during deployment, consult `references/troubleshooting.md`.

For reference files over 300 lines, include a table of contents at the top.
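
The 300-line rule is easy to check automatically. This is a small sketch with assumptions stated: `missing_toc` is a hypothetical helper, and it treats any mention of "contents" in the first 30 lines as a table of contents, which is only a heuristic.

```python
def missing_toc(reference_text: str, max_lines: int = 300, head: int = 30) -> bool:
    """True when a reference file is over max_lines and shows no TOC near the top."""
    lines = reference_text.splitlines()
    if len(lines) <= max_lines:
        return False  # short files do not need a table of contents
    top = "\n".join(lines[:head]).lower()
    return "contents" not in top
```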

Scripts

Use scripts for deterministic, repetitive operations. Scripts execute without loading into context (saving tokens), but Claude can read them when it needs to understand or modify the logic.

Good candidates for scripts:

File format conversions
Validation and linting
Template scaffolding
Data extraction or transformation

#!/usr/bin/env python3
"""
Extract form fields from a PDF.
Usage: python scripts/extract_fields.py input.pdf
"""

Assets

Use assets for static files that appear in outputs: templates, fonts, icons, images, CSS files. Assets are copied or referenced, not interpreted.

Step 5: Testing

After drafting the skill, create 2-3 realistic test prompts — the kind of thing a real user would actually say, not carefully constructed test inputs.

Good test prompts:

"Can you make me a Word doc summarizing this data?" (casual, vague)
"Create a professional report with a table of contents and page numbers" (specific)
"Fix the formatting in this .docx file" (edit task)

What to verify:

Does the skill trigger correctly from the test prompt?
Does Claude follow the instructions in the SKILL.md?
Is the output quality acceptable?
Does it handle edge cases (missing inputs, ambiguous requests)?

If the user wants formal evals, create an evals/evals.json:

{
  "skill_name": "my-skill",
  "evals": [
    {
      "id": 1,
      "prompt": "The user's actual request",
      "expected_output": "What success looks like",
      "files": [],
      "assertions": [
        "Output includes a table of contents",
        "All headings use Heading 1 style",
        "Page numbers appear in the footer"
      ]
    }
  ]
}
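
Before running evals, it can help to sanity-check the file. The sketch below validates the structure shown above; treating all five per-eval keys as required and rejecting duplicate ids are assumptions, and `validate_evals` is a hypothetical helper rather than part of any eval runner.

```python
import json

# Assumption: every key from the example above is required in each eval.
REQUIRED_EVAL_KEYS = {"id", "prompt", "expected_output", "files", "assertions"}


def validate_evals(raw: str) -> list:
    """Return a list of problems found in the text of an evals.json file."""
    data = json.loads(raw)
    problems = []
    if not data.get("skill_name"):
        problems.append("missing skill_name")
    evals = data.get("evals", [])
    if not evals:
        problems.append("no evals defined")

    seen_ids = set()
    for index, case in enumerate(evals):
        missing = REQUIRED_EVAL_KEYS - case.keys()
        if missing:
            problems.append(f"eval #{index}: missing {sorted(missing)}")
        if case.get("id") in seen_ids:
            problems.append(f"eval #{index}: duplicate id {case.get('id')}")
        seen_ids.add(case.get("id"))
        # An empty assertions list leaves nothing to verify.
        if "assertions" in case and not case["assertions"]:
            problems.append(f"eval #{index}: no assertions")
    return problems
```

Usage: `validate_evals(open("evals/evals.json", encoding="utf-8").read())` returns an empty list when the file matches the expected shape.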

The Feedback Loop

Every time the user provides an example or input, immediately run it rather than waiting for a full specification. Show the output and let the user react. Seeing what Claude actually does is the fastest way to refine requirements.

Step 6: Common Pitfalls

Pitfall	Fix
Skill never triggers	Make the description more explicit and "pushy" — list more trigger phrases
Skill triggers on wrong requests	Add exclusions to the description ("Do NOT use for...")
Claude ignores instructions	Add examples showing the desired behavior; few-shot examples usually work better than rules
Output format is inconsistent	Provide an explicit template with exact structure
SKILL.md is too long	Move detailed content to `references/` files with clear pointers
Skill is too rigid	Replace MUST/NEVER rules with explanations of WHY
Skill only works for test examples	Generalize from specific feedback — explain principles, not just fixes
Claude wastes time on unproductive steps	Read the execution transcript, remove instructions causing wasted effort

Step 7: Packaging & Delivery

Once the skill is finalized:

Verify the structure — SKILL.md exists with valid YAML frontmatter
Check all file references — scripts, references, and assets are present and paths are correct
Copy to the output directory so the user can access it
Present the files to the user with a brief summary of what the skill does and how to install it
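
The first two checks in this list can be partly automated. The following is a sketch under assumptions, not a replacement for package_skill.py: `referenced_paths` and `check_skill` are hypothetical helpers, frontmatter is assumed to be delimited by `---` lines, and only paths under scripts/, references/, and assets/ are detected.

```python
import re
import tempfile
from pathlib import Path

# Relative paths under the three resource folders named in this guide.
RESOURCE_PATH = re.compile(r"\b(?:scripts|references|assets)/[\w./-]+")


def referenced_paths(skill_md_text: str) -> list:
    """List resource paths mentioned in SKILL.md, in order, without duplicates."""
    found = []
    for match in RESOURCE_PATH.findall(skill_md_text):
        path = match.rstrip(".")  # drop a sentence-ending period
        if path not in found:
            found.append(path)
    return found


def check_skill(skill_dir: Path) -> list:
    """Return packaging problems for a skill folder (empty list = looks fine)."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md is missing"]

    text = skill_md.read_text(encoding="utf-8")
    problems = []
    parts = text.split("---", 2)  # ['', frontmatter, body] when well-formed
    if not text.startswith("---") or len(parts) < 3:
        problems.append("SKILL.md has no closed YAML frontmatter block")
    else:
        for field in ("name", "description"):
            if not re.search(rf"^{field}:", parts[1], flags=re.MULTILINE):
                problems.append(f"frontmatter is missing '{field}'")

    for rel in referenced_paths(text):
        if not (skill_dir / rel).exists():
            problems.append(f"referenced file not found: {rel}")
    return problems
```

This does not parse the YAML itself, so a malformed `description` value would still pass; a YAML parser would be needed to catch that.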

If a package_skill.py script is available, use it to create a .skill archive:

scripts/package_skill.py <path/to/skill-folder>

Quick Reference: SKILL.md Template

For a ready-to-use template, read templates/skill-template.md in this skill's directory.

Meta-Advice for Skill Authors

These principles come from extensive iteration on production skills:

The description is everything. Spend more time on the description than you think you need. A great skill with a bad description is invisible.
Explain reasoning, not just rules. "Use pandoc because it handles tracked changes" beats "ALWAYS use pandoc." The model generalizes better when it understands why.
Write a draft, then look at it fresh. Your first draft will be either too detailed (overfitting to examples) or too vague (leaving room for interpretation where you don't want it). Revise with fresh eyes.
Generalize from specific feedback. When a user says "this output is wrong," don't just fix that specific case. Understand what principle was violated and encode the principle.
Keep it lean. Remove instructions that aren't pulling their weight. If something in the skill makes the model waste time on unproductive work, cut it.
Skills are for millions of uses. You iterate on a few examples because it's practical, but the skill must work across many different prompts. Avoid overfitting to your test cases.
Don't hold back. Claude is capable of extraordinary work. Skills that set high standards and explain why they matter produce dramatically better results than skills that play it safe.
