# Agent Creator

**Purpose**: Teach the principles, patterns, and practices for creating high-quality specialized agents that follow v2 architecture standards.

**Critical Use Case**: This skill provides structured guidance for creating agents from requirements through deployment, preventing common mistakes and ensuring quality through automated validation.

**Differentiation from agent-hr-manager**:

- **agent-creator** (this skill) = teaching guide, knowledge resource, passive reference 📖
- **agent-hr-manager** (agent) = autonomous executor, active creator, can use this skill 👨‍🏫

Use agent-creator when learning how to create agents. Use agent-hr-manager when you want an agent created automatically.


## When to Use This Skill

Use agent-creator when:

- Creating a new specialized agent from scratch
- Learning agent architecture and design patterns
- Understanding quality validation (0-80 rubric)
- Troubleshooting agent quality issues
- Migrating agents to v2 architecture
- Training others on agent creation

Do NOT use for:

- Creating skills (use the skill-creator skill instead)
- Quick agent modifications (just edit directly)
- General Claude usage questions

## 6-Step Agent Creation Workflow

### Step 0: Research Existing Patterns (BEFORE DESIGN)

**Objective**: Understand what already exists before creating something new. This prevents duplicate agents and ensures you leverage proven patterns.

**Why this matters**: Creating an agent without research leads to:

- Duplicating existing agent functionality
- Missing reusable patterns from similar agents
- Not discovering skills that solve part of the problem
- Reinventing methodology that already exists

**Actions**:

1. **Search for Similar Agents**:

   ```bash
   # List all available agents
   ls ~/.claude/agents/ | head -20

   # Search for agents in a similar domain
   grep -l "[domain-keyword]" ~/.claude/agents/*.md 2>/dev/null
   ```

2. **Review Relevant Agent Examples**:

   - Read references/agent-examples.md for quality patterns
   - Study agents with high quality scores (60+/80)
   - Note phase structures that work for similar domains

3. **Check Skill Inventory**:

   ```bash
   # List available skills
   ls ~/.claude/skills/

   # Search for domain-relevant skills
   grep -r "[domain-keyword]" ~/.claude/skills/*/SKILL.md 2>/dev/null | head -10
   ```

4. **Decision Checkpoint** (REQUIRED):

   | Question | Answer |
   |----------|--------|
   | Similar agent exists? | [yes/no - if yes, consider tuning instead] |
   | Relevant skills found? | [list skills to integrate] |
   | Reusable patterns identified? | [list patterns to follow] |
   | Proceed with new agent? | [yes, with justification] |

5. **Research Novel Domains** (if unfamiliar):

   - Use WebSearch for domain best practices
   - Find authoritative sources and frameworks
   - Document key methodologies the agent should follow

**Deliverable**: Research summary documenting similar agents, skills to integrate, and justification for the new agent.


### Step 1: Temporal Awareness & Requirements Gathering (CRITICAL)

**Objective**: Establish current date context and understand what the agent needs to do.

#### 1.1 Establish Temporal Context (REQUIRED)

**Why this matters**: Legal documents, contracts, compliance reports, and project documentation with incorrect dates create serious risks. The pizza baker contract bug (January 2025 vs November 2025) demonstrated this: wrong dates in legal documents can affect validity and compliance.

**Implementation**:

````markdown
## Phase 1: [Phase Name] & Temporal Awareness

**Objective**: [Phase goal]

**Actions**:
1. **Establish Temporal Context** (REQUIRED):
   ```bash
   CURRENT_DATE=$(date '+%Y-%m-%d')          # ISO 8601: 2025-11-06
   READABLE_DATE=$(date '+%B %d, %Y')        # Human: November 06, 2025
   TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S %Z') # Full: 2025-11-06 12:34:56 EET
   ```
   - Use CURRENT_DATE for document metadata and version numbers
   - Use READABLE_DATE for human-readable headers
   - Use TIMESTAMP for detailed audit trails
2. [Other Phase 1 actions...]

**Deliverable**: [Concrete output]
````
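For illustration, here is a minimal sketch of how these variables might flow into a generated document header (the report.md path and header fields are hypothetical placeholders, not part of the required pattern):

```bash
# Hypothetical example: stamp a generated report with the current date.
CURRENT_DATE=$(date '+%Y-%m-%d')
READABLE_DATE=$(date '+%B %d, %Y')
cat > report.md <<EOF
# Compliance Report
**Date**: ${READABLE_DATE}
**Version**: 1.0.0 (${CURRENT_DATE})
EOF
```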


**Validation**: The validate_agent.py script checks for the temporal awareness pattern in Phase 1.

#### 1.2 Gather Requirements

**Key Questions**:
1. **Problem Definition**: What problem does this agent solve?
2. **Domain Expertise**: What specialized knowledge is needed?
3. **Tool Requirements**: Which tools will it need? (Read, Write, Edit, Bash, Grep, Glob, etc.)
4. **Typical Workflow**: What is the step-by-step process?
5. **Success Metrics**: How do we know it worked?
6. **Edge Cases**: What unusual situations must it handle?

**Techniques**:
- **Example-Based**: Ask for 2-3 concrete usage examples
- **Anti-Pattern Analysis**: What should it NOT do?
- **Boundary Testing**: What are the limits (file size, complexity, scope)?

**Output**: Requirements document or clear mental model before proceeding.

---

### Step 1.5: Skill Discovery & Integration Planning

**Objective**: Identify which existing skills to integrate into the agent and how.

**Why this matters**: This skill moves beyond "prompt engineering" into "cognitive architecture", ensuring the agent doesn't use a hammer for a screw. Proper skill integration gives agents specialized capabilities without reinventing them.

**Actions**:

1. **Map Requirements to Skill Categories**:

   | Agent Requirement | Skill Category | Candidate Skills |
   |-------------------|----------------|------------------|
   | Debugging logic | Reasoning | hypothesis-elimination, self-reflecting-chain |
   | Security review | Development | security-analysis-skills, adversarial-reasoning |
   | Documentation | Documentation | document-writing-skills |
   | Database ops | Integration | chromadb-integration-skills |
   | Testing | Development | testing-methodology-skills |
   | Error handling | Development | error-handling-skills |

2. **Evaluate Each Candidate Skill**:

   | Skill | Size | Active? | Integrate or Inline? |
   |-------|------|---------|---------------------|
   | [skill-name] | [lines] | [yes/no] | [integrate/inline/skip] |

   **Decision Criteria** (a sizing helper follows this list):

   - **Integrate** if: skill >100 lines, actively maintained, reusable
   - **Inline** if: simple pattern <20 lines, agent-specific variant needed
   - **Skip** if: not relevant after review

3. **Document Skills Integration**:

   ```markdown
   **Skills Integration**: skill-1, skill-2, skill-3
   ```

   This goes in the agent's header metadata.

4. **Plan Skill Invocation Points**:

   | Phase | When to Invoke | Skill |
   |-------|----------------|-------|
   | Phase 2 | Complex decision | integrated-reasoning-v2 |
   | Phase 3 | Design validation | adversarial-reasoning |
   | Phase 4 | Error recovery | hypothesis-elimination |

5. **Check for Handover/Parallelism Needs**:

   - Will the agent need multi-pattern reasoning? → Add reasoning-handover-protocol
   - Will tasks run in parallel? → Add parallel-execution skill
   - See cognitive-skills/INTEGRATION_GUIDE.md for patterns

**Deliverable**: Skill integration plan with invocation points documented.


### Step 2: Architecture Design

**Objective**: Design the agent's phase structure, tool selection, and quality criteria.

#### 2.1 Determine Agent Complexity

**Decision Tree: Simple vs Complex Agent**

**Simple Agent** (3 phases, <200 lines):

- Single domain focus (e.g., PDF manipulation, CSV parsing)
- Linear workflow (no branching)
- Minimal state management
- Examples: pdf-creator-agent, code-formatter

**Complex Agent** (4-5 phases, 200-250 lines):

- Multiple operation modes (e.g., create, read, update)
- Conditional branching or decision trees
- State tracking across phases
- Examples: legal-agent, ceo-orchestrator, agent-hr-manager

**When to use integrated-reasoning-v2**: 8+ decision dimensions, strategic importance, >90% confidence required

- 9 patterns available: ToT, BoT, SRC, HE, AR, DR, AT, RTR, NDF
- 11 scoring dimensions for pattern selection
- See cognitive-skills/INTEGRATION_GUIDE.md for full integration patterns

#### 2.2 Design Phase Structure

**Guidelines** (from agent-design-patterns.md):

- 3-5 phases optimal (2 too simple, 6+ too complex)
- Each phase has ONE clear objective
- Actions are SPECIFIC, not generic
- Deliverables are CONCRETE artifacts

**Phase Structure Template**:

```markdown
## Phase N: [Descriptive Name]

**Objective**: [One sentence describing the goal]

**Actions**:
1. [Specific action with tool: "Use Grep to search for X pattern in Y files"]
2. [Specific action with tool: "Use Edit to modify lines 45-52 in config.yml"]
3. [Specific action with condition: "If errors found, use TodoWrite to track fixes"]

**Deliverable**: [Concrete output: "List of 5 validated regex patterns with test cases"]
```

**Example from kaggle-leak-auditor**:

- Phase 1: Static Code Analysis → List of violations
- Phase 2: Runtime Validation → Validation results
- Phase 3: Report Generation → Audit report with recommendations

#### 2.3 Select Tools

**Common Tool Combinations**:

- File analysis: Read, Grep, Glob
- Code modification: Read, Edit, Write
- Research: WebSearch, WebFetch, Read
- Execution: Bash, TodoWrite, Read
- Complex tasks: Task (invoke other agents)

**Tool Selection Criteria**:

1. **Minimal set**: Only include tools actually used in phases
2. **Specific over general**: Edit > Write for modifications
3. **Composed workflows**: Grep to find, Read to analyze, Edit to modify (a shell analogue is sketched below)
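To make the composition concrete, here is a hedged shell analogue of the find → analyze → modify workflow (the src/ directory, search pattern, and replacement are hypothetical):

```bash
# Find: locate files containing the deprecated call.
FILES=$(grep -rl 'old_api(' src/ 2>/dev/null)

# Analyze: review each match with line numbers before touching anything.
for f in $FILES; do
  grep -n 'old_api(' "$f"
done

# Modify: apply the replacement only after reviewing the matches above.
for f in $FILES; do
  sed -i 's/old_api(/new_api(/g' "$f"
done
```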

#### 2.4 Define Success Criteria (10-16 items)

**Categories**:

1. **Phase Deliverables** (3-5 items): "✅ Phase 1 violations list complete with severity scores"
2. **Quality Gates** (2-3 items): "✅ All findings validated with evidence"
3. **Confidence** (1 item): "✅ Confidence level >85% with clear reasoning"
4. **Documentation** (2-3 items): "✅ Report includes examples and references"
5. **Edge Cases** (2-3 items): "✅ Handled missing files gracefully"
6. **Temporal** (1 item): "✅ Document dated with current date"

**Format**:

```markdown
## Success Criteria

- ✅ Temporal awareness established in Phase 1
- ✅ Phase 1 deliverable: [specific output]
- ✅ Phase 2 deliverable: [specific output]
- ✅ All files created/modified successfully
- ✅ Quality validation passed with score ≥70/80
- ✅ Confidence level >85% with supporting evidence
- ✅ Edge cases documented and handled
- ✅ Reference documentation created (if using progressive disclosure)
[10-16 total items]
```
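A quick, hedged way to check a draft against the 10-16 item target, assuming the criteria use the `- ✅` marker shown above (the agent path is hypothetical):

```bash
# Count success criteria lines in the agent file.
grep -c '^- ✅' ~/.claude/agents-library/my-agent.md
```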

#### 2.5 Design Self-Critique (6-10 questions)

**Question Categories**:

1. **Completeness**: "Did I check all [domain-specific items]?"
2. **Confidence**: "What is my confidence level? Why?"
3. **Assumptions**: "What assumptions did I make?"
4. **False Positives**: "Could [finding X] be wrong? How?"
5. **False Negatives**: "What might I have missed?"
6. **Verification**: "How can the user verify this?"
7. **Temporal**: "Did I use the current date correctly?"

**Format**:

```markdown
## Self-Critique

1. **Domain Accuracy**: Did I correctly apply [domain] expertise?
2. **Tool Selection**: Did I use optimal tools for each task?
3. **Edge Cases**: Did I handle errors and failures gracefully?
4. **Temporal Accuracy**: Did I establish the current date in Phase 1?
5. **Confidence Basis**: What evidence supports my confidence level?
6. **Assumptions**: What assumptions should the user validate?
[6-10 total questions]
```

#### 2.6 Define Confidence Thresholds

**Three-Tier System**:

```markdown
## Confidence Thresholds

- **High (85-95%)**: [Specific conditions: "All criteria met, deliverables complete, tests passed"]
- **Medium (70-84%)**: [Conditions: "Most criteria met, minor issues present, acceptable quality"]
- **Low (<70%)**: [Conditions: "Significant issues, incomplete work - continue working"]
```

**Domain-Specific Examples**:

- Code analysis: Based on test coverage, execution traces
- Legal: Based on citation verification, precedent alignment
- Research: Based on source quality, corroboration
- Debugging: Based on reproduction success, log evidence

### Step 3: Implementation

**Objective**: Write the agent definition file following v2 architecture.

#### 3.1 Create Agent Frontmatter

**Template**:

```yaml
---
name: agent-name
description: Clear one-sentence description. Use when [specific trigger conditions]. Examples: [concrete user questions].
tools: Read, Write, Edit, Bash, Grep, Glob, TodoWrite
model: claude-sonnet-4-5
color: blue
---
```

**Guidelines**:

- **name**: Hyphen-case (my-agent-name), <40 chars
- **description**: Include WHEN to use + example questions
- **tools**: Only list tools actually used in phases (see the consistency check below)
- **model**: Usually claude-sonnet-4-5 (use opus for complex reasoning)
- **color**: blue/green/purple/gold/red for visual grouping

#### 3.2 Write Agent Opening

**Structure**:

```markdown
# Agent Name

**Purpose**: [1-2 sentences on what this agent does]

**Core Responsibilities**:
1. [Responsibility 1 with domain context]
2. [Responsibility 2 with domain context]
3. [Responsibility 3 with domain context]
[3-7 items total]

**Specialized Knowledge** (if applicable):
- Domain-specific terminology
- Technical constraints
- Industry standards
```

#### 3.3 Add Decision Tree (if multi-mode)

**When to include**: The agent operates in different modes or scenarios.

**Template**:

```markdown
## Decision Tree: [What to Decide]

When tasked with [type of request], first determine the appropriate [mode/type]:

**Mode A** - Use when:
- [Condition 1]
- [Condition 2]
- User asks "[example question]"
→ Follow Phase 1A-2A workflow

**Mode B** - Use when:
- [Condition 1]
- [Condition 2]
- User asks "[example question]"
→ Follow Phase 1B-2B workflow
```

#### 3.4 Implement Phases (from Step 2.2)

**Critical**: The first phase MUST include the temporal awareness pattern.

#### 3.5 Add Success Criteria, Self-Critique, Confidence (from Steps 2.4-2.6)

#### 3.6 Consider Progressive Disclosure

**When to extract to references**:

- Agent would exceed 250 lines with inline details
- Has extensive pattern catalogs (3+ detailed patterns)
- Includes large lookup tables or reference data
- Contains detailed code examples (>30 lines)

**What to extract**:

- Detailed code examples
- Technical deep-dives
- Edge case handling details
- Reference lookup tables

**Reference in main agent**:

```markdown
## Pattern Detection

**Reference Documentation**: `~/.claude/agents-library/refs/[agent]-patterns.md`

**Key patterns** (see reference for details):
1. Pattern A (CRITICAL)
2. Pattern B (WARNING)
3. Pattern C (INFO)
```

**Line Count Targets**:

- Main agent: 150-250 lines (ideal: 200)
- Reference docs: 200+ lines (no limit)

### Step 4: Quality Validation

**Objective**: Score agent quality using the 0-80 rubric and iterate if needed.

#### 4.1 Use Automated Validation

**Run validate_agent.py**:

```bash
~/.claude/skills/agent-creator/scripts/validate_agent.py /path/to/agent.md
```

**Output**:

```text
Quality Score: 72/80 (Excellent)

Phase Structure: 15/15 ✅
Success Criteria: 14/15 ⚠️  (Missing 1 criterion)
Self-Critique: 10/10 ✅
Progressive Disclosure: 8/10 ⚠️  (232 lines, close to limit)
Tool Usage: 10/10 ✅
Documentation: 5/10 ❌ (Missing examples)
Edge Case Handling: 10/10 ✅

Recommendations:
- Add 1 more success criterion (target: 10-16)
- Add usage examples for better documentation
```

**Scoring Rubric**:

- 70-80: Excellent - production ready
- 60-69: Good - minor improvements needed
- 50-59: Fair - significant improvements needed
- <50: Poor - major refactoring required

See references/quality-rubric-explained.md for a detailed breakdown.

#### 4.2 Manual Review Checklist

Even with automated scoring, manually verify:

- Temporal awareness in Phase 1 with REQUIRED label
- All tools in frontmatter are actually used in phases
- Success criteria are specific and measurable (not vague)
- Self-critique questions are domain-specific (not generic)
- Confidence thresholds have concrete conditions
- Examples demonstrate real usage (if included)
- No spelling errors in critical sections
- Markdown formatting is valid

#### 4.3 Iterate if Score <70

**Common improvements**:

- **Add edge case handling** (+10 pts): Document error conditions
- **Improve documentation** (+5-10 pts): Add examples, clarify instructions
- **Refine success criteria** (+3-5 pts): Make them more specific and measurable
- **Progressive disclosure** (+5-10 pts): Extract details to references if >250 lines

Iterate until the score is ≥70 or you hit diminishing returns.
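One hedged way to wire iteration into a workflow is to gate deployment on the validator. This assumes validate_agent.py exits non-zero when the score is below threshold; verify that against the actual script before relying on the exit code:

```bash
# Re-run the validator after each edit; deploy only once it passes.
VALIDATOR=~/.claude/skills/agent-creator/scripts/validate_agent.py
if "$VALIDATOR" my-agent.md; then
  cp my-agent.md ~/.claude/agents-library/
else
  echo "Quality gate failed - keep iterating before deploying"
fi
```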


### Step 5: Deployment

**Objective**: Deploy the agent to the appropriate location(s) and verify availability.

#### 5.1 Determine Deployment Target(s)

**Global Library** (~/.claude/agents-library/):

- Persistent across all projects
- Available to all Claude Code instances
- Use for: reusable agents (research, code formatting, validation)

**Local Project** (.claude/agents/):

- Project-specific
- Version controlled with the project
- Use for: domain-specific agents (this project's business logic)

**Both**: Deploy to global first, then copy to local if the project needs it.

#### 5.2 Deploy Agent

**To Global Library**:

```bash
cp /path/to/my-agent.md ~/.claude/agents-library/my-agent.md
```

**To Local Project**:

```bash
cp /path/to/my-agent.md ./.claude/agents/my-agent.md
```

**With References**:

```bash
# Deploy agent
cp my-agent.md ~/.claude/agents-library/

# Deploy reference doc
cp my-agent-patterns.md ~/.claude/agents-library/refs/
```
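If you deployed to both targets, a quick check that the copies stayed in sync (paths as in the commands above):

```bash
# diff exits 0 when the two files are identical.
diff ~/.claude/agents-library/my-agent.md ./.claude/agents/my-agent.md \
  && echo "global and local copies are in sync"
```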

#### 5.3 Verify Availability

Restart Claude Code to load the new agent.

**Test invocation**:

"[Agent Name], help me with [typical task]"

**Check agent registry** (if using the CEO orchestrator):

- Update the CEO's worker agent registry if this is a new operational agent
- Add an estimated duration based on similar agents

## Decision Trees

### Decision Tree 1: Create New Agent vs Extend Existing

**Create New Agent** when:

- New domain/expertise area (e.g., adding a legal agent when you only have code agents)
- Different tool requirements (e.g., the new agent needs Bash; existing ones only use Read/Write)
- Different phase structure (e.g., the new agent has 5 phases, the existing one has 3)
- User explicitly requests a new agent

**Extend Existing Agent** when:

- Same domain, just adding capabilities (e.g., a PDF agent adding form-filling)
- Same tool set, similar workflow
- Agent currently <200 lines (room to grow)
- Change is backward compatible

**Create New + Deprecate Old** when:

- Fundamental architecture change (v1 → v2)
- Existing agent has a quality score <40
- Existing agent is >300 lines and unmaintainable

### Decision Tree 2: When to Use Cognitive Reasoning Patterns

**Use integrated-reasoning-v2** (meta-orchestrator) when:

- 8+ decision dimensions (architecture, tools, phases, quality, deployment, etc.)
- Strategic importance (affects multiple projects, long-term impact)
- High confidence required (>90%, mission-critical)
- Complex trade-offs (performance vs accuracy, simplicity vs power)
- Uncertain which reasoning pattern is best for the problem

**Direct pattern selection** (skip the meta-orchestrator):

- Diagnosis/debugging → Use hypothesis-elimination (HE)
- Security review → Use adversarial-reasoning (AR)
- Trade-off resolution → Use dialectical-reasoning (DR)
- Novel problem → Use analogical-transfer (AT)
- Time pressure → Use rapid-triage-reasoning (RTR)
- Stakeholder coordination → Use negotiated-decision-framework (NDF)

**Use tree-of-thoughts** when:

- Clear evaluation criteria exist
- Need single best solution
- Medium complexity (4-7 dimensions)

**Use breadth-of-thought** when:

- Solution space unknown
- Need to explore all options
- Multiple valid approaches

**Use self-reflecting-chain** when:

- Sequential dependencies
- Need step-by-step validation
- Logical reasoning with backtracking

**Use direct implementation** when:

- Simple agent (<3 phases)
- Well-understood domain
- Similar agents exist as templates

## Common Mistakes to Avoid

See references/common-mistakes.md for detailed analysis. Top 5 pitfalls:

### 1. Missing Temporal Awareness ❌

- **Mistake**: Forgetting to check the current date in Phase 1
- **Impact**: Documents with wrong dates (legal/compliance risk)
- **Fix**: Always include temporal awareness with a REQUIRED label in Phase 1

### 2. Vague Success Criteria ❌

- **Mistake**: "✅ Agent works correctly" (not measurable)
- **Impact**: Can't validate that the agent actually succeeded
- **Fix**: "✅ Generated report includes 5 sections: summary, findings, evidence, recommendations, confidence score"

### 3. Generic Self-Critique ❌

- **Mistake**: "Did I do a good job?" (applies to everything)
- **Impact**: Doesn't catch domain-specific errors
- **Fix**: "Did I validate all legal citations against the Finlex API?" (domain-specific)

### 4. Tool Overload ❌

- **Mistake**: Listing 10+ tools in the frontmatter when only 3 are used
- **Impact**: Confusing; suggests the agent does more than it does
- **Fix**: Only list tools actually referenced in phase actions

### 5. No Edge Case Handling ❌

- **Mistake**: Only implementing the "happy path"
- **Impact**: Agent fails on unexpected inputs; errors are not handled gracefully
- **Fix**: Add an "Edge Cases" section documenting what to do when things go wrong


## Using validate_agent.py

The validation script provides automated quality scoring.

**Basic Usage**:

```bash
~/.claude/skills/agent-creator/scripts/validate_agent.py ~/.claude/agents-library/my-agent.md
```

**Output Interpretation**:

- 70-80: Ship it! Excellent quality
- 60-69: Almost there, minor fixes
- 50-59: Needs work, iterate
- <50: Major refactoring required

**What it checks**:

- Phase structure (3-5 phases, clear objectives, deliverables)
- Success criteria (10-16 items, specific)
- Self-critique (6-10 questions, domain-specific)
- Progressive disclosure (150-250 line target)
- Tool usage (tools in frontmatter match phase usage)
- Documentation (examples, references)
- Edge case handling (documented error scenarios)
- Temporal awareness (REQUIRED in Phase 1)

See references/quality-rubric-explained.md for scoring details.


## Reference Documentation

This skill includes detailed reference documentation:

**references/agent-examples.md**: Annotated examples of high-quality agents

- legal-agent (264 lines, progressive disclosure, 68/80 quality)
- ceo-orchestrator (244 lines, integrated-reasoning integration)
- agent-hr-manager (748 lines, meta-agent patterns)

**references/quality-rubric-explained.md**: Deep dive on the 0-80 scoring system

- Detailed breakdown of each category
- Examples of excellent vs poor implementations
- How to improve scores in each area

**references/common-mistakes.md**: Anti-pattern catalog

- 10 most common agent creation mistakes
- Real examples from production agents
- How to detect and fix each mistake

**references/temporal-awareness-deep.md**: Why temporal awareness matters

- Legal/compliance risks of wrong dates
- The pizza baker contract bug case study
- Implementation patterns and validation

## Quick Start Examples

### Example 1: Simple Agent (CSV to Markdown Converter)

**Requirements**: Convert CSV files to markdown tables

**Architecture**:

- 3 phases (Parse CSV → Format Table → Output Markdown)
- Tools: Read, Write, Bash
- <200 lines, no progressive disclosure needed

**Key Decisions**:

- Simple agent (linear workflow)
- No decision tree (single mode)
- Success criteria: 10 items
- Self-critique: 6 questions

**Implementation time**: ~20 minutes
**Expected quality score**: 63-70/80

### Example 2: Complex Agent (Multi-Language Legal Compliance Checker)

**Requirements**: Check code/documents for GDPR, Finnish, and EU law compliance

**Architecture**:

- 5 phases (Temporal + Scan → Finnish Law → EU Law → Cross-Reference → Report)
- Tools: Read, Bash, Grep, WebFetch, Task (for legal-agent)
- 220 lines with references/legal-patterns.md (150 lines)

**Key Decisions**:

- Complex agent (multi-jurisdiction)
- Decision tree (document type: code vs contracts vs policies)
- Success criteria: 14 items
- Self-critique: 8 questions
- Uses integrated-reasoning for cross-jurisdiction conflicts

**Implementation time**: ~2 hours
**Expected quality score**: 72-80/80


## Summary: 6-Step Workflow

0. **Research Existing Patterns** → Similar agents, relevant skills, justification for a new agent
1. **Temporal Awareness & Requirements** → Current date + clear problem definition
2. **Architecture Design** → Phases, tools, success criteria, self-critique, confidence
3. **Implementation** → Write the agent following v2 patterns (150-250 lines)
4. **Quality Validation** → Score with validate_agent.py (target: ≥70/80)
5. **Deployment** → Copy to the global library and/or local project

**Validation checkpoint**: Run validate_agent.py before deploying!


**Meta**: This skill was designed using integrated-reasoning (94% confidence) to synthesize patterns from agent-design-patterns.md and 17 production v2 agents.
