agent-model-selection
Agent Model Selection Skill
Purpose
Provides data-driven guidance for selecting the most appropriate language model when creating or modifying agent definitions.
When to Use
- When creating a new agent and need to assign a model
- When modifying an existing agent's model assignment
- When troubleshooting agent performance issues related to model capabilities
- When optimizing costs across the agent ecosystem
Reference Data
Always consult docs/ai-model-reference.md for:
- Current performance benchmarks by category (Coding, Reasoning, Language, Instruction Following, etc.)
- Model availability in GitHub Copilot Pro
- Premium request multipliers (cost)
- Recommended model assignments by agent type
- Task-based guidance and tutorials (via external links with descriptions)
This reference is updated periodically with latest benchmark data.
Critical Learnings
-
Use task-specific benchmarks, not overall scores
- Different models excel at different tasks
- Example: GPT-5.2-Codex excels in Coding while Claude Sonnet 4.5 is better for Language (76.00)
-
Claude Sonnet 4.5 has poor Instruction Following (score: 23.52)
- Unsuitable for agents that follow templates (Task Planner, Quality Engineer)
- Use Gemini models instead for structured output (scores: 65-75)
-
Gemini 3 Flash offers best value for many tasks
- 0.33x premium multiplier (cost-effective)
- Strong Instruction Following (74.86)
- Good Language performance (84.56)
- Ideal for: Task Planner, Release Manager, high-frequency agents
-
GPT-5.2-Codex is the latest coding model
- Latest generation Codex model (improved over 5.1 Codex Max)
- Specialized for agentic coding tasks
- Primary choice for Developer agent
- Also solid for Code Reviewer
-
Always verify model availability
- Check against official GitHub Copilot documentation
- Model names must match exactly (case-sensitive)
- Include "(Preview)" suffix for preview models (e.g., "Gemini 3 Pro (Preview)")
-
⚠️ Coding agents must NOT have
model:in frontmatter- The
model:property is only valid for VS Code agents (files without-coding-agentsuffix) - On GitHub.com, the
model:property in*-coding-agent.agent.mdfiles causes a hardCAPIError: 400 The requested model is not supportederror (confirmed by experiment) - The GitHub docs say this property is "ignored" but in practice it prevents the agent from running
- Apply model selection only to the corresponding VS Code agent file (e.g.,
developer.agent.md)
- The
Model Selection Process
When selecting or changing a model:
-
Identify the agent's primary task categories (from ai-model-reference.md)
- Coding, Reasoning, Language, Instruction Following, etc.
-
Check category-specific performance
- Look up relevant benchmarks in ai-model-reference.md
- Compare top 3-5 performers in that category
-
Consider cost vs frequency
- High-frequency agents → favor lower multipliers (0.33x, 0x)
- Critical accuracy agents → favor best performer regardless of cost
-
Verify availability
- Confirm model is listed in "Available Models" section
- Check it's available for VS Code (required)
-
Document your reasoning
- Include benchmark scores in proposal
- Explain trade-offs made
Example Model Selection
Scenario: Selecting model for Quality Engineer agent
- Primary tasks: Define test plans following specific template format
- Key categories: Instruction Following (critical), Reasoning (important)
- Benchmark lookup (from ai-model-reference.md):
- Gemini 3 Flash: Instruction Following 74.86, 0.33x cost ✅
- Gemini 3 Pro: Instruction Following 65.85, 1x cost ✅
- Claude Sonnet 4.5: Instruction Following 23.52 ❌ (disqualified)
- Decision: Gemini 3 Pro (balance of performance and cost)
- Rationale: Strong instruction following (65.85), reasonable cost (1x), good for template-based work
When to Update Model Assignments
Reassess models when:
- New benchmark data shows significant performance changes
- Agent is underperforming its tasks consistently
- New models are released with better performance
- Cost optimization is needed
- ai-model-reference.md is updated with new data
Key Principles
- Task-specific benchmarks matter more than overall scores
- Balance cost with performance based on agent frequency and criticality
- Always verify availability against official documentation
- Document your rationale for model selection decisions