Surface
One skill for making software legible to agents. Routes to the right workflow based on what you need: audit and score a codebase, scaffold agent infrastructure, or transform specific surfaces.
Summary
Surface combines two workflows under one entry point:
- Audit workflow — Scores a codebase across 11 dimensions (0-3 each, max 33 points), produces clustered findings, generates transformation plans, and dispatches specialist sub-agents to fix gaps.
- Scaffold workflow — Creates and updates agent infrastructure (agents, tools, workflows, memory, model routing, browser, sandbox) shaped to the existing project. Recommends Mastra for TypeScript, adapts when a repo uses AI SDK, MCP, LangGraph, or Cloudflare Workers, and defaults to bounded agents with explicit tool scope, fallback behavior, and evaluation hooks.
Cross-runtime compatible: works with Claude Code, Claude Agent SDK, OpenAI Codex CLI, and generic agent runtimes.
<tool_restrictions>
REQUIRED TOOLS
- Read, Glob, Grep — for codebase analysis and project detection
- Write, Edit — for generating reports, plans, and scaffolding
- Agent — for dispatching specialist sub-agents in audit execution
- Bash — for checking installed packages
BANNED TOOLS
- EnterPlanMode — this skill manages its own process
- ExitPlanMode — you are never in plan mode </tool_restrictions>
Routing
Determine the workflow from the user's request:
| Signal | Route |
|---|---|
| "audit", "score", "assess", "how agent-ready" | Audit workflow |
| "plan", "transform", "fix", "improve" | Audit workflow (plan or transform mode) |
| "create agent", "add tool", "build workflow", "scaffold", "init" | Scaffold workflow |
| "add memory", "model routing", "browser", "sandbox" | Scaffold workflow |
| "add MCP", "create llms.txt", "write AGENTS.md" | Audit workflow (single-dimension transform) |
| Ambiguous | Ask: "Would you like me to audit this codebase for agent readiness, or scaffold new agent infrastructure?" |
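The routing rules above amount to keyword matching. A minimal sketch, using an illustrative subset of the signals rather than the full table:

```typescript
// Simplified router: audit signals are checked first, then scaffold signals;
// anything else falls through to the clarifying question.
function route(request: string): "audit" | "scaffold" | "ask" {
  const r = request.toLowerCase();
  if (/\b(audit|score|assess|plan|transform|fix|improve)\b/.test(r)) return "audit";
  if (/\b(scaffold|init|create agent|add tool|build workflow|memory|sandbox)\b/.test(r)) return "scaffold";
  return "ask";
}
```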
Invocation Modes
Audit modes
- `/surface` — Full audit with scorecard and findings
- `/surface score` — Quick scorecard only (no detailed findings)
- `/surface plan` — Full audit + transformation plan
- `/surface transform` — Full audit + plan + execute transformations
- `/surface --dimension=mcp` — Audit a single dimension
- `/surface --format=json` — Output as structured JSON
Scaffold modes
- `/surface init` — Initialize agent infrastructure in a project
- `/surface agent <name>` — Scaffold a new agent with tools and instructions
- `/surface tool <name>` — Scaffold a new tool with Zod schemas
- `/surface workflow <name>` — Scaffold a workflow with steps and state
- `/surface memory` — Add memory (Mastra Memory + PgVector) to the project
- `/surface model` — Set up multi-provider model routing
- `/surface browser` — Add browser/web access tooling for agents
- `/surface sandbox` — Add isolated code execution tooling for agents
Audit Workflow
Phase 0: Project Detection
<hard_gate> Phase 0 MUST complete before ANY scoring begins. Do not guess stack or surfaces. Read the files. </hard_gate>
Gather context in parallel using Glob and Grep:
- Stack detection — Read package.json, Cargo.toml, pyproject.toml, go.mod, etc. Identify framework (Next.js, Express, FastAPI, etc.), language, package manager.
- Surface detection — Search for existing agent surfaces:

| Surface | Glob Patterns |
|---|---|
| OpenAPI specs | `**/openapi.{json,yaml,yml}`, `**/swagger.{json,yaml}` |
| MCP servers | `**/.mcp.json`; grep for `@modelcontextprotocol/sdk`, `mcp-handler`, `@mastra/mcp` |
| Context files | `**/AGENTS.md`, `**/CLAUDE.md`, `**/.cursor/rules/*.mdc`, `**/.github/copilot-instructions.md` |
| Discovery | `**/llms.txt`, `**/llms-full.txt`, `**/robots.txt`, `**/sitemap.xml`, `**/.well-known/**`, `**/api-catalog*` |
| Agent web readiness | grep for `Accept: text/markdown`, `Content-Signal`, `http-message-signatures-directory`, `agent-skills`, `mcp/server-card`, `oauth-protected-resource`, `api-catalog`, `x402`, `ucp`, `acp` |
| CLI tools | `**/bin/*`, CLI entry points in package.json `bin` field |
| API routes | `**/app/api/**`, `**/pages/api/**`, `**/routes/**` |
| Structured data | grep for `schema.org`, `application/ld+json` in layout/template files |
| Auth | grep for `client_credentials`, `Bearer`, `JWT`, OAuth config files |
| Tests | `**/*.test.*`, `**/*.spec.*`, `**/tests/`, `**/evals/` |

- Dimension applicability — Not all dimensions apply:

| Dimension | Applicable When |
|---|---|
| API Surface | Project exposes HTTP endpoints |
| CLI Design | Project has or is a CLI tool |
| MCP Server | Always (any project can expose one) |
| Discovery & AEO | Project has a web presence |
| Authentication | Project has auth or API access |
| Error Handling | Always |
| Tool Design | Project defines agent tools |
| Context Files | Always |
| Multi-Agent | Project involves agent orchestration |
| Testing | Always |
| Data Retrievability | Project exposes knowledge, documents, or searchable data |
Present detected surfaces and applicable dimensions. Then proceed to Phase 1.
Phase 1: Dimension Scoring
<hard_gate> Phase 0 MUST have completed. Applicable dimensions must be determined. Do NOT read scoring-rubric.md or any dimension reference file in this context. Scoring is performed by specialist scoring agents, each with its own context window. </hard_gate>
Dispatch Scoring Agents
For each applicable dimension from Phase 0, dispatch the corresponding scoring agent using the Agent tool. Pass a prompt containing:
- Project summary — name, root path, language, framework, package manager
- Detected surfaces — file paths and patterns found in Phase 0 that are relevant to this specific dimension
- Applicability note — why this dimension applies
Dispatch ALL applicable dimensions in parallel. Use a single message with multiple Agent tool calls. Do not wait for one to complete before dispatching the next.
For N/A dimensions (determined by Phase 0), do not dispatch an agent.
Agent dispatch template:
Agent({
description: "Score {Dimension Name}",
prompt: "Score the {Dimension Name} dimension for this project.\n\nProject: {name}\nRoot: {absolute path}\nStack: {language, framework, package manager}\n\nDetected surfaces relevant to this dimension:\n{list of file paths and patterns from Phase 0}\n\nRead your reference file, examine the project, and return your structured score."
})
The scoring agent mapping:
| Dimension | Agent |
|---|---|
| 1. API Surface | score-api-surface |
| 2. CLI Design | score-cli-design |
| 3. MCP Server | score-mcp-server |
| 4. Discovery & AEO | score-discovery-aeo |
| 5. Authentication | score-authentication |
| 6. Error Handling | score-error-handling |
| 7. Tool Design | score-tool-design |
| 8. Context Files | score-context-files |
| 9. Multi-Agent | score-multi-agent |
| 10. Testing | score-testing |
| 11. Data Retrievability | score-data-retrievability |
Collect and Assemble
When all scoring agents return, extract the <score_result> block from each response. Parse:
- `DIMENSION_NUMBER`, `SCORE`, `MAX`, `CONFIDENCE`, `BAR` — for the scorecard
- `SUMMARY` — for the one-liner in the scorecard row
- `FINDINGS` — for Phase 2 findings clustering
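A minimal parsing sketch, assuming the `<score_result>` block uses `KEY: value` lines — the exact field syntax is defined by the scoring agents themselves, so treat this format as an assumption:

```typescript
// Hypothetical parser: extracts KEY: value fields from a <score_result> block.
// Returns null when no block is found, which the caller treats as a scoring failure.
function parseScoreResult(response: string): Record<string, string> | null {
  const match = response.match(/<score_result>([\s\S]*?)<\/score_result>/);
  if (!match) return null;
  const fields: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const m = line.match(/^([A-Z_]+):\s*(.*)$/);
    if (m) fields[m[1]] = m[2];
  }
  return fields;
}
```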
Failure handling: If a scoring agent fails, returns an error, or produces an unparseable response:
- Mark that dimension as `[???] ?/3 Scoring failed` in the scorecard
- Exclude it from the raw total (treat it like N/A for the math)
- Add a footer after the scorecard: `Warning: {N} dimension(s) failed to score — results may be incomplete`
Scorecard Output
<hard_gate> The scorecard MUST be presented in EXACTLY this format. No variations. No prose summaries substituting for the table.
Always show both:
- a raw total: `score / max_applicable`
- a scaled total out of 30, used for the rating band

The scaled total is: `round((raw_score / max_applicable) * 30)`
╔══════════════════════════════════════════════════════════════════╗
║ SURFACE SCORECARD ║
║ [Project Name] ║
║ [YYYY-MM-DD] ║
╠══════════════════════════════════════════════════════════════════╣
1. API Surface [█░░] 1/3 Good OpenAPI but human-oriented descriptions
2. CLI Design [███] 3/3 Full JSON output, schema introspection, hardened
3. MCP Server [░░░] 0/3 No MCP server
4. Discovery & AEO [█░░] 1/3 Basic AGENTS.md, no llms.txt
5. Authentication [██░] 2/3 OAuth 2.1 M2M, scoped tokens
6. Error Handling [░░░] 0/3 Generic status codes only
7. Tool Design [██░] 2/3 Good schemas, weak descriptions
8. Context Files [█░░] 1/3 Auto-generated CLAUDE.md
9. Multi-Agent [───] N/A Not an agent system
10. Testing [█░░] 1/3 Basic tool tests, no evals
11. Data Retrievability [░░░] 0/3 No semantic retrieval surface
╠══════════════════════════════════════════════════════════════════╣
║ TOTAL: 11/30 (scaled: 11/30) ║
║ RATING: Agent-tolerant ║
║ ║
║ ░░░░░░░░░░░░██████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ║
║ Human-only ▲Agent-tolerant Agent-ready Agent-first ║
╚══════════════════════════════════════════════════════════════════╝
Score visualization per dimension:
- 0/3: [░░░]
- 1/3: [█░░]
- 2/3: [██░]
- 3/3: [███]
- N/A: [───]
Rating bands:
| Range | Rating | Meaning |
|---|---|---|
| 0-7 | Human-only | Built for humans. Agents will struggle. |
| 8-14 | Agent-tolerant | Usable with heavy prompt engineering. |
| 15-22 | Agent-ready | Solid agent support. Few gaps remain. |
| 23-30 | Agent-first | Purpose-built for agents. Best in class. |
Always derive the rating from the scaled score, not the raw score. </hard_gate>
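The bar, scaled-total, and rating-band rules above can be sketched as:

```typescript
// Dimension scores are 0-3; N/A dimensions are excluded from max_applicable.
type Score = number | "N/A";

function scaledTotal(scores: Score[]): { raw: number; maxApplicable: number; scaled: number } {
  const applicable = scores.filter((s): s is number => s !== "N/A");
  const raw = applicable.reduce((a, b) => a + b, 0);
  const maxApplicable = applicable.length * 3;
  // Scale to a 30-point band so N/A dimensions don't penalize the rating.
  const scaled = Math.round((raw / maxApplicable) * 30);
  return { raw, maxApplicable, scaled };
}

function ratingBand(scaled: number): string {
  if (scaled <= 7) return "Human-only";
  if (scaled <= 14) return "Agent-tolerant";
  if (scaled <= 22) return "Agent-ready";
  return "Agent-first";
}

function scoreBar(score: Score): string {
  if (score === "N/A") return "[───]";
  return "[" + "█".repeat(score) + "░".repeat(3 - score) + "]";
}
```

Running this on the example scorecard above (ten applicable dimensions, Multi-Agent N/A) reproduces its 11/30 total and Agent-tolerant rating.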
If the mode is `score`, STOP HERE. Present the scorecard and exit.
Single-Dimension Output (--dimension=X)
When scoring a single dimension, dispatch only the one corresponding scoring agent. Render a mini-scorecard instead of the full 11-row format:
╔══════════════════════════════════════════════════════════════════╗
║ SURFACE DIMENSION SCORE ║
║ [Project Name] — [Dimension Name] ║
║ [YYYY-MM-DD] ║
╠══════════════════════════════════════════════════════════════════╣
6. Error Handling [██░] 2/3 RFC 9457 with is_retriable, missing doc_uri
╠══════════════════════════════════════════════════════════════════╣
║ EVIDENCE: ║
║ - RFC 9457 Problem Details at src/lib/errors.ts:12 ║
║ - is_retriable field present on all error responses ║
║ - No doc_uri field found ║
╚══════════════════════════════════════════════════════════════════╝
Phase 2: Findings Report
Collect the FINDINGS sections from all scoring agent results. For dimensions scoring below 3 that returned findings, use those directly. If a scoring agent returned insufficient findings for its score, you may supplement with observations from Phase 0.
Finding Structure
<hard_gate> Every finding MUST have all five fields. No exceptions. </hard_gate>
| Field | Content |
|---|---|
| What | Specific issue with file:line reference |
| Why | Why this matters for agent consumption |
| Fix | Concrete steps to resolve (not vague advice) |
| Impact | Which dimension, current → target score |
| Severity | Critical / High / Medium / Low |
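As an illustrative shape (hypothetical names, not part of the skill's interface), a finding with all five required fields might look like:

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low";

interface Finding {
  what: string;     // specific issue with file:line reference
  why: string;      // why this matters for agent consumption
  fix: string;      // concrete steps to resolve
  impact: string;   // dimension, current → target score
  severity: Severity;
}

// The "all five fields, no exceptions" gate as a completeness check.
function isComplete(f: Partial<Finding>): boolean {
  return Boolean(f.what && f.why && f.fix && f.impact && f.severity);
}
```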
Severity Definitions
| Severity | Meaning |
|---|---|
| Critical | Blocks agent use entirely (e.g., browser-only auth, no structured output) |
| High | Significant friction (e.g., terse tool descriptions, no error recovery info) |
| Medium | Suboptimal but functional (e.g., missing AGENTS.md, no llms.txt) |
| Low | Polish item (e.g., missing field descriptions on schema, no token counts) |
Clustering
Cluster findings by "what you'd fix together", NOT by dimension.
Instead of: "API findings | CLI findings | MCP findings"
Use: "API discoverability (3 findings across API Surface + Discovery + Tool Design)"
Each cluster:
- Descriptive name
- 1-sentence rationale
- Table of findings (severity, file, issue, dimension)
- Suggested approach (1-2 sentences)
- Dependencies on other clusters
Report Output
Write the report to `docs/surface/audit-[YYYY-MM-DD].md` in the project being audited.
If mode is not plan or transform, present findings and offer:
- Generate transformation plan
- Tackle the highest-impact cluster now
- Deep dive on a specific dimension
Phase 3: Transformation Plan
Generate an ordered plan prioritized by impact-to-effort ratio.
Default Priority Order
| Priority | What | Effort | Impact |
|---|---|---|---|
| 1 | Context files (AGENTS.md, CLAUDE.md) | Low | Immediate |
| 2 | Discovery (llms.txt, robots.txt) | Low | High |
| 3 | Error handling (RFC 9457) | Medium | Critical |
| 4 | API descriptions (rewrite for agents) | Medium | High |
| 5 | CLI enhancements (--json, exit codes) | Medium | Broad |
| 6 | Authentication (OAuth 2.1 M2M) | Medium | Removes blockers |
| 7 | MCP server | Higher | Transformative |
| 8 | Structured data (JSON-LD) | Medium | AEO |
| 9 | Data retrievability (indexing, hybrid search) | Higher | High leverage |
| 10 | Testing (eval suite) | Higher | Long-term |
| 11 | Multi-agent patterns | Highest | Advanced |
Task Format
Each task in the plan:
- Task ID and description
- Files to create or modify
- Complexity: S (< 1 hour) / M (1-4 hours) / L (4+ hours)
- Score impact: which dimensions improve, by how much
- Dependencies: which tasks must complete first
- Agent: which specialist to dispatch
Write the plan to `docs/surface/plan.md`.
If mode is not transform, present plan and offer to execute.
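Task dependencies imply an execution order. A sketch of how the plan's dependency constraints can resolve into an ordered sequence (illustrative, not prescribed by the skill):

```typescript
// Simple topological ordering: a task runs only after all its dependencies.
interface Task { id: string; deps: string[] }

function orderTasks(tasks: Task[]): string[] {
  const ordered: string[] = [];
  const done = new Set<string>();
  let remaining = tasks.slice();
  while (remaining.length > 0) {
    const ready = remaining.filter(t => t.deps.every(d => done.has(d)));
    if (ready.length === 0) throw new Error("circular dependency in plan");
    for (const t of ready) { ordered.push(t.id); done.add(t.id); }
    remaining = remaining.filter(t => !done.has(t.id));
  }
  return ordered;
}
```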
Phase 4: Execution
<hard_gate> Phase 4 requires explicit user confirmation before dispatching any agent. Present the plan summary and ask: "Ready to execute? I'll start with [cluster name]." </hard_gate>
Specialist Agents
| Agent | Purpose | Model |
|---|---|---|
| context-writer | AGENTS.md, CLAUDE.md, .cursor/rules | Sonnet |
| discovery-writer | llms.txt, llms-full.txt, robots.txt, Content Signals, Markdown content negotiation, JSON-LD, .well-known discovery | Sonnet |
| error-designer | RFC 9457 structured errors | Sonnet |
| api-optimizer | OpenAPI descriptions, extensions | Sonnet |
| cli-enhancer | --json output, exit codes, introspection | Sonnet |
| auth-upgrader | OAuth 2.1 Client Credentials | Sonnet |
| mcp-builder | MCP server creation/enhancement | Sonnet |
| test-writer | Agent evaluation suite | Sonnet |
| retrievability-engineer | Retrieval, indexing, and search surfaces | Sonnet |
| agentic-patterns-writer | Multi-agent and orchestration patterns | Sonnet |
Agent Dispatch
Each agent receives:
- Specific task(s) from the plan
- Relevant codebase files
- Scoring rubric for their dimension
- Reference doc: `${CLAUDE_SKILL_DIR}/references/[dimension].md`
Agent Status Codes
Agents report one of:
- DONE — Task complete, all checks pass
- DONE_WITH_CONCERNS — Complete but flagged issues for review
- NEEDS_CONTEXT — Missing information, needs human input
- BLOCKED — Cannot proceed (dependency, permission, or technical blocker)
Execution Flow
- Group tasks by agent type
- Identify parallel-safe groups (no file conflicts)
- Dispatch parallel groups simultaneously
- Collect results, handle NEEDS_CONTEXT and BLOCKED
- Re-score affected dimensions
- Present delta scorecard:
╔════════════════════════════════════════════════════════╗
║ SURFACE DELTA SCORECARD ║
╠════════════════════════════════════════════════════════╣
Dimension Before After Delta
───────────────── ────── ───── ─────
Error Handling 0/3 2/3 +2 ✦
Context Files 1/3 3/3 +2 ✦
Discovery & AEO 1/3 2/3 +1 ↑
TOTAL: 11/27 → 16/27 (scaled: 12/30 → 18/30)
RATING: Agent-tolerant → Agent-ready ✦
╚════════════════════════════════════════════════════════╝
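The parallel-safe grouping in step 2 can be sketched as greedy bucketing of tasks with disjoint file sets (an illustrative simplification):

```typescript
// Tasks that touch no common files can be dispatched in the same batch.
interface PlannedTask { id: string; files: string[] }

function parallelGroups(tasks: PlannedTask[]): PlannedTask[][] {
  const groups: PlannedTask[][] = [];
  for (const task of tasks) {
    // Join the first existing group whose members share no files with this task.
    const group = groups.find(g =>
      g.every(t => t.files.every(f => !task.files.includes(f)))
    );
    if (group) group.push(task);
    else groups.push([task]);
  }
  return groups;
}
```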
Post-Execution
- Update `docs/surface/scorecard.md` with a history entry
- Offer to commit changes
- Suggest next improvement cluster
Scaffold Workflow
Phase 0: Project Detection
<hard_gate> Phase 0 MUST complete before ANY scaffolding. Read the project. Do not guess. </hard_gate>
Gather context in parallel:
- Framework detection

  Glob: `**/package.json` (root + `packages/*`)
  Grep: `@mastra/core`, `@mastra/memory`, `@ai-sdk/*`, `@modelcontextprotocol/sdk`, `agents`, `@cloudflare/*`, `workers-ai-provider`, `wrangler`, `durable_objects`

  Classify:
  - Mastra project — `@mastra/core` in dependencies
  - AI SDK project — `ai` package without Mastra
  - MCP project — `@modelcontextprotocol/sdk` without Mastra
  - Cloudflare Agents project — `agents` package plus Workers, Wrangler, or Durable Objects config
  - Workers AI project — Cloudflare Workers with AI bindings but no agent runtime
  - Greenfield — none of the above
- Structure detection

  Glob: `**/mastra.ts`, `**/mastra/index.ts`
  Glob: `**/agents/**/*.ts`, `**/tools/**/*.ts`, `**/workflows/**/*.ts`
  Glob: `**/triggers.ts`
  Glob: `**/wrangler.toml`, `**/wrangler.json`, `**/worker-configuration.d.ts`
  Grep: `AI Gateway`, `env.AI`, `Vectorize`, `DurableObject`, `Browser`, `Sandbox`

  Identify:
  - Monorepo with `packages/agents/` — turborepo/workspace pattern
  - App-local `src/mastra/` — single-app pattern
  - App-local `lib/` — Next.js convention
  - No agent code yet — greenfield
- Existing inventory
  - Count existing agents, tools, workflows
  - Check for model routing (`model.ts`, provider config, AI Gateway, OpenRouter, LiteLLM, Vercel AI Gateway)
  - Check for memory setup (`@mastra/memory`, `@mastra/pg`, Agent Memory, Vectorize, AutoRAG, AI Search)
  - Check for browser access (Browser Run, Browserbase, Stagehand, Playwright)
  - Check for sandbox/code execution (Cloudflare Sandbox SDK, Vercel Sandbox, Daytona, Modal)
  - Check for instructions loading (markdown-based prompts)
Present findings and proceed to Phase 1.
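The framework classification above is a precedence check over dependencies. A simplified sketch (it omits the Workers AI case, which requires inspecting bindings rather than packages, and its precedence order is illustrative):

```typescript
type Framework = "mastra" | "ai-sdk" | "mcp" | "cloudflare-agents" | "greenfield";

// deps: the merged dependency map from package.json files.
// hasWranglerConfig: whether a wrangler.toml/json or Durable Objects config exists.
function classify(deps: Record<string, string>, hasWranglerConfig: boolean): Framework {
  if (deps["@mastra/core"]) return "mastra";
  if (deps["agents"] && hasWranglerConfig) return "cloudflare-agents";
  if (deps["ai"]) return "ai-sdk";
  if (deps["@modelcontextprotocol/sdk"]) return "mcp";
  return "greenfield";
}
```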
Phase 1: Framework Recommendation
If the project is greenfield or doesn't have an agent framework:
Recommend Mastra. Explain why:
- TypeScript-native agent framework with first-class Zod support
- Built-in tool system, workflows, memory, and RAG
- Model router works with any AI SDK provider (Anthropic, OpenAI, Google, OpenRouter)
- MCP server exposure built in
- Active development, growing ecosystem
Recommend Cloudflare Agents instead of generic Mastra scaffolding when the project is already Workers-native or the target product explicitly needs edge-hosted agents, Durable Object identity, WebSockets, scheduled/background work, Browser Run, AI Gateway, Workers AI, Agent Memory, AI Search, Vectorize, or Sandbox close to the Worker runtime.
If the user prefers something else, respect that. Adapt patterns to their framework. If Mastra is already installed, skip this phase.
Phase 2: Intent & Scaffolding
Based on the invocation mode, generate the appropriate scaffolding.
<required_reading>
Before generating ANY code, read the conventions reference:
${CLAUDE_SKILL_DIR}/references/conventions.md
Before generating agents or tools, read the patterns reference:
${CLAUDE_SKILL_DIR}/references/patterns.md
Before generating model routing, read:
${CLAUDE_SKILL_DIR}/references/model-routing.md
Before generating workflows with non-trivial control flow (branches, loops,
parallel steps, suspend/resume, human-in-the-loop), read:
${CLAUDE_SKILL_DIR}/references/workflow-composition.md
Before wiring up agents, workflows, or memory (Phase 3), always read:
${CLAUDE_SKILL_DIR}/references/gotchas.md
</required_reading>
Core Scaffolding Rules
These rules apply in every scaffold mode unless the existing project already uses a different deliberate pattern:
- Tools over knowledge — external facts, mutations, and retrieval belong in tools. Do not scaffold agents that "just know" business state from prompts.
- Narrow ownership — each agent should own one decision boundary or domain, not a whole product surface.
- Bounded loops — every agent scaffold must include an explicit step budget, stop condition, and failure path. Never imply open-ended looping.
- Workflow before agent — if the sequence is predictable, resumable, auditable, or mostly deterministic, scaffold a workflow first and use agents only inside the judgment-heavy steps.
- Prompt as configuration — prompts should encode role, routing, formatting, and hard constraints. Do not branch application logic on prompt variants when server-side configuration can decide it deterministically.
- Memory is earned — do not add memory by default. Add it only when the task needs cross-turn recall, entity state, or retrieval over durable data.
- Eval and fallback paths are part of the scaffold — generate the core success criteria, likely failure cases, and the fallback behavior together.
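The bounded-loops rule can be sketched framework-agnostically — this is not Mastra API, and real steps would be async tool or model calls:

```typescript
interface StepResult { done: boolean; output: string }

// Every loop gets an explicit step budget, stop condition, and fallback:
// the agent either succeeds within budget or fails visibly, never loops forever.
function runBoundedAgent(
  step: (i: number) => StepResult,
  opts: { maxSteps: number; fallback: string }
): string {
  for (let i = 0; i < opts.maxSteps; i++) {
    const result = step(i);
    if (result.done) return result.output; // explicit success condition
  }
  return opts.fallback; // budget exhausted: take the failure path
}
```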
Mode: init
Initialize Mastra agent infrastructure in the project.
1. Install packages (suggest, don't run without confirmation):

   ```bash
   npm install @mastra/core zod
   # Optional, for memory:
   npm install @mastra/memory @mastra/pg
   ```

2. Create directory structure following the project's convention:

   For monorepo (`packages/agents/`):

   ```
   packages/agents/src/
   ├── mastra.ts        # Central Mastra instance
   ├── agents/
   │   └── index.ts
   ├── tools/
   │   └── index.ts
   ├── workflows/
   │   └── index.ts
   ├── triggers.ts      # Workflow trigger functions
   └── index.ts         # Package exports
   ```

   For single app (`src/mastra/`):

   ```
   src/mastra/
   ├── index.ts         # Central Mastra instance
   ├── agents/
   │   └── index.ts
   ├── tools/
   │   └── index.ts
   └── workflows/
       └── index.ts
   ```

3. Generate `mastra.ts` — the central registry:

   ```typescript
   import { Mastra } from "@mastra/core/mastra";

   // Import agents as they're created
   // import { myAgent } from "./agents";

   export const mastra = new Mastra({
     agents: {
       // Register agents here
     },
     workflows: {
       // Register workflows here
     },
   });
   ```

4. Ask what to create first — agent, tool, or workflow.
Mode: agent <name>
Scaffold a new agent. Ask:
- What decision boundary does this agent own? (what it should handle vs. hand off)
- What tools does it need? (list existing tools or describe new ones; keep the set narrow)
- How should it stop or fail? (step budget, success condition, fallback or escalation path)
- How complex are its instructions? (inline string, prompt builder, or markdown files)
- What model? (default: recommend based on task complexity and latency target)
Then generate:
- Agent definition file: `agents/<name>.ts`
- Instructions (inline, prompt builder, or markdown directory)
- Export from `agents/index.ts`
- Register in `mastra.ts`
- Stub any new tools mentioned
- A clear step budget / stop condition in the generated config
- A short list of eval scenarios or fixtures to add next if the project has a test surface
Follow the agent definition pattern from conventions reference.
Mode: tool <name>
Scaffold a new tool. Ask:
- What does this tool do? (becomes the description — write it as agent onboarding)
- What are the inputs? (generate Zod schema)
- What does it return? (generate output schema)
- Does it read or write? (sets MCP annotations)
- Does it need user context? (triggers security pattern)
Then generate:
- Tool definition file: `tools/<name>.ts` or `tools/<domain>/<name>.ts`
- Zod input and output schemas
- MCP annotations (readOnlyHint, destructiveHint, idempotentHint)
- Security pattern if user context is needed (RequestContext injection)
- Export from `tools/index.ts`
Follow the security-first tool pattern from patterns reference.
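The MCP annotations above follow from whether the tool reads or writes. A framework-agnostic sketch using the hint names from the MCP spec (the read/write/delete mapping here is an illustrative default, not a rule):

```typescript
interface ToolAnnotations {
  readOnlyHint: boolean;
  destructiveHint: boolean;
  idempotentHint: boolean;
}

function annotationsFor(kind: "read" | "write" | "delete"): ToolAnnotations {
  switch (kind) {
    case "read":
      // Pure queries: safe to retry, no side effects.
      return { readOnlyHint: true, destructiveHint: false, idempotentHint: true };
    case "write":
      // Creates/updates: repeating may duplicate effects.
      return { readOnlyHint: false, destructiveHint: false, idempotentHint: false };
    case "delete":
      // Destructive but idempotent: deleting twice is a no-op the second time.
      return { readOnlyHint: false, destructiveHint: true, idempotentHint: true };
  }
}
```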
Mode: workflow <name>
Scaffold a workflow. Ask:
- What does this workflow accomplish? (end-to-end description)
- Which steps are deterministic vs. agentic? (keep agents only in judgment-heavy steps)
- What are the steps? (sequential? parallel? fan-out/fan-in?)
- What shared state is needed between steps?
- How is it triggered? (API route? event? cron?)
- What retry, suspend, or human-review points are required?
Then generate:
- Workflow directory `workflows/<name>/`:
  - `index.ts` — workflow definition with step composition
  - `steps/` — individual step files
  - `state.ts` — Zod schema for shared workflow state (if needed)
- Trigger function in `triggers.ts`
- Register in `mastra.ts`
Follow the workflow patterns from conventions reference.
Mode: memory
Add memory to the project. Generate:
- Memory configuration with Mastra Memory + PgVector by default
- Embedder selection guidance (Voyage AI, OpenAI, Google, Workers AI)
- Cloudflare-native alternatives when the app is Workers-native
- Usage pattern — how to pass memory to agent calls
- Environment variables and bindings needed
Always explain why memory is needed for this use case. If the task can be handled with request-scoped state, retrieved context, or workflow state, prefer those before durable conversational memory.
Mode: model
<required_reading>
${CLAUDE_SKILL_DIR}/references/model-routing.md
</required_reading>
Set up multi-provider model routing. Generate:
- `agents/model.ts` with environment-based provider switching
- Support for: Google AI, OpenRouter, Anthropic, OpenAI, Cloudflare AI Gateway, Workers AI
- Fallback chain configuration
- Environment variable and Workers binding documentation
Mode: browser
Add browser/web access tooling for agents. Generate:
- Provider recommendation (Browser Run, Browserbase/Stagehand, Playwright)
- Tool wrapper with allowlists, truncation, audit logging, and timeout limits
- Safety contract — browser tools are never hidden utility calls
Mode: sandbox
Add isolated code execution tooling for agents. Generate:
- Provider recommendation (Cloudflare Sandbox SDK, Vercel Sandbox, Daytona, Modal)
- Tool wrapper with language allowlist, resource limits, file system boundaries
- Safety contract — never execute code in the app server process
Phase 3: Wiring
After generating any scaffolding:
- Register — Add to the `mastra.ts` central instance
- Export — Add to barrel `index.ts` files
- Wire triggers — If a workflow, create the trigger function
- Env vars — List any new environment variables needed
- Verify — Run `npx tsc --noEmit` to check types (suggest, don't run without asking)
- Check failure shape — confirm step limits, retries, and fallback behavior are visible in code
- Note eval hooks — point to the first 3-5 scenarios that should become tests or eval fixtures
Interaction Style
This skill is conversational, not automated. For each mode:
- Present what you detected in Phase 0
- Ask clarifying questions (keep them focused — 2-3 max per round)
- Show a preview of what you'll generate (file list + key decisions)
- Generate on confirmation
- Offer the next logical step
Do not dump a wall of code. Generate file by file, explaining key decisions. Prefer generating working code over placeholder comments.
Complementary Skills
| Need | Use |
|---|---|
| Mastra API docs, type signatures | /mastra skill |
| AI SDK patterns | /arc:ai skill |
Reference Loading
Audit dimension references are loaded by scoring agents — do NOT load them in this context. The scoring agents each load their own dimension reference file independently.
Scaffold Reference Loading
| File | Load When |
|---|---|
| references/conventions.md | Before generating any code |
| references/patterns.md | Generating agents, tools, or complex workflows |
| references/model-routing.md | Setting up model or provider configuration |
| references/workflow-composition.md | Workflows with branches, loops, parallel steps, or suspend |
| references/gotchas.md | Wiring phase — connecting agents, workflows, memory |
Edge Cases
Audit
- Polyglot projects: Score each language surface independently, report weakest
- Monorepos: Offer per-package or aggregate scoring
- Zero-score projects: Focus on the 3 highest-impact dimensions only
- Re-audits: Load the previous `docs/surface/scorecard.md`, show the delta
- N/A dimensions: Skip and scale the total proportionally
- Single-dimension mode: `--dimension=X` skips other dimensions entirely
Scaffold
- No Mastra installed, user wants it: Walk through init first
- Non-Mastra project: Adapt patterns to AI SDK or MCP conventions
- Existing agents directory: Detect naming conventions and follow them
- Monorepo with multiple agent packages: Ask which package to target
- TypeScript not configured for ES2022: Warn about module requirements
- Cloudflare Workers project: Do not generate Node-only APIs. Use Workers bindings, Durable Objects, Queues, Browser Run, Sandbox, AI Gateway, Workers AI, and runtime-compatible packages.
- Browser or sandbox tools: Treat as high-risk capabilities. Add allowlists, quotas, audit logs, timeout limits, and confirmation gates for writes.