agent-workflow
MANDATORY — Context Gathering Protocol
Before applying any workflow guidance, gather context:
- Check for Maestro context in the project root
  - First check `.maestro/context.md` (v2 layout)
  - Then check `.maestro.md` (v1 layout — backward compatible)
  - If it exists → read it and use the workflow context within
  - If it doesn't exist → tell the user: "No workflow context found. Run /teach-maestro to set up project-specific context for better results."
- Check for decision history (optional)
  - If `.maestro/decisions.jsonl` exists → read the last 5 decisions for session continuity
  - If it doesn't exist → proceed without it (no error)
- Minimum viable context (if no `.maestro.md`):
  - What AI model(s) are being used?
  - What is the workflow's primary task?
  - Are there existing prompts, tools, or agents to work with?
  - What are the quality/speed/cost priorities?

DO NOT proceed without at least understanding the model, task, and priorities.
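A minimal sketch of this lookup order, assuming the agent runs from the project root (the function names are illustrative, not part of Maestro):

```python
from pathlib import Path

def load_maestro_context(root: Path = Path(".")) -> str | None:
    """Return workflow context, preferring the v2 layout over v1."""
    for candidate in (root / ".maestro" / "context.md",  # v2 layout
                      root / ".maestro.md"):             # v1 layout (backward compatible)
        if candidate.is_file():
            return candidate.read_text()
    return None  # caller should suggest running /teach-maestro

def load_recent_decisions(root: Path = Path("."), n: int = 5) -> list[str]:
    """Read the last n decision records; the log is optional."""
    log = root / ".maestro" / "decisions.jsonl"
    if not log.is_file():
        return []  # proceed without it, no error
    return log.read_text().splitlines()[-n:]
```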
Maestro — AI Agent Workflow Mastery
This skill provides the foundational knowledge for designing, building, and maintaining production-grade AI agent workflows. All Maestro commands build on these principles.
Core Principles
- Structure over improvisation — Workflows should be deliberate, not emergent
- Constraints are features — Explicit boundaries prevent failure modes
- Measure, don't assume — Every workflow needs evaluation, not just testing
- Appropriate complexity — Match the solution to the problem, not the ambition
- Graceful degradation — Every component should fail safely
1. Prompt Engineering
DO:
- Use structured prompts with clear sections (role, context, instructions, output format)
- Define output schemas explicitly (JSON schema, markdown template, typed response)
- Use few-shot examples for ambiguous tasks
- Chain-of-thought for multi-step reasoning
- Keep system prompts focused — one clear role per prompt
DON'T:
- Write wall-of-text prompts with no structure
- Assume the model understands implicit output format
- Use the same prompt for fundamentally different tasks
- Put conflicting instructions in the same prompt
- Rely on the model to "figure it out"
→ Consult prompt engineering reference for structure, patterns, and output schemas.
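A minimal sketch of these rules: a prompt with explicit sections and a declared JSON output schema. The section names, task, and schema are illustrative, not a Maestro convention:

```python
import json

# Explicit output schema: the model is told the exact shape to return.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["summary", "severity"],
}

def build_prompt(ticket_text: str) -> str:
    """Assemble a structured prompt: role, context, instructions, output format."""
    return "\n\n".join([
        "## Role\nYou are a support-ticket triage assistant.",
        f"## Context\nTicket:\n{ticket_text}",
        "## Instructions\nClassify the ticket's severity and summarize it in one sentence.",
        f"## Output format\nReturn ONLY JSON matching this schema:\n{json.dumps(OUTPUT_SCHEMA, indent=2)}",
    ])
```

One clear role, one task, one declared format; a parser on the other side can validate the response against the same schema.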
2. Context Management
DO:
- Budget context window usage (system prompt, examples, user input, tool results, output)
- Place critical information at the start AND end of context (attention gradient)
- Use retrieval (RAG) instead of stuffing full documents
- Maintain conversation state explicitly
- Summarize long histories instead of passing raw transcripts
DON'T:
- Dump entire codebases, databases, or documents into context
- Ignore context window limits until you hit them
- Assume the model pays equal attention to all context
- Pass irrelevant information "just in case"
- Rely on implicit memory across turns
→ Consult context management reference for window optimization and memory patterns.
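A minimal sketch of explicit context budgeting. The token counts are rough character-based estimates and the budget split is only an illustrative starting point:

```python
# Rough token estimate: ~4 characters per token for English text.
def est_tokens(text: str) -> int:
    return max(1, len(text) // 4)

CONTEXT_WINDOW = 128_000
BUDGET = {                   # illustrative split of the window
    "system_prompt": 0.05,
    "examples":      0.10,
    "retrieved":     0.40,
    "history":       0.25,
    "output":        0.20,   # reserved for the model's response
}

def fits_budget(part: str, text: str) -> bool:
    """Check one part of the prompt against its share of the window."""
    return est_tokens(text) <= int(CONTEXT_WINDOW * BUDGET[part])
```

The point is the discipline, not the numbers: every piece of context has an owner and a ceiling, so overruns surface before the window does.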
3. Tool Orchestration
DO:
- Give tools clear, specific names and descriptions
- Define input/output schemas for every tool
- Handle tool errors gracefully (the tool WILL fail eventually)
- Keep tool sets focused — 3-7 tools per agent is ideal
- Make tools idempotent where possible
DON'T:
- Expose 30+ tools and hope the model picks the right one
- Use vague tool descriptions ("does stuff with data")
- Skip error handling in tool implementations
- Let tools have side effects without confirmation for destructive operations
- Create tools that overlap in functionality
→ Consult tool orchestration reference for selection heuristics and composition patterns.
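A sketch of a well-specified tool: clear name and description, explicit input schema, and an error path that returns structured data instead of raising. The spec format follows the common JSON-schema convention; `lookup_order` is a hypothetical backend call:

```python
import json

# Clear name, specific description, explicit input schema.
GET_ORDER_STATUS = {
    "name": "get_order_status",
    "description": "Look up the shipping status of one order by its order ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

def lookup_order(order_id: str) -> str:
    """Hypothetical backend call; raises KeyError for unknown orders."""
    return {"A1": "shipped"}[order_id]

def run_get_order_status(order_id: str) -> str:
    """Idempotent read-only lookup; errors come back as data, not exceptions."""
    try:
        return json.dumps({"ok": True, "status": lookup_order(order_id)})
    except Exception as exc:  # the tool WILL fail eventually
        return json.dumps({"ok": False, "error": str(exc)})
```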
4. Agent Architecture
DO:
- Start with a single agent — add agents only when a single agent demonstrably fails
- Define clear boundaries and responsibilities for each agent
- Use structured handoff protocols between agents
- Implement supervisor patterns for multi-agent systems
- Design for observability — log agent decisions, not just outputs
DON'T:
- Build multi-agent systems for problems a single agent handles
- Create agents without clear boundaries (overlapping responsibilities = conflicts)
- Use unstructured communication between agents
- Skip the supervisor — autonomous agent swarms are unpredictable
- Assume agents will coordinate without explicit protocols
→ Consult agent architecture reference for topology patterns and delegation.
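A sketch of a structured handoff: an explicit, typed message plus a supervisor that routes and logs it. The field names and the callable-agent interface are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Handoff:
    """Explicit contract for passing work between agents."""
    from_agent: str
    to_agent: str
    task: str                                    # what the receiver must do
    context: dict = field(default_factory=dict)  # only what the task needs
    done_when: str = ""                          # success criterion the supervisor checks

def supervise(handoff: Handoff, agents: dict[str, Callable[[Handoff], Any]]) -> Any:
    """The supervisor routes and logs every handoff instead of letting
    agents message each other ad hoc."""
    print(f"[handoff] {handoff.from_agent} -> {handoff.to_agent}: {handoff.task}")
    return agents[handoff.to_agent](handoff)
```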
5. Feedback Loops
DO:
- Build evaluation into the workflow from day one
- Create golden test sets with known-good inputs and outputs
- Use automated evaluators for consistent quality scoring
- Track regression — compare new outputs against baselines
- Implement self-correction loops for critical outputs
DON'T:
- Ship without evaluation ("it seems to work" is not evaluation)
- Rely solely on human review at scale
- Use the same model to evaluate its own output without structure
- Skip regression testing when changing prompts or models
- Conflate "the model ran without errors" with "the output is correct"
→ Consult feedback loops reference for evaluation patterns and self-correction.
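A sketch of a golden-set regression check: compare current outputs against known-good baselines before shipping a prompt or model change. The exact-match scorer is a stand-in for a task-specific evaluator:

```python
GOLDEN_SET = [
    {"input": "Reset my password", "expected": "account_access"},
    {"input": "Where is my refund?", "expected": "billing"},
]

def score(output: str, expected: str) -> float:
    """Illustrative exact-match scorer; swap in a task-specific evaluator."""
    return 1.0 if output == expected else 0.0

def regression_check(workflow, baseline: float = 0.95) -> bool:
    """Fail the change if accuracy drops below the recorded baseline."""
    results = [score(workflow(case["input"]), case["expected"]) for case in GOLDEN_SET]
    accuracy = sum(results) / len(results)
    print(f"golden-set accuracy: {accuracy:.0%} (baseline {baseline:.0%})")
    return accuracy >= baseline
```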
6. Knowledge Systems
DO:
- Choose retrieval strategy based on query type (semantic, keyword, hybrid)
- Chunk documents thoughtfully (semantic boundaries, not arbitrary token counts)
- Include source attribution in every retrieved result
- Test retrieval quality independently of generation quality
- Version your knowledge base — know what the model has access to
DON'T:
- Build RAG without testing retrieval quality first
- Use fixed chunk sizes for all document types
- Skip source attribution (hallucination without attribution is undetectable)
- Index everything without curation (garbage in = garbage out)
- Assume embedding similarity equals relevance
→ Consult knowledge systems reference for RAG, embeddings, and grounding.
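A sketch of semantic-boundary chunking with source attribution on every chunk, using paragraph breaks as a simple stand-in for a real boundary detector:

```python
def chunk_document(text: str, source: str, max_chars: int = 2_000) -> list[dict]:
    """Split on paragraph boundaries, never mid-thought, and tag each
    chunk with its source so retrieved results stay attributable."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append({"text": current.strip(), "source": source})
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append({"text": current.strip(), "source": source})
    return chunks
```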
7. Guardrails & Safety
DO:
- Validate inputs before processing (schema validation, size limits)
- Filter outputs for sensitive content, PII, and policy violations
- Set hard cost ceilings (max tokens, max API calls, max spend per run)
- Implement circuit breakers for cascading failures
- Log everything for audit trails
DON'T:
- Deploy without input validation (prompt injection is real)
- Trust model output without verification for high-stakes decisions
- Run without cost controls (one runaway loop can cost thousands)
- Skip rate limiting on external API calls
- Assume the model will follow safety instructions 100% of the time
→ Consult guardrails reference for validation, sandboxing, and constraints.
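A sketch of a hard cost ceiling and a simple circuit breaker; the limits are illustrative and should be set per workflow:

```python
class CostGuard:
    """Stop the run before a loop can spend without bound."""
    def __init__(self, max_calls: int = 50, max_tokens: int = 200_000):
        self.calls = self.tokens = 0
        self.max_calls, self.max_tokens = max_calls, max_tokens

    def charge(self, tokens_used: int) -> None:
        self.calls += 1
        self.tokens += tokens_used
        if self.calls > self.max_calls or self.tokens > self.max_tokens:
            raise RuntimeError("cost ceiling hit; aborting run for review")

class CircuitBreaker:
    """Open after repeated failures so errors can't cascade."""
    def __init__(self, threshold: int = 3):
        self.failures, self.threshold = 0, threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open; too many consecutive failures")
```

Call `charge()` after every model call and `record()` after every tool call; both fail loudly rather than letting a runaway loop continue.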
The Workflow Slop Test
If any of these are true, the workflow needs work:
- Prompts are unstructured walls of text → run /refine
- No output schema defined — model decides the format → run /refine
- Context window used without budget — everything stuffed in → run /accelerate
- More than 10 tools exposed to a single agent → run /streamline
- No error handling — happy path only → run /fortify
- No evaluation — "it seems to work" → run /iterate
- Multi-agent system for a single-agent problem → run /temper
- No cost controls — unbounded token usage → run /guard
- Tools have vague one-line descriptions → run /calibrate
- No logging — can't debug production issues → run /fortify
Zero checked = production-ready. 3+ checked = workflow slop.
Available Commands
Use these commands to apply specific aspects of workflow mastery:
{{available_commands}}