agent-workflow
MANDATORY — Context Gathering Protocol
Before applying any workflow guidance, gather context:
- Check for Maestro context in the project root
  - First check `.maestro/context.md` (v2 layout)
  - Then check `.maestro.md` (v1 layout — backward compatible)
  - If it exists → read it and use the workflow context within
  - If it doesn't exist → tell the user: "No workflow context found. Run /teach-maestro to set up project-specific context for better results."
- Check for decision history (optional)
  - If `.maestro/decisions.jsonl` exists → read the last 5 decisions for session continuity
  - If it doesn't exist → proceed without it (no error)
- Minimum viable context (if no `.maestro.md`):
  - What AI model(s) are being used?
  - What is the workflow's primary task?
  - Are there existing prompts, tools, or agents to work with?
  - What are the quality/speed/cost priorities?

DO NOT proceed without at least understanding the model, task, and priorities.
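A minimal sketch of this lookup order, assuming the agent runs from the project root (the function names are illustrative, not part of Maestro):

```python
from pathlib import Path

def load_maestro_context(root: Path = Path(".")) -> str | None:
    """Return workflow context, preferring the v2 layout over v1."""
    for candidate in (root / ".maestro" / "context.md",  # v2 layout
                      root / ".maestro.md"):             # v1 layout (backward compatible)
        if candidate.is_file():
            return candidate.read_text()
    return None  # caller should suggest running /teach-maestro

def load_recent_decisions(root: Path = Path("."), n: int = 5) -> list[str]:
    """Read the last n decision records; the log is optional."""
    log = root / ".maestro" / "decisions.jsonl"
    if not log.is_file():
        return []  # proceed without it, no error
    return log.read_text().splitlines()[-n:]
```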
Maestro — AI Agent Workflow Mastery
This skill provides the foundational knowledge for designing, building, and maintaining production-grade AI agent workflows. All Maestro commands build on these principles.
Core Principles
- Structure over improvisation — Workflows should be deliberate, not emergent
- Constraints are features — Explicit boundaries prevent failure modes
- Measure, don't assume — Every workflow needs evaluation, not just testing
- Appropriate complexity — Match the solution to the problem, not the ambition
- Graceful degradation — Every component should fail safely
1. Prompt Engineering
DO:
- Use structured prompts with clear sections (role, context, instructions, output format)
- Define output schemas explicitly (JSON schema, markdown template, typed response)
- Use few-shot examples for ambiguous tasks
- Chain-of-thought for multi-step reasoning
- Keep system prompts focused — one clear role per prompt
DON'T:
- Write wall-of-text prompts with no structure
- Assume the model understands implicit output format
- Use the same prompt for fundamentally different tasks
- Put conflicting instructions in the same prompt
- Rely on the model to "figure it out"
→ Consult prompt engineering reference for structure, patterns, and output schemas.
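A minimal sketch of these rules: a prompt with explicit sections and a declared JSON output schema. The section names, task, and schema are illustrative, not a Maestro convention:

```python
import json

# Explicit output schema: the model is told the exact shape to return.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["summary", "severity"],
}

def build_prompt(ticket_text: str) -> str:
    """Assemble a structured prompt: role, context, instructions, output format."""
    return "\n\n".join([
        "## Role\nYou are a support-ticket triage assistant.",
        f"## Context\nTicket:\n{ticket_text}",
        "## Instructions\nClassify the ticket's severity and summarize it in one sentence.",
        f"## Output format\nReturn ONLY JSON matching this schema:\n{json.dumps(OUTPUT_SCHEMA, indent=2)}",
    ])
```

One clear role, one task, one declared format; a parser on the other side can validate the response against the same schema.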
2. Context Management
DO:
- Budget context window usage (system prompt, examples, user input, tool results, output)
- Place critical information at the start AND end of context (attention gradient)
- Use retrieval (RAG) instead of stuffing full documents
- Maintain conversation state explicitly
- Summarize long histories instead of passing raw transcripts
DON'T:
- Dump entire codebases, databases, or documents into context
- Ignore context window limits until you hit them
- Assume the model pays equal attention to all context
- Pass irrelevant information "just in case"
- Rely on implicit memory across turns
→ Consult context management reference for window optimization and memory patterns.
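A minimal sketch of explicit context budgeting. The token counts are rough character-based estimates and the budget split is only an illustrative starting point:

```python
# Rough token estimate: ~4 characters per token for English text.
def est_tokens(text: str) -> int:
    return max(1, len(text) // 4)

CONTEXT_WINDOW = 128_000
BUDGET = {                   # illustrative split of the window
    "system_prompt": 0.05,
    "examples":      0.10,
    "retrieved":     0.40,
    "history":       0.25,
    "output":        0.20,   # reserved for the model's response
}

def fits_budget(part: str, text: str) -> bool:
    """Check one part of the prompt against its share of the window."""
    return est_tokens(text) <= int(CONTEXT_WINDOW * BUDGET[part])
```

The point is the discipline, not the numbers: every piece of context has an owner and a ceiling, so overruns surface before the window does.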
3. Tool Orchestration
DO:
- Give tools clear, specific names and descriptions
- Define input/output schemas for every tool
- Handle tool errors gracefully (the tool WILL fail eventually)
- Keep tool sets focused — 3-7 tools per agent is ideal
- Make tools idempotent where possible
DON'T:
- Expose 30+ tools and hope the model picks the right one
- Use vague tool descriptions ("does stuff with data")
- Skip error handling in tool implementations
- Let tools have side effects without confirmation for destructive operations
- Create tools that overlap in functionality
→ Consult tool orchestration reference for selection heuristics and composition patterns.
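A sketch of a well-specified tool: clear name and description, explicit input schema, and an error path that returns structured data instead of raising. The spec format follows the common JSON-schema convention; `lookup_order` is a hypothetical backend call:

```python
import json

# Clear name, specific description, explicit input schema.
GET_ORDER_STATUS = {
    "name": "get_order_status",
    "description": "Look up the shipping status of one order by its order ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

def lookup_order(order_id: str) -> str:
    """Hypothetical backend call; raises KeyError for unknown orders."""
    return {"A1": "shipped"}[order_id]

def run_get_order_status(order_id: str) -> str:
    """Idempotent read-only lookup; errors come back as data, not exceptions."""
    try:
        return json.dumps({"ok": True, "status": lookup_order(order_id)})
    except Exception as exc:  # the tool WILL fail eventually
        return json.dumps({"ok": False, "error": str(exc)})
```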
4. Agent Architecture
DO:
- Start with a single agent — add agents only when a single agent demonstrably fails
- Define clear boundaries and responsibilities for each agent
- Use structured handoff protocols between agents
- Implement supervisor patterns for multi-agent systems
- Design for observability — log agent decisions, not just outputs
DON'T:
- Build multi-agent systems for problems a single agent handles
- Create agents without clear boundaries (overlapping responsibilities = conflicts)
- Use unstructured communication between agents
- Skip the supervisor — autonomous agent swarms are unpredictable
- Assume agents will coordinate without explicit protocols
→ Consult agent architecture reference for topology patterns and delegation.
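A sketch of a structured handoff: an explicit, typed message plus a supervisor that routes and logs it. The field names and the callable-agent interface are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Handoff:
    """Explicit contract for passing work between agents."""
    from_agent: str
    to_agent: str
    task: str                                    # what the receiver must do
    context: dict = field(default_factory=dict)  # only what the task needs
    done_when: str = ""                          # success criterion the supervisor checks

def supervise(handoff: Handoff, agents: dict[str, Callable[[Handoff], Any]]) -> Any:
    """The supervisor routes and logs every handoff instead of letting
    agents message each other ad hoc."""
    print(f"[handoff] {handoff.from_agent} -> {handoff.to_agent}: {handoff.task}")
    return agents[handoff.to_agent](handoff)
```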
5. Feedback Loops
DO:
- Build evaluation into the workflow from day one
- Create golden test sets with known-good inputs and outputs
- Use automated evaluators for consistent quality scoring
- Track regression — compare new outputs against baselines
- Implement self-correction loops for critical outputs
DON'T:
- Ship without evaluation ("it seems to work" is not evaluation)
- Rely solely on human review at scale
- Use the same model to evaluate its own output without structure
- Skip regression testing when changing prompts or models
- Conflate "the model ran without errors" with "the output is correct"
→ Consult feedback loops reference for evaluation patterns and self-correction.
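A sketch of a golden-set regression check: compare current outputs against known-good baselines before shipping a prompt or model change. The exact-match scorer is a stand-in for a task-specific evaluator:

```python
GOLDEN_SET = [
    {"input": "Reset my password", "expected": "account_access"},
    {"input": "Where is my refund?", "expected": "billing"},
]

def score(output: str, expected: str) -> float:
    """Illustrative exact-match scorer; swap in a task-specific evaluator."""
    return 1.0 if output == expected else 0.0

def regression_check(workflow, baseline: float = 0.95) -> bool:
    """Fail the change if accuracy drops below the recorded baseline."""
    results = [score(workflow(case["input"]), case["expected"]) for case in GOLDEN_SET]
    accuracy = sum(results) / len(results)
    print(f"golden-set accuracy: {accuracy:.0%} (baseline {baseline:.0%})")
    return accuracy >= baseline
```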
6. Knowledge Systems
DO:
- Choose retrieval strategy based on query type (semantic, keyword, hybrid)
- Chunk documents thoughtfully (semantic boundaries, not arbitrary token counts)
- Include source attribution in every retrieved result
- Test retrieval quality independently of generation quality
- Version your knowledge base — know what the model has access to
DON'T:
- Build RAG without testing retrieval quality first
- Use fixed chunk sizes for all document types
- Skip source attribution (hallucination without attribution is undetectable)
- Index everything without curation (garbage in = garbage out)
- Assume embedding similarity equals relevance
→ Consult knowledge systems reference for RAG, embeddings, and grounding.
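A sketch of semantic-boundary chunking with source attribution on every chunk, using paragraph breaks as a simple stand-in for a real boundary detector:

```python
def chunk_document(text: str, source: str, max_chars: int = 2_000) -> list[dict]:
    """Split on paragraph boundaries, never mid-thought, and tag each
    chunk with its source so retrieved results stay attributable."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append({"text": current.strip(), "source": source})
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append({"text": current.strip(), "source": source})
    return chunks
```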
7. Guardrails & Safety
DO:
- Validate inputs before processing (schema validation, size limits)
- Filter outputs for sensitive content, PII, and policy violations
- Set hard cost ceilings (max tokens, max API calls, max spend per run)
- Implement circuit breakers for cascading failures
- Log everything for audit trails
DON'T:
- Deploy without input validation (prompt injection is real)
- Trust model output without verification for high-stakes decisions
- Run without cost controls (one runaway loop can cost thousands)
- Skip rate limiting on external API calls
- Assume the model will follow safety instructions 100% of the time
→ Consult guardrails reference for validation, sandboxing, and constraints.
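A sketch of a hard cost ceiling and a simple circuit breaker; the limits are illustrative and should be set per workflow:

```python
class CostGuard:
    """Stop the run before a loop can spend without bound."""
    def __init__(self, max_calls: int = 50, max_tokens: int = 200_000):
        self.calls = self.tokens = 0
        self.max_calls, self.max_tokens = max_calls, max_tokens

    def charge(self, tokens_used: int) -> None:
        self.calls += 1
        self.tokens += tokens_used
        if self.calls > self.max_calls or self.tokens > self.max_tokens:
            raise RuntimeError("cost ceiling hit; aborting run for review")

class CircuitBreaker:
    """Open after repeated failures so errors can't cascade."""
    def __init__(self, threshold: int = 3):
        self.failures, self.threshold = 0, threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open; too many consecutive failures")
```

Call `charge()` after every model call and `record()` after every tool call; both fail loudly rather than letting a runaway loop continue.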
The Workflow Slop Test
If any of these are true, the workflow needs work:
- Prompts are unstructured walls of text → run /refine
- No output schema defined — model decides the format → run /refine
- Context window used without budget — everything stuffed in → run /accelerate
- More than 10 tools exposed to a single agent → run /streamline
- No error handling — happy path only → run /fortify
- No evaluation — "it seems to work" → run /iterate
- Multi-agent system for a single-agent problem → run /temper
- No cost controls — unbounded token usage → run /guard
- Tools have vague one-line descriptions → run /calibrate
- No logging — can't debug production issues → run /fortify
Zero checked = production-ready. 3+ checked = workflow slop.
Available Commands
Use these commands to apply specific aspects of workflow mastery:
{{available_commands}}