ai-prompt-engineering
Prompt Engineering — Operational Skill
Modern Best Practices (January 2026): versioned prompts, explicit output contracts, regression tests, and safety threat modeling for tool/RAG prompts (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
This skill provides operational guidance for building production-ready prompts across standard tasks, RAG workflows, agent orchestration, structured outputs, hidden reasoning, and multi-step planning.
All content is operational, not theoretical. Focus on patterns, checklists, and copy-paste templates.
Quick Start (60 seconds)
- Pick a pattern from the decision tree (structured output, extractor, RAG, tools/agent, rewrite, classification).
- Start from a template in assets/ and fill in TASK, INPUT, RULES, and OUTPUT FORMAT.
- Add guardrails: instruction/data separation, “no invented details”, missing → null/explicit missing.
- Add validation: JSON parse check, schema check, citations check, post-tool checks.
- Add evals: 10–20 cases while iterating, 50–200 before release, plus adversarial injection cases.
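The validation step above can be sketched as a parse check followed by a field-level schema check. This is a minimal illustration using only the standard library; the `SCHEMA` fields and the `validate_output` helper are hypothetical names, not part of this skill's templates, and a production setup would more likely use a full schema validator.

```python
import json

# Hypothetical output contract: field name -> allowed Python types.
# None is allowed for every field because missing values map to null.
SCHEMA = {
    "invoice_id": (str, type(None)),
    "total": (int, float, type(None)),
    "currency": (str, type(None)),
}


def validate_output(raw, schema=SCHEMA):
    """Return a list of validation errors; an empty list means the output passed."""
    try:
        data = json.loads(raw)  # JSON parse check
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    errors = []
    for field, allowed in schema.items():  # schema check, in schema order
        if field not in data:
            errors.append(f"missing field: {field} (use null when unknown)")
        elif isinstance(data[field], bool) or not isinstance(data[field], allowed):
            # bool is a subclass of int in Python, so reject it explicitly
            errors.append(f"wrong type for field: {field}")
    for field in sorted(set(data) - set(schema)):
        errors.append(f"unexpected field: {field}")
    return errors
```

An empty list means the response can move on to later checks (citations, post-tool checks); any error is a reason to retry or fail closed rather than repair the output silently.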
Model Notes (2026)
This skill includes Claude Code + Codex CLI optimizations:
- Action directives: Phrase requests as instructions to implement, not as suggestions to consider
- Parallel tool execution: Independent tool calls can run simultaneously
- Long-horizon task management: State tracking, incremental progress, context compaction resilience
- Positive framing: Describe desired behavior rather than prohibitions
- Style matching: Prompt formatting influences output style
- Domain-specific patterns: Specialized guidance for frontend, research, and agentic coding
- Style-adversarial resilience: Stress-test refusals with poetic/role-play rewrites; normalize or decline stylized harmful asks before tool use
Prefer “brief justification” over requesting chain-of-thought. When using private reasoning patterns, instruct: think internally; output only the final answer.
Quick Reference
| Task | Pattern to Use | Key Components | When to Use |
|---|---|---|---|
| Machine-parseable output | Structured Output | JSON schema, "JSON-only" directive, no prose | API integrations, data extraction |
| Field extraction | Deterministic Extractor | Exact schema, missing->null, no transformations | Form data, invoice parsing |
| Use retrieved context | RAG Workflow | Context relevance check, chunk citations, explicit missing info | Knowledge bases, documentation search |
| Internal reasoning | Hidden Chain-of-Thought | Internal reasoning, final answer only | Classification, complex decisions |
| Tool-using agent | Tool/Agent Planner | Plan-then-act, one tool per turn | Multi-step workflows, API calls |
| Text transformation | Rewrite + Constrain | Style rules, meaning preservation, format spec | Content adaptation, summarization |
| Classification | Decision Tree | Ordered branches, mutually exclusive, JSON result | Routing, categorization, triage |
Decision Tree: Choosing the Right Pattern
User needs: [Prompt Type]
|-- Output must be machine-readable?
| |-- Extract specific fields only? -> **Deterministic Extractor Pattern**
| `-- Generate structured data? -> **Structured Output Pattern (JSON)**
|
|-- Use external knowledge?
| `-- Retrieved context must be cited? -> **RAG Workflow Pattern**
|
|-- Requires reasoning but hide process?
| `-- Classification or decision task? -> **Hidden Chain-of-Thought Pattern**
|
|-- Needs to call external tools/APIs?
| `-- Multi-step workflow? -> **Tool/Agent Planner Pattern**
|
|-- Transform existing text?
| `-- Style/format constraints? -> **Rewrite + Constrain Pattern**
|
`-- Classify or route to categories?
    `-- Mutually exclusive rules? -> **Decision Tree Pattern**
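The tree above reads as an ordered list of checks where the first match wins. A minimal sketch of that logic follows; the `Needs` fields and the `choose_pattern` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical description of what a prompt must do."""
    machine_readable: bool = False
    extract_fields_only: bool = False
    uses_retrieved_context: bool = False
    hidden_reasoning: bool = False
    calls_tools: bool = False
    transforms_text: bool = False
    classifies: bool = False


def choose_pattern(needs):
    """Walk the branches in the same order as the tree; the first match wins."""
    if needs.machine_readable:
        if needs.extract_fields_only:
            return "Deterministic Extractor"
        return "Structured Output (JSON)"
    if needs.uses_retrieved_context:
        return "RAG Workflow"
    if needs.hidden_reasoning:
        return "Hidden Chain-of-Thought"
    if needs.calls_tools:
        return "Tool/Agent Planner"
    if needs.transforms_text:
        return "Rewrite + Constrain"
    if needs.classifies:
        return "Decision Tree"
    raise ValueError("no pattern matched; restate the task requirements")
```

Because the branches are ordered, a task that both calls tools and needs machine-readable output resolves to a structured-output pattern first. In practice such tasks often combine patterns, which this sketch does not model.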
Copy/Paste: Minimal Prompt Skeletons
1) Generic "output contract" skeleton
TASK:
{{one_sentence_task}}
INPUT:
{{input_data}}
RULES:
- Follow TASK exactly.
- Use only INPUT (and tool outputs if tools are allowed).
- No invented details. Missing required info -> say what is missing.
- Keep reasoning hidden.
- Follow OUTPUT FORMAT exactly.
OUTPUT FORMAT:
{{schema_or_format_spec}}
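One way to fill this skeleton programmatically is a single-pass placeholder substitution that refuses to render when a value is missing. The sketch below assumes the `{{name}}` placeholder syntax shown above; the `render` helper is a hypothetical name, not an API from any templating library.

```python
import re

SKELETON = """TASK:
{{one_sentence_task}}
INPUT:
{{input_data}}
RULES:
- Follow TASK exactly.
- Use only INPUT (and tool outputs if tools are allowed).
- No invented details. Missing required info -> say what is missing.
- Keep reasoning hidden.
- Follow OUTPUT FORMAT exactly.
OUTPUT FORMAT:
{{schema_or_format_spec}}"""

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template, values):
    """Fill every {{name}} placeholder; raise if any placeholder has no value."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"unfilled placeholders: {', '.join(missing)}")
    # Single pass: text inside a substituted value is never re-expanded,
    # so input data cannot smuggle in further placeholders.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

Failing on an unfilled placeholder is a deliberate choice: a prompt that ships with a literal `{{input_data}}` in it tends to produce confident output about nothing.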
2) Tool/agent skeleton (deterministic)
AVAILABLE TOOLS:
{{tool_signatures_or_names}}
WORKFLOW:
- Make a short plan.
- Call tools only when required to complete the task.
- Validate tool outputs before using them.
- If the environment supports parallel tool calls, run independent calls in parallel.
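On the orchestration side, the last two workflow rules (validate tool outputs, run independent calls in parallel) might look like the sketch below. It assumes tools are plain zero-argument callables; `run_independent_calls`, the two stand-in tools, and the validator are hypothetical names for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def run_independent_calls(calls, is_valid):
    """Run independent zero-argument tool calls in parallel and validate each output.

    calls: dict mapping a call name to a zero-argument callable.
    is_valid: function (name, output) -> bool.
    Returns (accepted, rejected) dicts keyed by call name.
    """
    accepted, rejected = {}, {}
    if not calls:
        return accepted, rejected
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = {name: pool.submit(fn) for name, fn in calls.items()}
        for name, future in futures.items():
            try:
                output = future.result()
            except Exception as exc:  # a failed tool call is rejected, not fatal
                rejected[name] = f"error: {exc}"
                continue
            if is_valid(name, output):
                accepted[name] = output
            else:
                rejected[name] = "failed validation"
    return accepted, rejected


# Hypothetical read-only tools standing in for real tool calls.
def get_order():
    return {"order_id": "o-1", "status": "shipped"}


def get_customer():
    return {"customer_id": None}  # malformed on purpose


def has_non_null_values(name, output):
    return isinstance(output, dict) and all(v is not None for v in output.values())
```

Only outputs in `accepted` should be passed back to the model. Calls with side effects are better kept out of the parallel batch and run one at a time.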
3) RAG skeleton (grounded)
RETRIEVED CONTEXT:
{{chunks_with_ids}}
RULES:
- Use only retrieved context for factual claims.
- Cite chunk ids for each claim.
- If evidence is missing, say what is missing.
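The citation rule can be enforced after generation with a simple check. The sketch below assumes, purely for illustration, that the model cites chunks inline as `[chunk_id]` before the sentence-ending punctuation; the `check_citations` helper and that citation format are not prescribed by this skill.

```python
import re

CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def check_citations(answer, chunk_ids):
    """Flag sentences with no citation and citations that point at unknown chunks."""
    problems = []
    sentences = [s.strip() for s in SENTENCE_BREAK.split(answer) if s.strip()]
    for sentence in sentences:
        cited = CITATION.findall(sentence)
        if not cited:
            problems.append(f"uncited claim: {sentence}")
        for chunk_id in cited:
            if chunk_id not in chunk_ids:
                problems.append(f"unknown chunk id: {chunk_id}")
    return problems
```

This only verifies that citations exist and resolve to retrieved chunks. Whether a cited chunk actually supports the claim needs a separate check, for example an entailment-style eval.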
Operational Checklists
Use these references when validating or debugging prompts:
- frameworks/shared-skills/skills/ai-prompt-engineering/references/quality-checklists.md
- frameworks/shared-skills/skills/ai-prompt-engineering/references/production-guidelines.md
Context Engineering (2026)
Effective prompting goes beyond writing instructions: it shapes the entire context in which the model operates. Context engineering encompasses:
- Conversation history: What prior turns inform the current response
- Retrieved context (RAG): External knowledge injected into the prompt
- Structured inputs: JSON schemas, system/user message separation
- Tool outputs: Results from previous tool calls that shape next steps
Context Engineering vs Prompt Engineering
| Aspect | Prompt Engineering | Context Engineering |
|---|---|---|
| Focus | Instruction text | Full input pipeline |
| Scope | Single prompt | RAG + history + tools |
| Optimization | Word choice, structure | Information architecture |
| Goal | Clear instructions | Optimal context window |
Key Context Engineering Patterns
1. Context Prioritization: Place the most relevant information near the start of the context; many models recall material buried mid-context less reliably.
2. Context Compression: Summarize history, truncate tool outputs, select most relevant RAG chunks.
3. Context Separation: Use clear delimiters (<system>, <user>, <context>) to separate instruction types.
4. Dynamic Context: Adjust context based on task complexity - simple tasks need less context, complex tasks need more.
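Patterns 1 to 3 can be combined in a small context builder: rank chunks by relevance, keep only what fits a budget, and wrap each chunk in a delimiter. This is a rough sketch; the whitespace word count is a stand-in for a real tokenizer, and `build_context` and the `<context id="...">` wrapper are assumptions made for illustration.

```python
def build_context(chunks, budget):
    """Select the highest-scoring chunks that fit the budget, most relevant first.

    chunks: list of (chunk_id, relevance_score, text) tuples.
    budget: maximum total size, measured here in whitespace-separated words
            (an approximation; a real pipeline would count model tokens).
    """
    ranked = sorted(chunks, key=lambda chunk: chunk[1], reverse=True)
    selected, used = [], 0
    for chunk_id, _score, text in ranked:
        cost = len(text.split())
        if used + cost > budget:
            continue  # skip chunks that do not fit; smaller ones may still fit
        selected.append((chunk_id, text))
        used += cost
    return "\n".join(
        f'<context id="{chunk_id}">{text}</context>' for chunk_id, text in selected
    )
```

Pattern 4 then amounts to choosing `budget` per task: a small budget for simple lookups, a larger one for multi-step work.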
Core Concepts vs Implementation Practices
Core Concepts (Vendor-Agnostic)
- Prompt contract: inputs, allowed tools, output schema, max tokens, and refusal rules.
- Context engineering: conversation history, RAG context, tool outputs, and structured inputs shape model behavior.
- Determinism controls: temperature/top_p, constrained decoding/structured outputs, and strict formatting.
- Cost & latency budgets: prompt length and max output drive tokens and tail latency; enforce hard limits and measure p95/p99.
- Evaluation: golden sets + regression gates + A/B + post-deploy monitoring.
- Security: prompt injection, data exfiltration, and tool misuse are primary threats (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
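For the cost and latency item above, a budget check over recorded request latencies might look like this. It uses the nearest-rank percentile definition, which is one of several common definitions; `percentile` and `check_latency_budget` are hypothetical helper names.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency_budget(latencies_ms, p95_budget_ms, p99_budget_ms):
    """Compare measured p95/p99 latency against hard budgets."""
    p95 = percentile(latencies_ms, 95)
    p99 = percentile(latencies_ms, 99)
    return {
        "p95_ms": p95,
        "p99_ms": p99,
        "within_budget": p95 <= p95_budget_ms and p99 <= p99_budget_ms,
    }
```

Tail percentiles need enough samples to mean anything; with only a handful of requests, p99 is simply the slowest one.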
Implementation Practices (Model/Platform-Specific)
- Use model-specific structured output features when available; keep a schema validator as the source of truth.
- Align tracing/metrics with OpenTelemetry GenAI semantic conventions (https://opentelemetry.io/docs/specs/semconv/gen-ai/).
Do / Avoid
Do
- Do keep prompts small and modular; centralize shared fragments (policies, schemas, style).
- Do add a prompt eval harness and block merges on regressions.
- Do prefer "brief justification" over requesting chain-of-thought; treat hidden reasoning as model-internal.
Avoid
- Avoid prompt sprawl (many near-duplicates with no owner or tests).
- Avoid brittle multi-step chains without intermediate validation.
- Avoid mixing policy and product copy in the same prompt (harder to audit and update).
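The "block merges on regressions" item can be sketched as a gate that compares per-case results for the current prompt against a candidate prompt on the same golden set. The result format (case id mapped to pass/fail) and the `regression_gate` name are assumptions for illustration; tools such as Promptfoo or DeepEval have their own formats.

```python
def regression_gate(baseline, candidate):
    """Fail the gate if any case that passed on the baseline fails on the candidate.

    baseline, candidate: dicts mapping case id -> bool (True means the case passed).
    A case missing from the candidate results counts as a failure.
    Returns (passed, regressions), where regressions is a sorted list of case ids.
    """
    regressions = sorted(
        case_id
        for case_id, passed_before in baseline.items()
        if passed_before and not candidate.get(case_id, False)
    )
    return (not regressions, regressions)
```

In CI, a non-empty `regressions` list would translate into a non-zero exit code so the merge is blocked until the cases are fixed or the golden set is deliberately updated.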
Navigation: Core Patterns
- Core Patterns - 7 production-grade prompt patterns
  - Structured Output (JSON), Deterministic Extractor, RAG Workflow
  - Hidden Chain-of-Thought, Tool/Agent Planner, Rewrite + Constrain, Decision Tree
  - Each pattern includes structure template and validation checklist
Navigation: Best Practices
- Best Practices (Core) - Foundation rules for production-grade prompts
  - System instruction design, output contract specification, action directives
  - Context handling, error recovery, positive framing, style matching, style-adversarial red teaming
  - Anti-patterns, Claude 4+ specific optimizations
- Production Guidelines - Deployment and operational guidance
  - Evaluation & testing (Prompt CI/CD), model parameters, few-shot selection
  - Safety & guardrails, conversation memory, context compaction resilience
  - Answer engineering, decomposition, multilingual/multimodal, benchmarking
  - CI/CD Tools (2026): Promptfoo, DeepEval integration patterns
  - Security (2026): PromptGuard 4-layer defense, Microsoft Prompt Shields, taint tracking
- Quality Checklists - Validation checklists before deployment
  - Prompt QA, JSON validation, agent workflow checks
  - RAG workflow, safety & security, performance optimization
  - Testing coverage, anti-patterns, quality score rubric
- Domain-Specific Patterns - Claude 4+ optimized patterns for specialized domains
  - Frontend/visual code: Creativity encouragement, design variations, micro-interactions
  - Research tasks: Success criteria, verification, hypothesis tracking
  - Agentic coding: No speculation rule, principled implementation, investigation patterns
  - Cross-domain best practices and quality modifiers
Navigation: Specialized Patterns
- RAG Patterns - Retrieval-augmented generation workflows
  - Context grounding, chunk citation, missing information handling
- Agent and Tool Patterns - Tool use and agent orchestration
  - Plan-then-act workflows, tool calling, multi-step reasoning, generate-verify-revise chains
  - Multi-Agent Orchestration (2026): centralized, handoff, federated patterns; plan-and-execute (90% cost reduction)
- Extraction Patterns - Deterministic field extraction
  - Schema-based extraction, null handling, no hallucinations
- Reasoning Patterns (Hidden CoT) - Internal reasoning without visible output
  - Hidden reasoning, final answer only, classification workflows
  - Extended Thinking API (Claude 4+): budget management, think tool, multishot patterns
- Additional Patterns - Extended prompt engineering techniques
  - Advanced patterns, edge cases, optimization strategies
- Prompt Testing & CI/CD - Automated prompt evaluation pipelines
  - Promptfoo, DeepEval integration, regression detection, A/B testing, quality gates
- Multimodal Prompt Patterns - Vision, audio, and document input patterns
  - Image description, OCR+LLM, bounding box prompts, Whisper conditioning, video frame analysis
- Prompt Security & Defense - Securing LLM applications against adversarial attacks
  - Injection detection (PromptGuard, Prompt Shields), defense-in-depth, taint tracking, red team testing
Navigation: Templates
Templates are copy-paste ready and organized by complexity:
Quick Templates
- Quick Template - Fast, minimal prompt structure
Standard Templates
- Standard Template - Production-grade operational prompt
- Agent Template - Tool-using agent with planning
- RAG Template - Retrieval-augmented generation
- Chain-of-Thought Template - Hidden reasoning pattern
- JSON Extractor Template - Deterministic field extraction
- Prompt Evaluation Template - Regression tests, A/B testing, rollout gates
External Resources
External references are listed in data/sources.json:
- Official documentation (OpenAI, Anthropic, Google)
- LLM frameworks (LangChain, LlamaIndex)
- Vector databases (Pinecone, Weaviate, FAISS)
- Evaluation tools (OpenAI Evals, HELM)
- Safety guides and standards
- RAG and retrieval resources
Freshness Rule (2026)
When asked for “latest” prompting recommendations, prefer provider docs and standards from data/sources.json. If web search is unavailable, state the constraint and avoid overconfident “current best” claims.
Related Skills
This skill provides foundational prompt engineering patterns. For specialized implementations:
AI/LLM Skills:
- AI Agents Development - Production agent patterns, MCP integration, orchestration
- AI LLM Engineering - LLM application architecture and deployment
- AI LLM RAG Engineering - Advanced RAG pipelines and chunking strategies
- AI LLM Search & Retrieval - Search optimization, hybrid retrieval, reranking
- AI LLM Development - Fine-tuning, evaluation, dataset creation
Software Development Skills:
- Software Architecture Design - System design patterns
- Software Backend - Backend implementation
- Foundation API Design - API design and contracts
Usage Notes
For Claude Code:
- Reference this skill when building prompts for agents, commands, or integrations
- Use Quick Reference table for fast pattern lookup
- Follow Decision Tree to select appropriate pattern
- Validate outputs with Quality Checklists before deployment
- Use templates as starting points, customize for specific use cases
For Codex CLI:
- Use the same patterns and templates; adapt tool-use wording to the local tool interface
- For long-horizon tasks, track progress explicitly (a step list/plan) and update it as work completes
- Run independent reads/searches in parallel when the environment supports it; keep writes/edits serialized
- AGENTS.md Integration: Place project-specific prompt guidance in AGENTS.md files at global (~/.codex/AGENTS.md), project-level (./AGENTS.md), or subdirectory scope for layered instructions
- Reasoning Effort: Use medium for interactive coding (default), high/xhigh for complex autonomous multi-hour tasks
Fact-Checking
- Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
- Prefer primary sources; report source links and dates for volatile information.
- If web access is unavailable, state the limitation and mark guidance as unverified.