prompt-engeneering
Prompt Engineering
Universal techniques for crafting effective prompts across any LLM.
Core Principles
1. Structure with XML Tags
Use XML tags to create clear, parseable prompts:
<context>Background information here</context>
<instructions>
1. First step
2. Second step
</instructions>
<examples>Sample inputs/outputs</examples>
<output_format>Expected structure</output_format>
Benefits:
- Clarity: Separates context, instructions, and examples
- Accuracy: Prevents model from mixing up sections
- Flexibility: Easy to modify individual parts
- Parseability: Enables structured output extraction
Best practices:
- Use consistent tag names throughout (
<instructions>, not sometimes<steps>) - Reference tags explicitly: "Using the data in
<context>tags..." - Nest tags for hierarchy:
<examples><example id="1">...</example></examples> - Combine with other techniques:
<thinking>for chain-of-thought,<answer>for final output
2. Control Output Shape
Specify explicit constraints on length, format, and structure:
<output_spec>
- Default: 3-6 sentences or ≤5 bullets
- Simple yes/no questions: ≤2 sentences
- Complex multi-step tasks:
- 1 short overview paragraph
- ≤5 bullets: What changed, Where, Risks, Next steps, Open questions
- Use Markdown with headers, bullets, tables when helpful
- Avoid long narrative paragraphs; prefer compact structure
</output_spec>
3. Prevent Scope Drift
Explicitly constrain what the model should NOT do:
<constraints>
- Implement EXACTLY and ONLY what is requested
- No extra features, components, or embellishments
- If ambiguous, choose the simplest valid interpretation
- Do NOT invent values, make assumptions, or add unrequested elements
</constraints>
4. Handle Ambiguity Explicitly
Prevent hallucinations and overconfidence:
<uncertainty_handling>
- If the question is ambiguous:
- Ask 1-3 precise clarifying questions, OR
- Present 2-3 plausible interpretations with labeled assumptions
- When facts may have changed: answer in general terms, state uncertainty
- Never fabricate exact figures or references when uncertain
- Prefer "Based on the provided context..." over absolute claims
</uncertainty_handling>
5. Long-Context Grounding
For inputs >10k tokens, add re-grounding instructions:
<long_context_handling>
- First, produce a short internal outline of key sections relevant to the request
- Re-state user constraints explicitly before answering
- Anchor claims to sections ("In the 'Data Retention' section...")
- Quote or paraphrase fine details (dates, thresholds, clauses)
</long_context_handling>
Agentic Prompts
Tool Usage Rules
<tool_usage>
- Prefer tools over internal knowledge for:
- Fresh or user-specific data (tickets, orders, configs)
- Specific IDs, URLs, or document references
- Parallelize independent reads when possible
- After write operations, restate: what changed, where, any validation performed
</tool_usage>
User Updates
<user_updates>
- Send brief updates (1-2 sentences) only when:
- Starting a new major phase
- Discovering something that changes the plan
- Avoid narrating routine operations
- Each update must include a concrete outcome ("Found X", "Updated Y")
- Do not expand scope beyond what was asked
</user_updates>
Self-Check for High-Risk Outputs
<self_check>
Before finalizing answers in sensitive contexts (legal, financial, safety):
- Re-scan for unstated assumptions
- Check for ungrounded numbers or claims
- Soften overly strong language ("always", "guaranteed")
- Explicitly state assumptions
</self_check>
Structured Extraction
For data extraction tasks, always provide a schema:
<extraction_spec>
Extract data into this exact schema (no extra fields):
{
"field_name": "string",
"optional_field": "string | null",
"numeric_field": "number | null"
}
- If a field is not present in source, set to null (don't guess)
- Re-scan source for missed fields before returning
</extraction_spec>
Web Research Prompts
<research_guidelines>
- Browse the web for: time-sensitive topics, recommendations, navigational queries, ambiguous terms
- Include citations after paragraphs with web-derived claims
- Use multiple sources for key claims; prioritize primary sources
- Research until additional searching won't materially change the answer
- Structure output with Markdown: headers, bullets, tables for comparisons
</research_guidelines>
Example: Before/After
Without structure:
You're a financial analyst. Generate a Q2 report for investors. Include Revenue, Margins, Cash Flow. Use this data: {{DATA}}. Make it professional and concise.
With structure:
You're a financial analyst at AcmeCorp generating a Q2 report for investors.
<context>
AcmeCorp is a B2B SaaS company. Investors value transparency and actionable insights.
</context>
<data>
{{DATA}}
</data>
<instructions>
1. Include sections: Revenue Growth, Profit Margins, Cash Flow
2. Highlight strengths and areas for improvement
3. Use concise, professional tone
</instructions>
<output_format>
- Use bullet points with metrics and YoY changes
- Include "Action:" items for areas needing improvement
- End with 2-3 bullet Outlook section
</output_format>
Prompt Migration Checklist
When adapting prompts across models or versions:
- Switch model, keep prompt identical — isolate the variable
- Pin reasoning/thinking depth to match prior model's profile
- Run evals — if results are good, ship
- If regressions, tune prompt — adjust verbosity/format/scope constraints
- Re-eval after each small change — one change at a time
Quick Reference
| Technique | Tag Pattern | Use Case |
|---|---|---|
| Separate sections | <context>, <instructions>, <data> |
Any complex prompt |
| Control length | <output_spec> with word/bullet limits |
Prevent verbosity |
| Prevent drift | <constraints> with explicit "do NOT" |
Feature creep |
| Handle uncertainty | <uncertainty_handling> |
Factual queries |
| Chain of thought | <thinking>, <answer> |
Reasoning tasks |
| Extraction | <schema> with JSON structure |
Data parsing |
| Research | <research_guidelines> |
Web-enabled agents |
| Self-check | <self_check> |
High-risk domains |
| Tool usage | <tool_usage_rules> |
Agentic systems |
| Eagerness control | <persistence>, <context_gathering> |
Agent autonomy |
| Persona | <role> + behavioral constraints |
Tone & style |
Prompting Techniques Catalog
Comprehensive catalog of prompting techniques. Full details, examples, and academic references in references/prompting-techniques.md.
| Technique | Use Case |
|---|---|
| Zero-Shot Prompting | Direct task execution without examples; classification, translation, summarization |
| Few-Shot Prompting | In-context learning via exemplars; format control, label calibration, style matching |
| Chain-of-Thought (CoT) | Step-by-step reasoning; arithmetic, logic, commonsense reasoning tasks |
| Meta Prompting | LLM as orchestrator delegating to specialized expert prompts; complex multi-domain tasks |
| Self-Consistency | Sample multiple CoT paths, pick majority answer; boost accuracy on math & reasoning |
| Generated Knowledge | Generate relevant knowledge first, then answer; commonsense & factual QA |
| Prompt Chaining | Break complex tasks into sequential subtasks; document analysis, multi-step workflows |
| Tree of Thoughts (ToT) | Explore multiple reasoning branches with lookahead/backtracking; planning, puzzles |
| RAG | Retrieve external documents before generating; knowledge-intensive tasks, fresh data |
| ART (Auto Reasoning + Tools) | Auto-select and orchestrate tools with CoT; tasks requiring calculation, search, APIs |
| APE (Auto Prompt Engineer) | LLM generates and scores candidate prompts; prompt optimization at scale |
| Active-Prompt | Identify uncertain examples, annotate selectively for CoT; adaptive few-shot |
| Directional Stimulus | Add a hint/keyword to guide generation direction; summarization, dialogue |
| PAL (Program-Aided LM) | Generate code instead of text for reasoning; math, data manipulation, symbolic tasks |
| ReAct | Interleave reasoning traces with tool actions; search, QA, decision-making agents |
| Reflexion | Agent self-reflects on failures with verbal feedback; iterative improvement, debugging |
| Multimodal CoT | Two-stage: rationale generation then answer with text+image; visual reasoning tasks |
| Graph Prompting | Structured graph-based prompts; node classification, relation extraction, graph tasks |
Prompting Fundamentals
LLM settings, prompt elements, formatting, and practical examples — see references/prompting-introduction.md. Covers:
- LLM Settings — temperature, top-p, max length, stop sequences, frequency/presence penalties
- Prompt Elements — instruction, context, input data, output indicator
- Design Tips — start simple, be specific, avoid impreciseness, say what TO do (not what NOT to do)
- Task Examples — summarization, extraction, QA, classification, conversation, code generation, reasoning
Risks & Misuses
Adversarial attacks, factuality issues, and bias mitigation — see references/prompting-risks.md. Covers:
- Adversarial Prompting — prompt injection, prompt leaking, jailbreaking (DAN, Waluigi Effect), defense tactics
- Factuality — ground truth grounding, calibrated confidence, admit-ignorance patterns
- Biases — exemplar distribution skew, exemplar ordering effects, balanced few-shot design
Prompt Audit / Review
When asked to audit, review, or improve a prompt, follow this workflow. Full checklist with per-check references: prompt-audit-checklist.md.
Workflow
- Read the prompt fully — identify its purpose, target model, and deployment context (interactive chat, agentic system, batch pipeline, RAG-augmented)
- Walk 8 dimensions — check each, note issues with severity (Critical / Warning / Suggestion):
| # | Dimension | What to Check |
|---|---|---|
| 1 | Clarity & Specificity | Task definition, success criteria, audience, output format, conflicting constraints |
| 2 | Structure & Formatting | Section separation (XML tags), prompt smells (monolithic, mixed layers, negative bias) |
| 3 | Safety & Security | Control/data separation, secrets in prompt, injection resilience, tool permissions |
| 4 | Hallucination & Factuality | Role framing, grounding, citation-without-sources, uncertainty handling |
| 5 | Context Management | Info placement (not buried in middle), context size, RAG doc count, re-grounding |
| 6 | Maintainability & Debt | Hardcoded values, regenerated logic, model pinning, testability |
| 7 | Model-Specific Fit | Model-specific params and gotchas (see Model-Specific Guides below) |
| 8 | Evaluation Readiness | Eval criteria, adversarial test cases, schema enforcement, monitoring |
- Produce a report — issues table (dimension, check, severity, issue, fix) + rewritten prompt or targeted fix suggestions. Use the report template from the checklist reference.
- For each issue, cite the relevant reference file so the user can dive deeper.
Quick Decision: Which Dimensions to Prioritize
- User-facing chatbot → prioritize Safety (#3), Hallucination (#4), Clarity (#1)
- Agentic system with tools → prioritize Safety (#3), Context (#5), Maintainability (#6)
- Batch/pipeline → prioritize Structure (#2), Evaluation (#8), Maintainability (#6)
- RAG-augmented → prioritize Context (#5), Safety (#3), Hallucination (#4)
Common Mistakes & Anti-Patterns
Three complementary layers — use the one matching your need:
Deep-dives by category — root causes, mechanisms, prevention checklists (from "The Architecture of Instruction", 2026):
| Mistake Category | Key Issues | Reference |
|---|---|---|
| Hallucinations & Logic | Ambiguity-induced confabulation, automation bias, overloaded prompts, logical failures in verification tasks, no role framing | mistakes-hallucinations.md |
| Structural Fragility | Formatting sensitivity (up to 76pp variance), reproducibility crisis, prompt smells catalog (6 anti-patterns), deliberation ladder | mistakes-structure.md |
| Context Rot | "Lost in the middle" U-shaped attention, RAG over-retrieval, naive data loading, context engineering shift | mistakes-context.md |
| Prompt Debt | Token tax of regenerative code, debt taxonomy (prompt/hyperparameter/framework/cost), multi-agent solutions, automated repair | mistakes-debt.md |
| Security | Direct/indirect injection, jailbreaking, system prompt leakage (OWASP LLM07:2025), RAG poisoning, multimodal injection, adversarial suffixes | mistakes-security.md |
Quick reference — 18-category taxonomy with MRPs, risk scores, case studies, action items: failure-taxonomy.md. Start here for an overview or to prioritize which categories to address first. Covers: control-plane vs data-plane model, heuristic risk scoring, real-world incidents (EchoLeak CVE-2025-32711, Mata v. Avianca, Samsung shadow AI).
How to measure & test — eval metrics, CI gating, red-teaming, tooling: evaluation-redteaming.md. Covers: TruthfulQA, FActScore, SelfCheckGPT, PromptBench, AILuminate, LLM-as-judge pitfalls, guardrail libraries, open research questions.
Model-Specific Guides
Each model family has unique parameters, gotchas, and patterns. Consult the reference for your target model:
- Claude Family — Opus/Sonnet 4.6: adaptive thinking (
effortparam), prefill deprecation (use Structured Outputs), tool overtriggering fix, prompt caching, citations, context engineering, agentic subagent patterns, vision, migration from 4.5 - GPT-5 Family — GPT-5/5.1/5.2:
reasoning_effortparam (defaults vary per version),verbosityAPI control, named tools (apply_patch), agentic eagerness templates, compaction API, instruction conflict sensitivity, migration paths - Gemini 3 Family — Gemini 2.5/3/3.1: temperature MUST be 1.0,
thinking_budgetvsthinking_level, constraint placement (end of prompt), persona priority, function calling, structured output, multimodal, image generation - GPT-5.2 Specifics — Compaction API code examples, web research agent prompt, full XML specification blocks