prompt-engineering
Prompt Engineering Skill
A comprehensive reference for writing production-quality prompts for LLM-based agents and assistants. Based on analysis of system prompts from OpenAI (GPT-5, Codex CLI), Anthropic (Claude Opus/Sonnet, Claude Code v2.0), Google (Gemini, Gemini CLI), xAI (Grok), Mistral (Le Chat), and 10+ agentic tools (Cursor, Windsurf, Devin, Manus, Copilot, Cline, Kiro, Aider, pi-coding-agent, OpenCode).
How to use this skill
This skill is organized into modules. Read the relevant module(s) for your task — don't load everything at once.
Before writing any prompt, read references/01-system-prompts.md. It covers the universal anatomy of a system prompt and patterns that apply regardless of use case.
Then read the module(s) specific to your task:
| Task | Module |
|---|---|
| Writing a system prompt for a chat assistant | references/01-system-prompts.md |
| Designing persistent memory / personalization | references/02-memory-systems.md |
| Defining tools and skills for an agent | references/03-tool-use.md |
| Defining personality, tone, and style | references/04-personality-and-tone.md |
| Reducing hallucination and improving accuracy | references/05-anti-hallucination.md |
| Writing prompts for background/sub-agents | references/06-background-agents.md |
| Writing prompts for coding agents | references/07-coding-agents.md |
| Reusable prompt patterns and anti-patterns | references/08-prompt-patterns.md |
| Full reference prompts from production systems | references/09-reference-prompts.md |
| Designing agentic workflows (persistence, loops, planning, progress, autonomy) | references/10-agentic-workflows.md |
Universal principles
These apply to ALL prompt writing, regardless of module:
-
Examples beat descriptions.
DON'T: "According to my memories..." DO: "Since you're at Kolb Antik..."is more effective than"Use memories naturally without mentioning the system." -
Position matters. Content at the beginning and end of the prompt gets more attention than the middle. Put critical rules at the top.
-
Shorter is better for smaller models. Every unnecessary token dilutes attention from the important parts. A 2000-token prompt that works is better than a 5000-token prompt that's "more complete."
-
Positive framing outperforms negation.
"Write short answers"works better than"Don't write long answers"— models focus on keywords, and with negation those keywords are ironically the undesired behavior. -
Test with the target model. A prompt that works on GPT-4 or Claude Opus may fail completely on a 7B model. Always test with the model that will actually run the prompt.
-
Show, don't tell. Concrete examples, paired good/bad demonstrations, and sample outputs are more effective than abstract descriptions of desired behavior.
-
Structure helps comprehension. Use markdown headers, numbered lists, and clear section boundaries. For Claude specifically, XML tags work exceptionally well. For smaller models, keep structure simple — one level of headers, no deep nesting.
Model-specific notes
- Small models (7B-13B): Need explicit step-by-step instructions, multiple examples per concept, precise output format specs, and shorter overall prompts. Avoid complex conditionals and nested logic.
- Mid-size models (14B-30B): Handle moderate complexity. Can follow structured prompts with 2-3 levels of hierarchy. Benefit from examples but can generalize from fewer of them. Tool calling may be unreliable — test thoroughly.
- Large models (70B+, frontier APIs): Follow nuanced instructions, handle complex conditionals, and generalize well. Can handle longer prompts and more abstract guidance. XML tags, complex tool schemas, and multi-step reasoning work reliably.
Sources and references
This skill synthesizes findings from:
- Leaked system prompts: ChatGPT/GPT-5 (Aug 2025), Claude Opus 4.6, Claude Code v2.0 (Sep 2025), Gemini 3 Flash (Jan 2026), Grok 4, Mistral Le Chat
- Agentic tool prompts: Cursor v1.0–v2.0 (GPT-5), Windsurf/Cascade Wave 11, Devin AI, Manus AI (event-driven), GitHub Copilot (VSCode Agent), Kiro (AWS), OpenAI Codex CLI, Google Gemini CLI, Cline
- Official docs: Anthropic prompt engineering guide, OpenAI GPT-4.1 prompting guide, Google Gemini 3 prompting guide, Mistral prompting docs
- Open-source agents: Aider, pi-coding-agent (@mariozechner), OpenCode
- Memory frameworks: Mem0, Letta/MemGPT, Claude Code Auto Dream
- Research: MemGPT paper (arXiv:2310.08560), Sleep-time Compute (arXiv:2504.13171), LoCoMo benchmark, LongMemEval