Senior Prompt Engineer
Eval-driven prompt engineering, RAG quality measurement, and agent workflow validation. Everything here is model-agnostic by design: techniques are framed by what they do, not by which model generation they were observed on, and the tools never hardcode model IDs or pricing — you supply your provider's current rates when you want dollar figures.
Operating Rules
- Never change a prompt without a baseline. Capture metrics first (--analyze --output baseline.json), then compare every iteration against it.
- Eval set before optimization. Start with at least 10–20 representative cases with expected outputs. If the user has no eval set, build one with them before touching the prompt; optimizing against vibes is the #1 failure mode.
- Prefer platform features over prompt hacks. If the provider offers native structured outputs / JSON schema enforcement, tool-use APIs, or prompt caching, use those instead of "respond ONLY with JSON" incantations. Prompt-level format enforcement is the fallback, not the default.
- Current-generation models need less scaffolding. Don't add chain-of-thought boilerplate, role framing, or few-shot examples reflexively — frontier models often do worse with redundant scaffolding. Add each element only when the eval set shows it helps.
- Cost numbers are always user-supplied. Look up the provider's current per-Mtok pricing and pass it via --price-per-mtok (never trust a cached price table, including any you remember).
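The baseline, eval-set, and cost rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the tools: the function names, the exact-match scoring, the eval-case shape ({"input": ..., "expected": ...}), and the flat metric dictionaries are all hypothetical, and the real layout of baseline.json may differ.

```python
def pass_rate(cases, run_prompt):
    """Fraction of eval cases whose output equals the expected output.

    `run_prompt` is a hypothetical callable (input text -> model output);
    exact match after stripping whitespace is an assumed scoring rule.
    """
    if not cases:
        raise ValueError("eval set is empty; build one before optimizing")
    passed = sum(
        1 for case in cases
        if run_prompt(case["input"]).strip() == case["expected"].strip()
    )
    return passed / len(cases)


def cost_usd(tokens, price_per_mtok):
    """Dollar cost of `tokens` tokens at a user-supplied price per million tokens."""
    return tokens * price_per_mtok / 1_000_000


def compare_to_baseline(baseline, candidate):
    """Per-metric deltas (candidate minus baseline) for metrics present in both.

    Both arguments are assumed to be flat dicts of numeric metrics.
    """
    return {name: candidate[name] - baseline[name]
            for name in baseline if name in candidate}
```

A typical loop would compute the metrics once for the unmodified prompt, store them as the baseline, and then call compare_to_baseline after every change. The price passed to cost_usd comes from the provider's current rate card and is never hardcoded.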
Tools (exact CLIs, all stdlib)
1. Prompt Optimizer — scripts/prompt_optimizer.py
Static analysis: token estimate, clarity/structure scores (0–100), ambiguity + redundancy detection, few-shot example extraction.
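To make "token estimate" and "redundancy detection" concrete, here is a minimal stdlib-only sketch of what such static checks could look like. It does not reproduce the actual logic of scripts/prompt_optimizer.py: the function names are hypothetical, the 4-characters-per-token ratio is only a common rule of thumb for English text, and treating repeated sentences as redundancy is an assumed heuristic.

```python
import re
from collections import Counter


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count: character length divided by an assumed ratio.

    Real tokenizers vary by model; this is only an order-of-magnitude guess.
    """
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def repeated_sentences(text):
    """Return lowercased sentences that appear more than once in `text`."""
    sentences = [s.strip().lower()
                 for s in re.split(r"[.!?]+\s*", text)
                 if s.strip()]
    counts = Counter(sentences)
    return [sentence for sentence, n in counts.items() if n > 1]
```

For example, repeated_sentences("Be concise. Use JSON. Be concise.") flags "be concise" as a duplicate instruction, which is the kind of finding a redundancy check would surface for review.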