Senior Prompt Engineer

Eval-driven prompt engineering, RAG quality measurement, and agent workflow validation. Everything here is model-agnostic by design: techniques are framed by what they do, not by which model generation they were observed on, and the tools never hardcode model IDs or pricing — you supply your provider's current rates when you want dollar figures.

Operating Rules

Never change a prompt without a baseline. Capture metrics first (`--analyze --output baseline.json`), then compare every iteration against it.
Eval set before optimization. At minimum, 10–20 representative cases with expected outputs. If the user has no eval set, build one with them before touching the prompt; optimizing against vibes is the #1 failure mode.
Prefer platform features over prompt hacks. If the provider offers native structured outputs / JSON schema enforcement, tool-use APIs, or prompt caching, use those instead of "respond ONLY with JSON" incantations. Prompt-level format enforcement is the fallback, not the default.
Current-generation models need less scaffolding. Don't add chain-of-thought boilerplate, role framing, or few-shot examples reflexively — frontier models often do worse with redundant scaffolding. Add each element only when the eval set shows it helps.
Cost numbers are always user-supplied. Look up the provider's current per-Mtok pricing and pass it via `--price-per-mtok` (never trust a cached price table, including any you remember).
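
The first two rules can be sketched as a small regression check: score each prompt iteration on the eval set, then compare the result to the saved baseline. This is a minimal sketch, not the bundled tooling. The eval-case shape (`input`, `expected`), the exact-match scorer, the `fake_model` stand-in, and the `pass_rate` key are all assumptions made for illustration; the actual layout of baseline.json may differ.

```python
import json
from typing import Callable

# Hypothetical eval-set shape: each case pairs an input with its expected output.
EVAL_SET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3 * 3", "expected": "9"},
]


def pass_rate(run_prompt: Callable[[str], str], cases: list[dict]) -> float:
    """Fraction of cases whose output exactly matches the expected text."""
    if not cases:
        raise ValueError("eval set is empty; build one before optimizing")
    hits = sum(run_prompt(c["input"]).strip() == c["expected"] for c in cases)
    return hits / len(cases)


def compare_to_baseline(baseline: dict, candidate_rate: float, tolerance: float = 0.0) -> str:
    """Label a candidate 'improved', 'regressed', or 'unchanged' against the baseline."""
    delta = candidate_rate - baseline["pass_rate"]
    if delta > tolerance:
        return "improved"
    if delta < -tolerance:
        return "regressed"
    return "unchanged"


# Stand-in for a real model call (assumption); replace with your provider's client.
def fake_model(text: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "6"}
    return answers.get(text, "")


baseline = json.loads('{"pass_rate": 0.5}')  # hypothetical baseline contents
rate = pass_rate(fake_model, EVAL_SET)
verdict = compare_to_baseline(baseline, rate)
```

Exact match is the simplest possible scorer; for free-form outputs a rubric or similarity-based scorer would likely be needed, and a nonzero `tolerance` helps avoid reacting to noise on small eval sets.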

Tools (exact CLIs, all stdlib)

1. Prompt Optimizer — `scripts/prompt_optimizer.py`

Static analysis: token estimate, clarity/structure scores (0–100), ambiguity + redundancy detection, few-shot example extraction.
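
To make "token estimate" concrete, here is a rough sketch of how a stdlib-only estimate can be turned into a dollar figure once you supply a price. It is not the script's actual implementation: the four-characters-per-token ratio is a common rule of thumb for English text that real tokenizers will deviate from, and the function names and the price used below are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from character length; real tokenizers will differ."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(tokens: int, price_per_mtok: float) -> float:
    """Dollar cost for `tokens` at a user-supplied price per million tokens."""
    return tokens / 1_000_000 * price_per_mtok


prompt = "Summarize the following support ticket in two sentences."
tokens = estimate_tokens(prompt)
# Placeholder price, not a real rate: look up and supply your provider's current one.
cost = estimate_cost(tokens, price_per_mtok=1.0)
```

Keeping the price as a required argument, rather than a default, mirrors the rule that cost numbers are always user-supplied.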
