Version Your Prompts with LangWatch Prompts CLI
Determine Scope
If the user's request is general ("set up prompt versioning", "version my prompts"):
- Read the full codebase to find all hardcoded prompt strings
- Study git history to understand what changed and why — focus on agent behavior changes, prompt tweaks, bug fixes. Read commit messages for context.
- Set up the Prompts CLI and create managed prompts for each hardcoded prompt
- Update all application code to use langwatch.prompts.get()
If the user's request is specific ("version this prompt", "create a new prompt version"):
- Focus on the specific prompt
- Create or update the managed prompt
- Update the relevant code to use langwatch.prompts.get()
Plan Limits
LangWatch's free plan has limits on prompts, scenarios, evaluators, experiments, and datasets. When you hit a limit, the API returns "Free plan limit of N reached..." with an upgrade link.
How to handle:
- Work within the limits — if 3 scenarios are allowed, create 3 meaningful ones, not 10.
- Make every creation count: each one should demonstrate clear value.
- Show what works FIRST. If you hit a limit, summarize what was accomplished and direct the user to upgrade at https://app.langwatch.ai/settings/subscription.
- Do NOT delete existing resources to make room, and do NOT reuse a scenario set to cram in more tests.
If LANGWATCH_ENDPOINT is set in .env, the user is self-hosted; direct them to {LANGWATCH_ENDPOINT}/settings/license instead.
Step 1: Read the Prompts CLI Docs
Use langwatch docs <path> to read documentation as Markdown. Some useful entry points:
langwatch docs # Docs index
langwatch docs integration/python/guide # Python integration
langwatch docs integration/typescript/guide # TypeScript integration
langwatch docs prompt-management/cli # Prompts CLI
langwatch scenario-docs # Scenario docs index
Discover commands with langwatch --help and langwatch <subcommand> --help. List and get commands accept --format json for machine-readable output. Read the docs first instead of guessing SDK APIs or CLI flags.
If no shell is available, fetch the same Markdown over plain HTTP — append .md to any docs path (e.g. https://langwatch.ai/docs/integration/python/guide.md). Index: https://langwatch.ai/docs/llms.txt. Scenario index: https://langwatch.ai/scenario/llms.txt
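As a minimal sketch of that HTTP fallback in Python, fetching the Prompts CLI page (the URL follows the append-.md pattern described above):
import urllib.request
# Appending .md to a docs path returns the page as plain Markdown
url = "https://langwatch.ai/docs/prompt-management/cli.md"
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8"))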
Then specifically read the Prompts CLI guide:
langwatch docs prompt-management/cli
CRITICAL: Do NOT guess how to use the Prompts CLI. Read the docs first.
Step 2: Initialize Prompts in the Project
langwatch prompt init
Creates a prompts.json config and a prompts/ directory in the project root.
Step 3: Create a Managed Prompt for Each Hardcoded Prompt
Scan the codebase for hardcoded prompt strings (system messages, instructions). For each:
langwatch prompt create <name>
Edit the generated .prompt.yaml file to match the original prompt content.
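One rough way to surface candidates is a heuristic scan; a sketch, where the variable names in the regex are assumptions to tune for your codebase:
import pathlib
import re

# Heuristic: flag assignments that look like hardcoded system prompts or instructions.
# Extend the glob (e.g. *.ts) and the pattern to match your project's conventions.
pattern = re.compile(r'(system_prompt|instructions)\s*=\s*["\']', re.IGNORECASE)
for path in pathlib.Path(".").rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
        if pattern.search(line):
            print(f"{path}:{lineno}: {line.strip()}")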
Step 4: Update Application Code
Replace every hardcoded prompt string with a call to langwatch.prompts.get().
Python (BAD → GOOD):
# BAD: hardcoded prompt string
agent = Agent(instructions="You are a helpful assistant.")
# GOOD: fetch the managed prompt from LangWatch
import langwatch
prompt = langwatch.prompts.get("my-agent")
agent = Agent(instructions=prompt.compile().messages[0]["content"])
TypeScript (BAD → GOOD):
// BAD: hardcoded prompt string
const systemPrompt = "You are a helpful assistant.";
// GOOD: fetch the managed prompt from LangWatch
import { LangWatch } from "langwatch";
const langwatch = new LangWatch();
const prompt = await langwatch.prompts.get("my-agent");
CRITICAL: Do NOT wrap langwatch.prompts.get() in a try/catch with a hardcoded fallback string. The whole point of prompt versioning is that prompts are managed externally. A fallback defeats this by silently reverting to a stale hardcoded copy.
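To make the fail-fast pattern concrete, a sketch reusing the my-agent prompt from the example above:
import langwatch

# Anti-pattern (do NOT do this): a fallback silently reverts to a stale copy
# try:
#     prompt = langwatch.prompts.get("my-agent")
# except Exception:
#     instructions = "You are a helpful assistant."  # stale hardcoded copy

# Correct: let a failed fetch raise, so misconfiguration surfaces immediately
prompt = langwatch.prompts.get("my-agent")
instructions = prompt.compile().messages[0]["content"]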
Step 5: Sync to the Platform
langwatch prompt sync
Step 6: Tag Versions for Deployment
Three built-in tags: latest (auto-assigned), production, staging. Update code to fetch by tag:
prompt = langwatch.prompts.get("my-agent", tag="production")
const prompt = await langwatch.prompts.get("my-agent", { tag: "production" });
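One way to wire this up, as a sketch: pick the tag from the deploy environment (the APP_ENV variable here is an assumption; use whatever your app already reads):
import os
import langwatch

# Assumed APP_ENV variable; substitute your own environment config
tag = "production" if os.environ.get("APP_ENV") == "production" else "staging"
prompt = langwatch.prompts.get("my-agent", tag=tag)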
Assign tags via the CLI (or the Deploy dialog in the LangWatch UI):
langwatch prompt tag assign my-agent production
For canary or blue/green deployments, create custom tags with langwatch prompt tag create.
Step 7: Verify
Run langwatch prompt list to confirm everything synced, or open the Prompts section in the LangWatch app.
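You can also smoke-test from application code; a minimal sketch, assuming the my-agent prompt and production tag from earlier:
import langwatch

# If this fetch succeeds, the prompt exists on the platform and resolves by tag
prompt = langwatch.prompts.get("my-agent", tag="production")
print(prompt.compile().messages[0]["content"])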
Common Mistakes
- Do NOT hardcode prompts; always fetch via langwatch.prompts.get()
- Do NOT add a hardcoded fallback string in a try/catch; that silently defeats versioning
- Do NOT manually edit prompts.json; use the CLI
- Do NOT skip langwatch prompt sync after creating prompts