langsmith — LLM Observability, Evaluation & Prompt Management

Keywords: langsmith · llm tracing · llm evaluation · @traceable · langsmith evaluate

LangSmith is a framework-agnostic platform for developing, debugging, and deploying LLM applications. It provides end-to-end tracing, quality evaluation, prompt versioning, and production monitoring.
When to use this skill

- Add tracing to any LLM pipeline (OpenAI, Anthropic, LangChain, custom models)
- Run offline evaluations with `evaluate()` against a curated dataset
- Set up production monitoring and online evaluation
- Manage and version prompts in the Prompt Hub
- Create datasets for regression testing and benchmarking
- Attach human or automated feedback to traces
- Use LLM-as-judge scoring with `openevals`
- Debug agent failures with end-to-end trace inspection
Instructions

- Install the SDK: `pip install -U langsmith` (Python) or `npm install langsmith` (TypeScript)
- Set environment variables: `LANGSMITH_TRACING=true`, `LANGSMITH_API_KEY=lsv2_...`
- Instrument with the `@traceable` decorator or the `wrap_openai()` wrapper
- View traces at smith.langchain.com
- For evaluation setup, see references/python-sdk.md
- For CLI commands, see references/cli.md
- Run `bash scripts/setup.sh` to auto-configure the environment

API Key: Get from smith.langchain.com → Settings → API Keys
Docs: https://docs.langchain.com/langsmith
Quick Start

Python

```bash
pip install -U langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
export OPENAI_API_KEY="sk-..."
```

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())

@traceable
def rag_pipeline(question: str) -> str:
    """Automatically traced in LangSmith."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

result = rag_pipeline("What is LangSmith?")
```
TypeScript

```bash
npm install langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
```

```typescript
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import { OpenAI } from "openai";

const client = wrapOpenAI(new OpenAI());

const pipeline = traceable(
  async (question: string): Promise<string> => {
    const res = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: question }],
    });
    return res.choices[0].message.content ?? "";
  },
  { name: "RAG Pipeline" }
);

await pipeline("What is LangSmith?");
```
Core Concepts
| Concept | Description |
|---|---|
| Run | Individual operation (LLM call, tool call, retrieval). The fundamental unit. |
| Trace | All runs from a single user request, linked by trace_id. |
| Thread | Multiple traces in a conversation, linked by session_id or thread_id. |
| Project | Container grouping related traces (set via LANGSMITH_PROJECT). |
| Dataset | Collection of {inputs, outputs} examples for offline evaluation. |
| Experiment | Result set from running evaluate() against a dataset. |
| Feedback | Score/label attached to a run — numeric, categorical, or freeform. |
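Threads in the table above are linked purely by shared metadata on each trace. A minimal sketch of building the call-time `langsmith_extra` argument that `@traceable` functions accept — the helper name `thread_metadata` is hypothetical, not part of the SDK:

```python
import uuid
from typing import Optional

# Sketch: LangSmith groups traces into a Thread when their runs share a
# thread key ("thread_id" or "session_id") in metadata.
# `thread_metadata` is a hypothetical helper, not an SDK function.
def thread_metadata(thread_id: Optional[str] = None) -> dict:
    """Build the call-time `langsmith_extra` dict that links a run to a thread."""
    return {"metadata": {"thread_id": thread_id or str(uuid.uuid4())}}

# Usage with a @traceable function (assumed defined elsewhere):
#   rag_pipeline("hello", langsmith_extra=thread_metadata("conv-123"))
```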
Tracing

@traceable decorator (Python)

```python
from langsmith import traceable

@traceable(
    run_type="chain",  # llm | chain | tool | retriever | embedding
    name="My Pipeline",
    tags=["production", "v2"],
    metadata={"version": "2.1", "env": "prod"},
    project_name="my-project",
)
def pipeline(question: str) -> str:
    return generate_answer(question)
```
Selective tracing context

```python
import langsmith as ls

# Enable tracing for this block only
with ls.tracing_context(enabled=True, project_name="debug"):
    result = chain.invoke({"input": "..."})

# Disable tracing despite LANGSMITH_TRACING=true
with ls.tracing_context(enabled=False):
    result = chain.invoke({"input": "..."})
```
Wrap provider clients

```python
import anthropic
from openai import OpenAI

from langsmith.wrappers import wrap_openai, wrap_anthropic

openai_client = wrap_openai(OpenAI())  # All calls auto-traced
anthropic_client = wrap_anthropic(anthropic.Anthropic())
```
Distributed tracing (microservices)

```python
import langsmith
from langsmith.run_helpers import get_current_run_tree

@langsmith.traceable
def service_a(inputs):
    rt = get_current_run_tree()
    headers = rt.to_headers()  # Pass to the child service
    return call_service_b(headers=headers)

@langsmith.traceable
def service_b(x, headers):
    with langsmith.tracing_context(parent=headers):
        return process(x)
```
Evaluation

Basic evaluation with evaluate()

```python
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = Client()
oai = wrap_openai(OpenAI())

# 1. Create dataset
dataset = client.create_dataset("Geography QA")
client.create_examples(
    dataset_id=dataset.id,
    examples=[
        {"inputs": {"q": "Capital of France?"}, "outputs": {"a": "Paris"}},
        {"inputs": {"q": "Capital of Germany?"}, "outputs": {"a": "Berlin"}},
    ],
)

# 2. Target function
def target(inputs: dict) -> dict:
    res = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": inputs["q"]}],
    )
    return {"a": res.choices[0].message.content}

# 3. Evaluator
def exact_match(inputs, outputs, reference_outputs):
    return outputs["a"].strip().lower() == reference_outputs["a"].strip().lower()

# 4. Run experiment
results = client.evaluate(
    target,
    data="Geography QA",
    evaluators=[exact_match],
    experiment_prefix="gpt-4o-mini-v1",
    max_concurrency=4,
)
```
LLM-as-judge with openevals

```bash
pip install -U openevals
```

```python
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT

judge = create_llm_as_judge(
    prompt=CORRECTNESS_PROMPT,
    model="openai:o3-mini",
    feedback_key="correctness",
)

# Reuses `client` and `target` from the previous example
results = client.evaluate(target, data="my-dataset", evaluators=[judge])
```
Evaluation types
| Type | When to use |
|---|---|
| Code/Heuristic | Exact match, format checks, rule-based |
| LLM-as-judge | Subjective quality, safety, reference-free |
| Human | Annotation queues, pairwise comparison |
| Pairwise | Compare two app versions |
| Online | Production traces, real traffic |
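Code/heuristic evaluators from the table above need not return a bare boolean like `exact_match`; returning a dict lets you name the feedback key explicitly. A minimal sketch — the `concise` key and 200-character threshold are illustrative choices, not SDK defaults:

```python
# Heuristic evaluator sketch: same (inputs, outputs, reference_outputs)
# signature as exact_match, but returns a dict so the feedback key shown
# in the LangSmith UI is explicit. The threshold is an arbitrary example.
def length_check(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    answer = outputs.get("a", "")
    score = 1 if len(answer) <= 200 else 0
    return {"key": "concise", "score": score}
```

Pass it alongside other evaluators, e.g. `client.evaluate(target, data="Geography QA", evaluators=[exact_match, length_check])`.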
Prompt Hub

```python
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate

client = Client()

# Push a prompt
prompt = ChatPromptTemplate([
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
])
client.push_prompt("my-assistant-prompt", object=prompt)

# Pull and use
prompt = client.pull_prompt("my-assistant-prompt")

# Pull a specific version (commit hash)
prompt = client.pull_prompt("my-assistant-prompt:abc123")
```
Feedback

```python
import uuid

from langsmith import Client

client = Client()

# Custom run ID for later feedback linking
my_run_id = str(uuid.uuid4())
result = chain.invoke({"input": "..."}, {"run_id": my_run_id})

# Attach feedback
client.create_feedback(
    run_id=my_run_id,
    key="correctness",
    score=1,  # numeric (0-1) or categorical
    comment="Accurate and concise",
)
```
References
- Python SDK Reference — full Client API, @traceable signature, evaluate()
- TypeScript SDK Reference — Client, traceable, wrappers, evaluate
- CLI Reference — langsmith CLI commands
- Official Docs — langchain.com/langsmith
- SDK GitHub — MIT License, v0.7.17
- openevals — Prebuilt LLM evaluators