langsmith-trace-analyzer
LangSmith Trace Analyzer
Use this skill to move from raw LangSmith traces to actionable debugging/evaluation insights.
Quick Start
# Install dependencies
uv pip install langsmith langsmith-fetch
# Auth
export LANGSMITH_API_KEY=<your_langsmith_api_key>
Fast workflow
- Download traces with scripts/download_traces.py (or scripts/download_traces.ts).
- Analyze downloaded JSON with scripts/analyze_traces.py.
- Load targeted references only when needed:
  - references/filtering-querying.md for query/filter syntax
  - references/analysis-patterns.md for deeper diagnostics
  - references/benchmark-analysis.md for benchmark-specific workflows
Decision Guide
- Known trace IDs
  Use langsmith-fetch trace <id> directly, or --trace-ids in downloader scripts.
- Need to discover traces first
  Use LangSmith SDK list_runs/listRuns with filters, then download selected trace IDs.
- Need aggregate insights
  Run analyze_traces.py for summary stats, patterns, and passed-vs-failed comparisons.
Core Workflows
1) Download and organize traces
Python:
uv run skills/langsmith-trace-analyzer/scripts/download_traces.py \
--project "my-project" \
--filter "job_id=abc123" \
--last-hours 24 \
--limit 100 \
--output ./traces \
--organize
TypeScript:
ts-node skills/langsmith-trace-analyzer/scripts/download_traces.ts \
--project "my-project" \
--filter "job_id=abc123" \
--last-hours 24 \
--limit 100 \
--output ./traces
Output layout:
traces/
├── manifest.json
└── by-outcome/
    ├── passed/
    ├── failed/
    └── error/
        ├── GraphRecursionError/
        ├── TimeoutError/
        └── DaytonaError/
Notes:
- Python script supports --organize/--no-organize.
- Both scripts use SDK filtering plus langsmith-fetch for full trace payload export.
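As a quick sanity check on the by-outcome layout above, the following minimal sketch counts trace files per outcome folder. It assumes each trace is stored as a `.json` file somewhere under `by-outcome/`; the function name `count_by_outcome` is hypothetical and not part of the skill's scripts.

```python
from collections import Counter
from pathlib import Path


def count_by_outcome(root: str) -> Counter:
    """Count trace JSON files under <root>/by-outcome, keyed by relative folder."""
    base = Path(root) / "by-outcome"
    counts: Counter = Counter()
    if not base.is_dir():
        return counts  # nothing has been organized yet
    for path in base.rglob("*.json"):
        # Key is the folder relative to by-outcome, e.g. "error/TimeoutError".
        counts[path.parent.relative_to(base).as_posix()] += 1
    return counts
```

Calling `count_by_outcome("./traces")` after a download gives a rough view of how outcomes are distributed before running the full analyzer.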
2) Analyze downloaded traces
# Markdown report
uv run skills/langsmith-trace-analyzer/scripts/analyze_traces.py ./traces --output report.md
# JSON output
uv run skills/langsmith-trace-analyzer/scripts/analyze_traces.py ./traces --json
# Compare passed vs failed (expects by-outcome folders)
uv run skills/langsmith-trace-analyzer/scripts/analyze_traces.py ./traces --compare --output comparison.md
The analyzer reports:
- message/tool-call/token/duration summaries
- top tool usage
- anomaly patterns (high message count, repeated tools, quick failures)
- passed-vs-failed metric deltas when comparison is enabled
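To illustrate what a passed-vs-failed metric delta can look like, here is a minimal sketch that compares per-group means. It is not the implementation in analyze_traces.py; the function name `metric_deltas` and the per-trace dict shape (metric name mapped to a number) are assumptions made for illustration.

```python
from statistics import mean


def metric_deltas(passed: list[dict], failed: list[dict], metrics: list[str]) -> dict[str, float]:
    """Return mean(failed) - mean(passed) for each metric present in both groups."""
    deltas: dict[str, float] = {}
    for name in metrics:
        passed_values = [trace[name] for trace in passed if name in trace]
        failed_values = [trace[name] for trace in failed if name in trace]
        # Skip metrics that are missing from either group rather than guessing.
        if passed_values and failed_values:
            deltas[name] = mean(failed_values) - mean(passed_values)
    return deltas
```

A positive delta means the failed group averages higher on that metric, for example more tool calls or more tokens per trace.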
3) Query traces correctly (SDK)
Use official LangSmith run filter syntax via filter and/or start_time:
from datetime import datetime, timedelta, timezone

from langsmith import Client

client = Client()
start = datetime.now(timezone.utc) - timedelta(hours=24)
filter_query = 'and(eq(metadata_key, "job_id"), eq(metadata_value, "abc123"))'
runs = client.list_runs(
    project_name="my-project",
    is_root=True,
    start_time=start,
    filter=filter_query,
)
For TypeScript:
import { Client } from "langsmith";

const client = new Client();
for await (const run of client.listRuns({
  projectName: "my-project",
  isRoot: true,
  filter: 'and(eq(metadata_key, "job_id"), eq(metadata_value, "abc123"))',
})) {
  console.log(run.id, run.status);
}
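The metadata filter string used in both examples above is easy to mistype, so a small helper can build it. This is a minimal sketch: the name `metadata_filter` is hypothetical, and it assumes the filter parser accepts JSON-style escaping for values that contain quotes or backslashes.

```python
import json


def metadata_filter(key: str, value: str) -> str:
    """Build a LangSmith filter matching runs whose metadata has key == value."""
    # json.dumps adds the surrounding double quotes and escapes embedded ones.
    quoted_key = json.dumps(key, ensure_ascii=False)
    quoted_value = json.dumps(value, ensure_ascii=False)
    return f"and(eq(metadata_key, {quoted_key}), eq(metadata_value, {quoted_value}))"
```

The result can be passed as the `filter` argument of `list_runs`, for example `metadata_filter("job_id", "abc123")`.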
Accuracy and Schema Notes
- LangSmith run fields are commonly top-level (status, error, total_tokens, start_time, end_time).
- Some exported traces also include nested metadata (metadata or extra.metadata) and/or messages.
- analyze_traces.py is resilient to multiple payload shapes, including raw array payloads.
- For full conversation content, prefer downloaded trace payloads over bare list_runs results.
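The payload shapes described above can be handled with a couple of defensive accessors. The sketch below is illustrative only and does not reproduce the logic in analyze_traces.py; the helper names `iter_runs` and `get_metadata` are hypothetical, and the field names are the ones listed in the notes.

```python
from typing import Any


def iter_runs(payload: Any) -> list[dict]:
    """Return run-like dicts from a payload that is a raw array or a single object."""
    if isinstance(payload, list):
        return [item for item in payload if isinstance(item, dict)]
    if isinstance(payload, dict):
        return [payload]
    return []


def get_metadata(run: dict) -> dict:
    """Prefer top-level metadata, falling back to extra.metadata."""
    meta = run.get("metadata")
    if isinstance(meta, dict) and meta:
        return meta
    extra = run.get("extra")
    if isinstance(extra, dict) and isinstance(extra.get("metadata"), dict):
        return extra["metadata"]
    return {}
```

Both helpers return an empty container instead of raising, so a single malformed export does not stop a batch analysis.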
Troubleshooting
| Issue | Likely Cause | Action |
|---|---|---|
| LANGSMITH_API_KEY missing | Auth not configured | export LANGSMITH_API_KEY=<your_langsmith_api_key> |
| No runs returned | Wrong project/filter/time range | Verify project name and filter syntax |
| Empty/partial message arrays | Run schema differs or incomplete data | Use downloaded trace JSON and inspect status/error fields |
| JSON parse error on downloaded files | Bad/incomplete export | Re-download trace; use --format raw paths in scripts |
| Re-downloading same traces repeatedly | Existing files in nested folders | Use current scripts (they check existing files across output tree) |
Safety for Open Source
- Do not commit downloaded trace artifacts (manifest.json, trace JSON dumps) unless sanitized.
- Trace payloads can contain user prompts, outputs, metadata, and other sensitive runtime data.
- Keep this skill repository focused on scripts/templates, not production trace exports.
Resources
scripts/
- scripts/download_traces.py: Python downloader + organizer
- scripts/download_traces.ts: TypeScript downloader + organizer
- scripts/analyze_traces.py: Offline analysis and reporting
references/
- references/filtering-querying.md: LangSmith query/filter examples
- references/analysis-patterns.md: Diagnostic patterns and heuristics
- references/benchmark-analysis.md: Benchmark-oriented analysis