# LangSmith Trace Analyzer
Use this skill to move from raw LangSmith traces to actionable debugging/evaluation insights.
## Quick Start

```bash
# Install dependencies
uv pip install langsmith langsmith-fetch

# Auth
export LANGSMITH_API_KEY=<your_langsmith_api_key>
```
## Fast workflow

- Download traces with `scripts/download_traces.py` (or `scripts/download_traces.ts`).
- Analyze the downloaded JSON with `scripts/analyze_traces.py`.
- Load targeted references only when needed:
  - `references/filtering-querying.md` for query/filter syntax
  - `references/analysis-patterns.md` for deeper diagnostics
  - `references/benchmark-analysis.md` for benchmark-specific workflows
## Decision Guide

- **Known trace IDs**: Use `langsmith-fetch trace <id>` directly, or `--trace-ids` in the downloader scripts.
- **Need to discover traces first**: Use the LangSmith SDK `list_runs`/`listRuns` with filters, then download the selected trace IDs.
- **Need aggregate insights**: Run `analyze_traces.py` for summary stats, patterns, and passed-vs-failed comparisons.
## Core Workflows

### 1) Download and organize traces
Python:
```bash
uv run skills/langsmith-trace-analyzer/scripts/download_traces.py \
  --project "my-project" \
  --filter "job_id=abc123" \
  --last-hours 24 \
  --limit 100 \
  --output ./traces \
  --organize
```
TypeScript:
```bash
ts-node skills/langsmith-trace-analyzer/scripts/download_traces.ts \
  --project "my-project" \
  --filter "job_id=abc123" \
  --last-hours 24 \
  --limit 100 \
  --output ./traces
```
Output layout:
```
traces/
├── manifest.json
└── by-outcome/
    ├── passed/
    ├── failed/
    └── error/
        ├── GraphRecursionError/
        ├── TimeoutError/
        └── DaytonaError/
```
Notes:

- The Python script supports `--organize`/`--no-organize`.
- Both scripts use SDK filtering plus `langsmith-fetch` for full trace payload export.
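As a quick sanity check after a download, you can tally payloads per outcome folder. This is a minimal sketch of our own (the helper name and return shape are not part of the scripts); it assumes only the `by-outcome/` layout shown above and counts JSON files recursively so error subfolders are included:

```python
from pathlib import Path


def summarize_downloads(traces_dir: str) -> dict:
    """Count downloaded trace JSON files per outcome folder.

    Illustrative helper; assumes the by-outcome/ layout produced
    by the downloader scripts.
    """
    counts = {}
    for outcome_dir in sorted((Path(traces_dir) / "by-outcome").iterdir()):
        if outcome_dir.is_dir():
            # rglob so nested error-type folders (e.g. TimeoutError/)
            # contribute to their parent outcome's count.
            counts[outcome_dir.name] = sum(1 for _ in outcome_dir.rglob("*.json"))
    return counts
```

Comparing these counts against `manifest.json` is an easy way to spot an interrupted download.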
### 2) Analyze downloaded traces

```bash
# Markdown report
uv run skills/langsmith-trace-analyzer/scripts/analyze_traces.py ./traces --output report.md

# JSON output
uv run skills/langsmith-trace-analyzer/scripts/analyze_traces.py ./traces --json

# Compare passed vs failed (expects by-outcome folders)
uv run skills/langsmith-trace-analyzer/scripts/analyze_traces.py ./traces --compare --output comparison.md
```
The analyzer reports:
- message/tool-call/token/duration summaries
- top tool usage
- anomaly patterns (high message count, repeated tools, quick failures)
- passed-vs-failed metric deltas when comparison is enabled
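To make the "repeated tools" anomaly concrete, here is a hedged sketch of that check in isolation. It is not the analyzer's actual code: it assumes OpenAI-style message dicts whose `tool_calls` entries carry a `name` (or a nested `function.name`), whereas the real script handles more payload shapes:

```python
from collections import Counter


def flag_repeated_tools(messages: list, threshold: int = 3) -> list:
    """Return tool names called more than `threshold` times in one trace.

    Illustrative only; assumes OpenAI-style messages with tool_calls.
    """
    counts = Counter()
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            # Tool name may sit at the top level or under function.name.
            name = call.get("name") or call.get("function", {}).get("name")
            if name:
                counts[name] += 1
    return [tool for tool, n in counts.items() if n > threshold]
```

A trace that calls the same tool many times in a row often indicates a stuck loop, which is why it surfaces as an anomaly.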
### 3) Query traces correctly (SDK)

Use the official LangSmith run filter syntax via `filter` and/or `start_time`:
```python
from datetime import datetime, timedelta, timezone
from langsmith import Client

client = Client()
start = datetime.now(timezone.utc) - timedelta(hours=24)
filter_query = 'and(eq(metadata_key, "job_id"), eq(metadata_value, "abc123"))'

runs = client.list_runs(
    project_name="my-project",
    is_root=True,
    start_time=start,
    filter=filter_query,
)
```
For TypeScript:
```typescript
import { Client } from "langsmith";

const client = new Client();
for await (const run of client.listRuns({
  projectName: "my-project",
  isRoot: true,
  filter: 'and(eq(metadata_key, "job_id"), eq(metadata_value, "abc123"))',
})) {
  console.log(run.id, run.status);
}
```
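Discovery and download compose naturally: collect root-run IDs from a `list_runs` query like the one above, then feed them to the downloader's `--trace-ids` option. The helper below is our own sketch (only the deduplication logic; the exact `--trace-ids` argument format may differ from what you pass on the command line):

```python
def collect_trace_ids(runs) -> list:
    """Dedupe run IDs (as strings) while preserving order.

    Illustrative helper: `runs` is any iterable of objects with an
    `.id` attribute, such as the iterator from client.list_runs(...).
    """
    seen = set()
    ids = []
    for run in runs:
        rid = str(run.id)
        if rid not in seen:
            seen.add(rid)
            ids.append(rid)
    return ids
```

In practice you would pass the result of the `client.list_runs(...)` call shown above and hand the IDs to `download_traces.py`.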
## Accuracy and Schema Notes

- LangSmith run fields are commonly top-level (`status`, `error`, `total_tokens`, `start_time`, `end_time`).
- Some exported traces also include nested metadata (`metadata` or `extra.metadata`) and/or `messages`. `analyze_traces.py` is resilient to multiple payload shapes, including raw array payloads.
- For full conversation content, prefer downloaded trace payloads over bare `list_runs` results.
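The fallback order described above can be sketched as a small lookup helper. This is a minimal illustration, not the analyzer's implementation: it assumes dict payloads and checks top-level fields first, then `metadata`, then `extra.metadata`:

```python
def get_run_field(payload: dict, key: str, default=None):
    """Look up a run field across common LangSmith payload shapes.

    Checks top-level first, then nested `metadata`, then
    `extra.metadata`, mirroring the fallbacks noted above.
    """
    if key in payload:
        return payload[key]
    metadata = payload.get("metadata") or {}
    if key in metadata:
        return metadata[key]
    extra_meta = (payload.get("extra") or {}).get("metadata") or {}
    return extra_meta.get(key, default)
```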
## Troubleshooting

| Issue | Likely Cause | Action |
|---|---|---|
| `LANGSMITH_API_KEY` missing | Auth not configured | `export LANGSMITH_API_KEY=<your_langsmith_api_key>` |
| No runs returned | Wrong project/filter/time range | Verify project name and filter syntax |
| Empty/partial message arrays | Run schema differs or incomplete data | Use downloaded trace JSON and inspect `status`/`error` fields |
| JSON parse error on downloaded files | Bad/incomplete export | Re-download the trace; use `--format raw` paths in the scripts |
| Re-downloading same traces repeatedly | Existing files in nested folders | Use the current scripts (they check for existing files across the output tree) |
## Safety for Open Source

- Do not commit downloaded trace artifacts (`manifest.json`, trace JSON dumps) unless sanitized.
- Trace payloads can contain user prompts, outputs, metadata, and other sensitive runtime data.
- Keep this skill repository focused on scripts/templates, not production trace exports.
## Resources

`scripts/`

- `scripts/download_traces.py`: Python downloader + organizer
- `scripts/download_traces.ts`: TypeScript downloader + organizer
- `scripts/analyze_traces.py`: Offline analysis and reporting

`references/`

- `references/filtering-querying.md`: LangSmith query/filter examples
- `references/analysis-patterns.md`: Diagnostic patterns and heuristics
- `references/benchmark-analysis.md`: Benchmark-oriented analysis