LangSmith Datasets

SKILL.md
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here          # Required
LANGSMITH_WORKSPACE_ID=your-workspace-id              # Optional: for org-scoped keys
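
A script can fail fast when the required key is absent instead of erroring midway through an upload. The following is a minimal sketch of such a check; `read_langsmith_env` is a hypothetical helper written for illustration and is not part of the `langsmith` package or the scripts described here.

```python
import os
from typing import Mapping, Optional


def read_langsmith_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the LangSmith settings found in `env` (defaults to os.environ).

    Hypothetical helper for illustration; not part of the langsmith SDK.
    """
    env = os.environ if env is None else env
    api_key = env.get("LANGSMITH_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("LANGSMITH_API_KEY is required but not set")
    # The workspace ID is optional and only matters for org-scoped keys.
    workspace_id = env.get("LANGSMITH_WORKSPACE_ID", "").strip() or None
    return {"api_key": api_key, "workspace_id": workspace_id}
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in tests.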

Dependencies (Python)

pip install langsmith click rich python-dotenv

Dependencies (TypeScript/JavaScript)

npm install langsmith commander chalk cli-table3 dotenv

<input_format> This script requires traces exported in JSONL format (one run per line).

Required Fields

Each line must be a JSON object with these fields:

{"run_id": "...", "trace_id": "...", "name": "...", "run_type": "...", "parent_run_id": "...", "inputs": {...}, "outputs": {...}}

| Field | Description |
|---|---|
| run_id | Unique identifier for this run |
| trace_id | Groups runs into traces (used for hierarchy reconstruction) |
| name | Run name (e.g., "model", "classify_email") |
| run_type | One of: chain, llm, tool, retriever |
| parent_run_id | Parent run ID (null for root) |
| inputs | Run inputs (required for dataset generation) |
| outputs | Run outputs (required for dataset generation) |

Important: Every run MUST have both inputs and outputs populated; without them, datasets cannot be generated correctly.

Before generating datasets, verify your traces exist:

  • Check that JSONL files exist in the output directory
  • Confirm traces have both inputs and outputs populated
  • Inspect the trace hierarchy to understand the structure </input_format>
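
The checks above can be automated before running the generator. The following is a minimal sketch using only the standard library; the function names `find_problems` and `check_file` are hypothetical and the generation scripts may validate input differently.

```python
import json
from pathlib import Path
from typing import Iterable, List, Tuple

# Field names taken from the "Required Fields" list above.
REQUIRED_FIELDS = (
    "run_id", "trace_id", "name", "run_type",
    "parent_run_id", "inputs", "outputs",
)


def find_problems(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for runs that look unusable."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines between runs
        try:
            run = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(run, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in run]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
        # inputs and outputs must be present and non-empty
        for key in ("inputs", "outputs"):
            if key in run and not run[key]:
                problems.append((number, f"empty {key}"))
    return problems


def check_file(path: str) -> List[Tuple[int, str]]:
    """Validate a single .jsonl file on disk."""
    with Path(path).open(encoding="utf-8") as handle:
        return find_problems(handle)
```

An empty result from `check_file` means every line parsed and carried all required fields with non-empty inputs and outputs.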

Scripts

Python:

  • generate_datasets.py - Create evaluation datasets from exported trace files
  • query_datasets.py - View and inspect datasets

TypeScript/JavaScript:

  • generate_datasets.ts - Create evaluation datasets from exported trace files
  • query_datasets.ts - View and inspect datasets

Common Flags

All dataset generation commands support:

  • --input <path> - Input traces: directory of .jsonl files or single .jsonl file (required)
  • --type <type> - Dataset type: final_response, single_step, trajectory, rag (required)
  • --output <path> - Output file (.json or .csv) (required)
  • --input-fields - Comma-separated input keys to extract (e.g., "query,question")
  • --output-fields - Comma-separated output keys to extract (e.g., "answer,response")
  • --messages-only - Only extract from messages arrays, skip other fields
  • --upload <name> - Upload to LangSmith with this dataset name
  • --replace - Overwrite existing file/dataset (will prompt for confirmation)
  • --yes - Skip confirmation prompts (use with caution)
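
To illustrate what a comma-separated field list such as `--input-fields "query,question"` might do, here is a minimal sketch. `pick_fields` is a hypothetical function, and its behavior (keep every listed key that is present, keep everything when no list is given) is an assumption; the actual scripts may instead take the first matching key or handle nested values.

```python
from typing import Any, Dict, Optional


def pick_fields(payload: Dict[str, Any], fields: Optional[str]) -> Dict[str, Any]:
    """Keep only the keys named in a comma-separated `fields` string.

    Assumption for illustration: with no field list, the payload is kept whole.
    """
    if not fields:
        return dict(payload)
    wanted = [name.strip() for name in fields.split(",") if name.strip()]
    # Keys that are listed but absent from the payload are skipped silently.
    return {name: payload[name] for name in wanted if name in payload}
```

For a run whose inputs are `{"query": "refund?", "user_id": 7}`, the field list `"query,question"` would keep only the `query` entry under this sketch.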

IMPORTANT - Safety Prompts:

  • The script prompts for confirmation before deleting existing datasets with --replace
  • If you are running interactively (a user can respond to prompts): ALWAYS wait for user input; NEVER use --yes unless the user explicitly requests it
  • If you are running non-interactively: Use --replace --yes together to ensure proper replacement
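
The interaction between --replace and --yes can be sketched as a small decision function. This is an illustration only: `may_overwrite` is a hypothetical name, and refusing to overwrite when --replace is absent is an assumption about the scripts, not documented behavior.

```python
from typing import Callable


def may_overwrite(
    exists: bool,
    replace: bool,
    yes: bool,
    ask: Callable[[str], str] = input,
) -> bool:
    """Decide whether an existing file or dataset may be overwritten."""
    if not exists:
        return True  # nothing to overwrite
    if not replace:
        return False  # assumption: never overwrite without --replace
    if yes:
        return True  # --yes skips the confirmation prompt
    answer = ask("Target already exists. Replace it? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Injecting `ask` makes the prompt testable and shows why a non-interactive run needs both flags: without --yes the function would block waiting for input.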

<dataset_types_overview> Use --type <type> flag with the generate_datasets script:

  • final_response - Full conversation with expected output. Tests complete agent behavior.
  • single_step - Single node inputs/outputs. Tests specific node behavior. Use --run-name to target a node.
  • trajectory - Tool call sequence. Tests execution path. Use --depth to control depth.
  • rag - Question/chunks/answer/citations. Tests retrieval quality. Only matches run_type="retriever". </dataset_types_overview>
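
To make the trajectory type and its depth control concrete, here is a minimal sketch that rebuilds run depth from parent_run_id and collects tool names. The function names are hypothetical, and two points are assumptions rather than documented behavior: the root run is treated as depth 0, and file order is treated as execution order.

```python
from typing import Dict, List, Optional


def run_depth(run: dict, by_id: Dict[str, dict]) -> int:
    """Count ancestors by following parent_run_id (root run has depth 0)."""
    depth = 0
    seen = {run.get("run_id")}
    parent = run.get("parent_run_id")
    # Stop at the root, at an unknown parent, or if the chain loops.
    while parent is not None and parent in by_id and parent not in seen:
        seen.add(parent)
        depth += 1
        parent = by_id[parent].get("parent_run_id")
    return depth


def tool_trajectory(runs: List[dict], max_depth: Optional[int] = None) -> List[str]:
    """Return tool run names in file order, optionally limited by depth."""
    by_id = {run["run_id"]: run for run in runs}
    names = []
    for run in runs:
        if run.get("run_type") != "tool":
            continue
        if max_depth is not None and run_depth(run, by_id) > max_depth:
            continue
        names.append(run["name"])
    return names
```

Under these assumptions, a tool called by a nested sub-chain sits one level deeper than a tool called directly by the root, which is why a depth limit can hide tool calls.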

<script_usage>

Script Usage

<python>

Extract specific fields

python generate_datasets.py --input ./traces --type final_response \
  --input-fields "email_content" \
  --output-fields "response" \
  --output /tmp/final.json

Generate trajectory dataset

python generate_datasets.py --input ./traces --type trajectory --output /tmp/trajectory.json

Generate and upload

python generate_datasets.py --input ./traces --type trajectory \
  --output /tmp/trajectory.json \
  --upload "Skills: Trajectory"

Query datasets

python query_datasets.py list-datasets
python query_datasets.py show "Skills: Trajectory" --limit 5
python query_datasets.py view-file /tmp/trajectory_ds.json --limit 3

</python>

<typescript>
Generate and query datasets using the TypeScript CLI scripts.
```bash
# Basic usage (raw inputs, extracted output)
npx tsx generate_datasets.ts --input ./traces --type final_response --output /tmp/final_response.json

# Extract specific fields
npx tsx generate_datasets.ts --input ./traces --type final_response \
  --input-fields "email_content" \
  --output-fields "response" \
  --output /tmp/final.json

# Generate trajectory dataset
npx tsx generate_datasets.ts --input ./traces --type trajectory --output /tmp/trajectory.json

# Generate and upload
npx tsx generate_datasets.ts --input ./traces --type trajectory \
  --output /tmp/trajectory.json \
  --upload "Skills: Trajectory"

# Query datasets
npx tsx query_datasets.ts list-datasets
npx tsx query_datasets.ts show "Skills: Trajectory" --limit 5
npx tsx query_datasets.ts view-file /tmp/trajectory_ds.json --limit 3
```
</typescript>

</script_usage>

<example_workflow> Complete workflow from exported traces to LangSmith datasets:

<python>

python generate_datasets.py --input ./traces --type single_step \
  --run-name model \
  --sample-per-trace 2 \
  --output /tmp/model.json \
  --upload "Skills: Single Step (model)" --replace

python generate_datasets.py --input ./traces --type trajectory \
  --output /tmp/traj.json \
  --upload "Skills: Trajectory (all depths)" --replace

Query locally if needed

python query_datasets.py show "Skills: Final Response" --limit 3

</python>

<typescript>
Generate all dataset types from exported traces and upload to LangSmith.
```bash
# Generate all dataset types from exported traces
npx tsx generate_datasets.ts --input ./traces --type final_response \
  --output /tmp/final.json \
  --upload "Skills: Final Response" --replace

npx tsx generate_datasets.ts --input ./traces --type single_step \
  --run-name model \
  --sample-per-trace 2 \
  --output /tmp/model.json \
  --upload "Skills: Single Step (model)" --replace

npx tsx generate_datasets.ts --input ./traces --type trajectory \
  --output /tmp/traj.json \
  --upload "Skills: Trajectory (all depths)" --replace

# Query locally if needed
npx tsx query_datasets.ts show "Skills: Final Response" --limit 3
```
</typescript>

</example_workflow>

Empty final_response outputs:

  • Check that the root run has outputs
  • Use --output-fields to target a specific field
  • Use --messages-only if output is in messages format

No trajectory examples:

  • Tools might be at a different depth: try removing --depth or using --depth 2
  • Verify tool calls exist in your exported JSONL files

Too many single_step examples:

  • Use --sample-per-trace 2 to limit examples per trace
  • Reduces dataset size while maintaining diversity

No RAG data:

  • RAG only matches run_type="retriever"
  • For custom retriever names, use single_step --run-name <retriever> instead

Dataset upload fails:

  • Check that a dataset with the same name doesn't already exist, or use --replace
  • Verify LANGSMITH_API_KEY is set