custom-tracing
Custom Tracing (Direct API / OTLP)
Guide users through sending traces to ZeroEval without the Python or TypeScript SDK, using the REST API or OpenTelemetry protocol.
When To Use
- The user's language or runtime has no ZeroEval SDK (Go, Ruby, Java, Rust, Elixir, PHP, etc.).
- The user wants to send spans over plain HTTP from any environment.
- The user already has OpenTelemetry instrumentation and wants to export to ZeroEval.
- The user prefers a vendor-neutral or SDK-free integration path.
- The user explicitly asks about `POST /spans`, the REST API, or OTLP ingestion.
Do not use this skill when the user is working in Python or TypeScript and wants the full SDK experience (auto-instrumentation, `ze.prompt`, etc.). Use `zeroeval-install` instead.
Prerequisites
- A ZeroEval account and API key from Settings -> API Keys.
- An HTTP client or OpenTelemetry exporter in the user's language of choice.
Execution Sequence
Follow these steps in order. Load the reference playbook only when needed for detailed payloads and examples.
Step 1: Choose Integration Path
Ask the user which path fits their setup:
- REST API -- send spans directly via `POST /spans`. Best for custom integrations, scripts, or languages without OpenTelemetry support.
- OpenTelemetry (OTLP) -- export traces via the standard OTLP protocol to `POST /v1/traces`. Best when the app already uses OpenTelemetry or needs multi-backend fan-out (see the exporter sketch after this step).
If the user is unsure, default to REST API -- it has fewer dependencies and works from any language with an HTTP client.
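For the OTLP path, a minimal Go sketch is shown below. The `api.zeroeval.com` host, the `/v1/traces` path, and the Bearer header come from this guide; everything else is standard OpenTelemetry Go SDK usage, and the playbook's "OTLP Quick Start" section remains the authoritative reference.

```go
package main

import (
	"context"
	"os"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Point the standard OTLP/HTTP exporter at ZeroEval's trace endpoint and
	// authenticate with the Bearer token described in Step 2.
	exporter, err := otlptracehttp.New(ctx,
		otlptracehttp.WithEndpoint("api.zeroeval.com"), // host taken from this guide
		otlptracehttp.WithURLPath("/v1/traces"),        // OTLP ingest path taken from this guide
		otlptracehttp.WithHeaders(map[string]string{
			"Authorization": "Bearer " + os.Getenv("ZEROEVAL_API_KEY"),
		}),
	)
	if err != nil {
		panic(err)
	}

	// Batch spans and register the provider globally, as usual for OTel.
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	defer func() { _ = tp.Shutdown(ctx) }() // flush remaining spans on exit
	otel.SetTracerProvider(tp)

	// Emit one test span to confirm end-to-end ingestion.
	_, span := tp.Tracer("zeroeval-demo").Start(ctx, "hello-zeroeval")
	span.End()
}
```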
Step 2: Configure Authentication
The API key is passed as a Bearer token in every request:
Authorization: Bearer YOUR_ZEROEVAL_API_KEY
Recommend storing the key in an environment variable (ZEROEVAL_API_KEY) rather than hardcoding it.
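In a language without an SDK, such as Go, the header can be attached with the standard HTTP client. The helper below is a minimal sketch; the function and package names are illustrative.

```go
package zeroeval

import (
	"io"
	"net/http"
	"os"
)

// newRequest builds an authenticated request against the ZeroEval API,
// reading the key from the ZEROEVAL_API_KEY environment variable rather
// than hardcoding it.
func newRequest(method, url string, body io.Reader) (*http.Request, error) {
	req, err := http.NewRequest(method, url, body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("ZEROEVAL_API_KEY"))
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```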
Step 3: Send First Trace
Load references/api-integration-playbook.md and follow the section matching the chosen path:
- REST API: Follow the "REST API Quick Start" section.
- OTLP: Follow the "OTLP Quick Start" section.
Minimum outcome: at least one span is ingested and visible in the ZeroEval dashboard.
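For the REST path, a hedged Go sketch of a first span follows. The payload field names (`name`, `attributes`) are placeholders: the real schema, including whether `POST /spans` accepts a single object or an array, is defined in the playbook's "REST API Quick Start" section.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Hypothetical span payload: field names are placeholders, not the
	// authoritative schema from the playbook.
	span := map[string]any{
		"name": "first-span",
		"attributes": map[string]any{
			"source": "custom-tracing-demo",
		},
	}
	body, err := json.Marshal(span)
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", "https://api.zeroeval.com/spans", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("ZEROEVAL_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Expect a 2xx status, then confirm the span in the ZeroEval dashboard.
	fmt.Println("status:", resp.Status)
}
```

If the request returns 401/403, revisit Step 2 before changing the payload.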
Step 4: Add Structure (Sessions and Nested Spans)
Once the first trace is confirmed, guide the user through optional structure:
- Sessions: group related traces by passing `session_id` or the `session` object on spans.
- Nested spans: use `parent_span_id` to build a tree of operations within a trace.
- LLM cost tracking: set `kind: "llm"` and include `provider`, `model`, `inputTokens`, `outputTokens` in attributes for automatic cost calculation (sketched below).
Details are in the "Adding Structure" section of the playbook.
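As an illustration of how these fields might combine on a single span body, here is a Go sketch. Whether `session_id`, `parent_span_id`, and `kind` sit at the top level or under `attributes` is defined in the playbook's "Adding Structure" section, so treat this layout as an assumption.

```go
package zeroeval

// buildLLMSpan sketches a structured span body using the fields named above.
// The exact layout is an assumption; defer to the playbook for the real schema.
func buildLLMSpan(traceID, parentSpanID, sessionID string) map[string]any {
	return map[string]any{
		"trace_id":       traceID,      // reuse the parent's trace_id to stay in the same trace
		"parent_span_id": parentSpanID, // nests this span under the parent operation
		"session_id":     sessionID,    // groups related traces into one session
		"name":           "chat-completion",
		"kind":           "llm", // enables automatic cost calculation
		"attributes": map[string]any{
			"provider":     "openai",
			"model":        "gpt-4o-mini",
			"inputTokens":  512,
			"outputTokens": 128,
		},
	}
}
```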
Step 5: Validate and Troubleshoot
Run the checklist:
- API key is valid (no 401/403 responses)
- At least one span is visible in the dashboard
- `trace_id` groups spans into a single trace view
- Session grouping works (if configured)
- LLM cost appears on spans with `kind: "llm"` and token attributes
If any check fails, follow the "Troubleshooting" section of the playbook.
Step 6: Suggest Next Steps
After tracing is working:
- Judges: recommend the `create-judge` skill to set up automated evaluation on ingested traces.
- Feedback: point users to the Feedback API (`POST /feedback`) for human-in-the-loop review.
- SDK upgrade: if the user later adopts Python or TypeScript, suggest `zeroeval-install` for auto-instrumentation and `ze.prompt` support.
Key Principles
- Spans are the entry point: `POST /spans` auto-creates traces and sessions. Start there, not with `POST /traces` or `POST /sessions`.
- One span, one trace: the simplest integration is a single span per LLM call. Add nesting only when the user needs it.
- Cloud by default: the production base URL is `https://api.zeroeval.com`. Only use `http://localhost:8000` for local development.
- Evidence over assumption: confirm spans appear in the dashboard before adding complexity.