Cledon — Voice AI Agent Testing

Cledon tests voice AI agents by simulating callers that phone your agent, then evaluating the agent's responses against assertions.

Domain Model

Agent        — the voice AI being tested (name, phone number, personality)
Folder       — groups related test cases
Test Case    — defines assertions + expected tool calls for one agent
Scenario     — a runnable test with caller instructions for one test case
Run          — execution of a scenario producing transcript + pass/fail results

Relationships: Agent → many Test Cases → many Scenarios. Each Scenario produces Runs.
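
To make the relationships concrete, here is a minimal sketch of the domain model as Python dataclasses. The class and field names (`VoiceTestCase`, `agent_id`, `folder_id`, and so on) are illustrative assumptions and not Cledon's actual schema; only the entities and their one-to-many links come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    run_id: str
    status: str                      # "running", "completed", or "failed"
    outcome: Optional[str] = None    # "passed" or "failed" once the run finishes
    transcript: str = ""


@dataclass
class Scenario:
    scenario_id: str
    caller_instructions: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class VoiceTestCase:
    test_case_id: str
    assertions: List[str] = field(default_factory=list)
    expected_tool_calls: List[str] = field(default_factory=list)
    folder_id: Optional[str] = None  # assumed link to the Folder that groups it
    scenarios: List[Scenario] = field(default_factory=list)


@dataclass
class Folder:
    folder_id: str
    name: str


@dataclass
class Agent:
    agent_id: str
    name: str
    phone_number: Optional[str] = None
    personality: str = ""
    test_cases: List[VoiceTestCase] = field(default_factory=list)
```

Nesting the lists mirrors the chain Agent → Test Cases → Scenarios → Runs; a real client would more likely hold IDs and fetch each level on demand.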

Available Tools (24)

Analytics

Tool	Purpose
`get-overall-stats`	Dashboard summary: total scenarios, runs, pass rate, avg duration
`get-run-history`	Recent runs with pass/fail counts (1-90 days lookback)
`get-failed-assertions`	Top 10 recurring failures with up to 3 example runs each

Agents

Tool	Purpose
`list-agents`	List all voice agents
`get-agent`	Full agent details by ID
`create-agent`	Create agent in call mode (phone number) or LLM mode (ElevenLabs, Vapi, LiveKit, Famulor, Synthflow)
`update-agent`	Update agent properties
`delete-agent`	Delete agent and associated data

Test Cases & Scenarios

Tool	Purpose
`list-testcases`	List test cases (optional folderId filter)
`get-testcase`	Full test case with assertions and expected tool calls
`create-testcase`	AI-generate test case from a transcript or system prompt; supports `includeScenarios` to auto-create scenarios
`update-testcase`	Update test case properties
`execute-testcase`	Run all scenarios for a test case
`list-scenarios`	List scenarios (optional testCaseId filter)
`get-scenario`	Full scenario with caller instructions

Execution

Tool	Purpose
`run-scenario`	Trigger single test → returns runId
`run-multiple-scenarios`	Batch trigger → returns array of runIds
`get-run-status`	Full run details: transcript, assertions, tool call validation
`get-scenario-runs`	Run history for one scenario with pass/fail counts
`cancel-run`	Cancel a stuck run (only status=running)

Credentials

Tool	Purpose
`list-credentials`	List all stored voice platform credentials (keys never exposed)
`create-credential`	Store a new platform API key (elevenlabs, vapi, livekit, famulor, synthflow)
`update-credential`	Update a credential's name or API key
`delete-credential`	Delete a stored credential

Workflows

Get an overview of testing status

get-overall-stats → see pass rate, total runs, average duration
get-run-history with days=7 → see recent individual results
get-failed-assertions → identify systemic issues

Run a test and check results

list-scenarios → find the scenario ID
run-scenario with scenarioId → get back a runId
Wait a moment, then get-run-status with runId → see transcript + assertion results
If status is still "running", wait and check again
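
The trigger-then-poll steps above can be sketched as a small helper. This is an illustration under stated assumptions: `call_tool` is a hypothetical callable standing in for whatever client invokes the tools, and the response shapes (a dict with a `runId` key, a dict with a `status` key) are assumed, not documented. Only the tool names, the `scenarioId` and `runId` parameters, and the "running" status come from this document.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def run_and_wait(
    call_tool: ToolCaller,
    scenario_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Trigger one scenario and poll until its status is no longer "running"."""
    started = call_tool("run-scenario", {"scenarioId": scenario_id})
    run_id = started["runId"]  # assumed response key

    waited = 0.0
    while True:
        run = call_tool("get-run-status", {"runId": run_id})
        if run.get("status") != "running":  # assumed response key
            return run
        if waited >= timeout_seconds:
            raise TimeoutError(f"run {run_id} still running after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. If the timeout fires, cancel-run is the documented way to stop a run that is stuck in status=running.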

Run all tests for a test case

list-scenarios with testCaseId filter → collect all scenario IDs
run-multiple-scenarios with the ID array
get-run-history with days=1 → see batch results
Alternatively, execute-testcase runs every scenario for the test case in a single call
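
A rough sketch of the batch workflow follows. As assumptions for illustration: `call_tool` is a hypothetical tool-invoking callable, list-scenarios is taken to return a list of dicts with an `id` key, the batch argument is taken to be named `scenarioIds`, and get-run-status is taken to return a dict with a `status` key. Instead of reading get-run-history, this variant tallies the status of each returned runId directly.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def trigger_test_case(call_tool: ToolCaller, test_case_id: str) -> List[str]:
    """Collect the scenario IDs of one test case and trigger them as a batch."""
    scenarios = call_tool("list-scenarios", {"testCaseId": test_case_id})
    scenario_ids = [s["id"] for s in scenarios]  # assumed item shape
    if not scenario_ids:
        return []  # nothing to run; skip the batch call entirely
    # "scenarioIds" is an assumed parameter name for the ID array
    return list(call_tool("run-multiple-scenarios", {"scenarioIds": scenario_ids}))


def tally_statuses(call_tool: ToolCaller, run_ids: Iterable[str]) -> Counter:
    """Count how many of the given runs are in each status."""
    return Counter(
        call_tool("get-run-status", {"runId": run_id})["status"]
        for run_id in run_ids
    )
```

Calling `tally_statuses` repeatedly until the "running" count reaches zero gives a simple completion check for the whole batch.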

Investigate failures

get-failed-assertions → find the most common failures
Pick a failure, note the example runIds
get-run-status for each runId → read the transcript to understand what went wrong
get-scenario-runs for that scenarioId → check if it's a regression or consistent failure
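
The last step, telling a regression from a consistent failure, can be illustrated with a small classifier over a scenario's run outcomes. This is a sketch under assumptions: it expects the outcomes already extracted from get-scenario-runs as a list ordered oldest to newest, and the category labels (including the extra "flaky" bucket) are this example's own, not Cledon terminology.

```python
from typing import List


def classify_failure(outcomes: List[str]) -> str:
    """Classify a scenario's history; outcomes are "passed"/"failed", oldest first."""
    if not outcomes or outcomes[-1] != "failed":
        return "not failing"

    first_fail = outcomes.index("failed")
    if "passed" in outcomes[first_fail:]:
        # Passes and failures alternate after the first failure
        return "flaky"
    if "passed" in outcomes[:first_fail]:
        # It used to pass and has failed ever since
        return "regression"
    return "consistent failure"
```

For example, `classify_failure(["passed", "passed", "failed"])` yields "regression", while a history containing only "failed" yields "consistent failure".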

Drill into a specific test case

get-testcase with id → see assertions and expected tool calls
list-scenarios with testCaseId → see all persona combinations
get-scenario for each → see caller instructions

Create a new test from scratch

list-agents → pick the agent to test (or create-agent)
create-testcase with agent ID and assertions; set includeScenarios: true to auto-generate scenarios
execute-testcase → run all scenarios, or run-scenario → run a single one

Create tests from a transcript

list-agents → pick the agent (or create-agent)
create-testcase with agentId and transcript — AI analyzes the transcript and generates assertions, icons, and expected tool calls
Optionally set includeScenarios: true to also generate caller scenarios
execute-testcase → run all generated scenarios
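
The transcript workflow could be scripted along these lines. The `agentId`, `transcript`, and `includeScenarios` parameters come from the steps above; everything else is an assumption made for illustration: `call_tool` is a hypothetical tool-invoking callable, create-testcase is assumed to return a dict with an `id` key, and execute-testcase is assumed to accept a `testCaseId` argument.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def create_and_run_from_transcript(
    call_tool: ToolCaller, agent_id: str, transcript: str
) -> Any:
    """Generate a test case with scenarios from a transcript, then execute it."""
    created = call_tool(
        "create-testcase",
        {
            "agentId": agent_id,
            "transcript": transcript,
            "includeScenarios": True,  # also generate caller scenarios
        },
    )
    test_case_id = created["id"]  # assumed response key
    # "testCaseId" is an assumed parameter name
    return call_tool("execute-testcase", {"testCaseId": test_case_id})
```

The return value of execute-testcase is passed through unchanged because its shape is not described here.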

Key Patterns

List endpoints return compact data. Use the corresponding get-by-ID tool to see full details.
run-scenario is async: it returns a runId immediately. Poll get-run-status to see results.
All data is scoped to the authenticated user's organization. No cross-tenant access.
Run outcome is either "passed" or "failed". Run status progresses: running → completed/failed.
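
The first pattern, compact lists followed by get-by-ID lookups, might look like the sketch below. It assumes a hypothetical `call_tool` callable, that list tools return a list of dicts each carrying an `id` key, and that get tools accept an `id` argument; none of these shapes are confirmed here.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def fetch_full_details(
    call_tool: ToolCaller, list_tool: str, get_tool: str
) -> List[Any]:
    """Expand each compact list entry into its full record."""
    compact_items = call_tool(list_tool, {})
    return [call_tool(get_tool, {"id": item["id"]}) for item in compact_items]
```

A call such as `fetch_full_details(call_tool, "list-agents", "get-agent")` issues one lookup per entry, so it is best reserved for short lists.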
