Writing aimock Test Fixtures

What aimock Is

aimock is a zero-dependency mock infrastructure for AI apps. Fixture-driven. Multi-provider (OpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Vertex AI, Ollama, Cohere). Multimedia endpoints (image generation, text-to-speech, audio transcription, video generation). MCP, A2A, AG-UI, and vector DB mocking. Runs a real HTTP server on a real port — works across processes, unlike MSW-style interceptors. WebSocket support for OpenAI Responses/Realtime and Gemini Live APIs. Record-and-replay for all endpoints including multimedia. Chaos testing and Prometheus metrics.

Core Mental Model

  • Fixtures = match criteria + response
  • First-match-wins — order matters
  • All providers share one fixture pool (provider adapters normalize to ChatCompletionRequest)
  • Fixtures are live — mutations after start() take effect immediately
  • Sequential responses are supported via sequenceIndex (match count tracked per fixture)
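
A minimal sketch of those rules, with a specific fixture registered before a catch-all and a live mutation after start():

// First-match-wins: the specific fixture must sit before the catch-all
mock.onMessage("refund", { content: "Refunds take 5-7 business days." });
mock.addFixture({ match: {}, response: { content: "How can I help?" } });

await mock.start();

// Fixtures are live: this takes effect on the running server immediately
mock.prependFixture({
  match: { userMessage: "refund", model: "gpt-4o" },
  response: { content: "Premium refunds are instant." },
});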

Match Field Reference

| Field | Type | Matches Against |
| --- | --- | --- |
| userMessage | string | Substring of last role: "user" message text |
| userMessage | RegExp | Pattern test on last role: "user" message text |
| inputText | string | Substring of embedding input text (concatenated if multiple inputs) |
| inputText | RegExp | Pattern test on embedding input text |
| toolName | string | Exact match on any tool in request's tools[] array (by function.name) |
| toolCallId | string | Exact match on tool_call_id of last role: "tool" message |
| model | string | Exact match on req.model |
| model | RegExp | Pattern test on req.model |
| responseFormat | string | Exact match on req.response_format.type ("json_object", "json_schema") |
| sequenceIndex | number | Matches only when this fixture's match count equals the given index (0-based) |
| endpoint | string | Restrict to endpoint type: "chat", "image", "speech", "transcription", "video", "embedding" |
| predicate | (req: ChatCompletionRequest) => boolean | Custom function — full access to request |

AND logic: all specified fields must match. Empty match {} = catch-all.

Multi-part content (e.g., [{type: "text", text: "hello"}]) is automatically extracted — userMessage matching works regardless of content format.
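
For example, a fixture combining several match fields only fires when all of them hold:

mock.addFixture({
  match: { userMessage: "summarize", model: /^gpt-4/, responseFormat: "json_object" },
  response: { content: '{"summary":"..."}' },
});
// Matches only JSON-mode requests to a gpt-4* model whose last user message contains "summarize"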

Response Types

Text

{
  content: "Hello!"
}

Tool Calls

{
  toolCalls: [{ name: "get_weather", arguments: '{"city":"SF"}' }]
}

arguments MUST be a JSON string, not an object. This is the #1 mistake.
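
When building fixtures in code, JSON.stringify keeps arguments a valid JSON string:

mock.onMessage("weather", {
  toolCalls: [{ name: "get_weather", arguments: JSON.stringify({ city: "SF" }) }],
});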

Embedding

{
  embedding: [0.1, 0.2, 0.3, -0.5, 0.8]
}

The embedding vector is returned for each input in the request. If no embedding fixture matches, deterministic embeddings are auto-generated from the input text hash — you only need fixtures when you want specific vectors.

Image

// Single image
{
  image: {
    url: "https://example.com/generated.png"
  }
}
// Multiple images
{
  images: [{ url: "https://example.com/1.png" }, { b64Json: "iVBOR..." }]
}

Use match: { endpoint: "image" } to prevent cross-matching with chat fixtures.
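
For example, onImage (see the API quick reference) matches by prompt, and an explicit endpoint keeps a broader fixture scoped to image requests:

// Match image generation by prompt
mock.onImage("sunset over the bay", { image: { url: "https://example.com/sunset.png" } });

// Catch-all for any remaining image request, scoped with endpoint
mock.addFixture({
  match: { endpoint: "image" },
  response: { images: [{ b64Json: "iVBOR..." }] },
});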

Speech (TTS)

{ audio: "base64-encoded-audio-data" }
// With explicit format (default: mp3)
{ audio: "base64-data", format: "opus" }

Transcription

// Simple
{ transcription: { text: "Hello world" } }
// Verbose with timestamps
{ transcription: { text: "Hello world", language: "en", duration: 2.5, words: [...], segments: [...] } }

Video

{ video: { url: "https://example.com/video.mp4", duration: 10 } }

Video uses async polling — POST /v1/videos creates, GET /v1/videos/{id} checks status.
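
A sketch of registering a video fixture by prompt, using onVideo from the API quick reference:

mock.onVideo("drone shot of a coastline", {
  video: { url: "https://example.com/video.mp4", duration: 10 },
});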

Error

{ error: { message: "Rate limited", type: "rate_limit_error" }, status: 429 }

Chaos (Failure Injection)

The optional chaos field on a fixture enables probabilistic failure injection:

{
  chaos?: {
    dropRate?: number;      // Probability (0-1) of returning a 500 error
    malformedRate?: number; // Probability (0-1) of returning malformed JSON
    disconnectRate?: number; // Probability (0-1) of disconnecting mid-stream
  }
}

Rates are evaluated per-request. When triggered, the chaos failure replaces the normal response.

Common Patterns

Basic text fixture

mock.onMessage("hello", { content: "Hi there!" });

Tool call → tool result → final response (3-step agent loop)

The most common pattern. Fixture 1 triggers the tool call, fixture 2 handles the tool result.

// Step 1: User asks about weather → LLM calls tool
mock.onMessage("weather", {
  toolCalls: [{ name: "get_weather", arguments: '{"city":"SF"}' }],
});

// Step 2: Tool result comes back → LLM responds with text
mock.addFixture({
  match: { predicate: (req) => req.messages.at(-1)?.role === "tool" },
  response: { content: "It's 72°F in San Francisco." },
});

Why predicate, not userMessage? After a tool call, the client replays the same conversation with the tool result appended. The user message hasn't changed — userMessage: "weather" would match the SAME fixture again, creating an infinite loop.

Embedding fixture

// Match specific input text
mock.onEmbedding("search query", {
  embedding: [0.1, 0.2, 0.3, 0.4, 0.5],
});

// Match with regex
mock.onEmbedding(/product.*description/, {
  embedding: [0.9, -0.1, 0.5, 0.3, 0.2],
});

Structured output / JSON mode

// onJsonOutput auto-sets responseFormat: "json_object" and stringifies objects
mock.onJsonOutput("extract entities", {
  entities: [
    { name: "Acme Corp", type: "company" },
    { name: "Jane Doe", type: "person" },
  ],
});

// Equivalent manual form:
mock.addFixture({
  match: { userMessage: "extract entities", responseFormat: "json_object" },
  response: { content: '{"entities":[...]}' },
});

Sequential responses (same match, different responses)

// First call returns tool call, second returns text
mock.on(
  { userMessage: "status", sequenceIndex: 0 },
  { toolCalls: [{ name: "check_status", arguments: "{}" }] },
);
mock.on({ userMessage: "status", sequenceIndex: 1 }, { content: "All systems operational." });

Match counts are tracked per fixture group and reset with reset() or resetMatchCounts().
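
If a test needs the sequence to replay from index 0 without dropping fixtures, resetting only the counts is enough:

afterEach(() => mock.resetMatchCounts()); // keep fixtures, restart sequences at sequenceIndex 0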

Streaming physics (realistic timing)

mock.onMessage(
  "tell me a story",
  { content: "Once upon a time..." },
  {
    streamingProfile: {
      ttft: 200, // 200ms before first token
      tps: 30, // 30 tokens per second after that
      jitter: 0.1, // ±10% random variance
    },
  },
);

Predicate-based routing (same user message, different context)

Common in supervisor/orchestrator patterns where the system prompt changes:

mock.addFixture({
  match: {
    predicate: (req) => {
      const sys = req.messages.find((m) => m.role === "system")?.content ?? "";
      return typeof sys === "string" && sys.includes("Flights found: false");
    },
  },
  response: { toolCalls: [{ name: "search_flights", arguments: "{}" }] },
});

Catch-all (always add one)

Prevents unmatched requests from returning 404 and crashing the test:

mock.addFixture({
  match: { predicate: () => true },
  response: { content: "I understand. How can I help?" },
});

Tool result catch-all with prependFixture

Must go at the front so it matches before substring-based fixtures:

mock.prependFixture({
  match: { predicate: (req) => req.messages.at(-1)?.role === "tool" },
  response: { content: "Done!" },
});

Stream interruption simulation (v1.3.0+)

mock.onMessage(
  "long response",
  { content: "This will be cut short..." },
  {
    truncateAfterChunks: 3, // Stop after 3 SSE chunks
    disconnectAfterMs: 500, // Or disconnect after 500ms
  },
);

Chaos testing (probabilistic failures)

mock.addFixture({
  match: { userMessage: "flaky" },
  response: { content: "Sometimes works!" },
  chaos: { dropRate: 0.3 },
});

30% of requests matching this fixture will get a 500 error instead of the response. Can also use malformedRate (garbled JSON) or disconnectRate (connection dropped mid-stream).

Server-level chaos applies to ALL requests:

mock.setChaos({ dropRate: 0.1 }); // 10% of all requests fail
mock.clearChaos(); // Remove server-level chaos

Error injection (one-shot)

mock.nextRequestError(429, { message: "Rate limited", type: "rate_limit_error" });
// Next request gets 429, then fixture auto-removes itself

JSON fixture files

{
  "fixtures": [
    {
      "match": { "userMessage": "hello" },
      "response": { "content": "Hi!" }
    },
    {
      "match": { "inputText": "search query" },
      "response": { "embedding": [0.1, 0.2, 0.3] }
    },
    {
      "match": { "userMessage": "status", "sequenceIndex": 0 },
      "response": { "content": "First response" }
    }
  ]
}

JSON files cannot use RegExp or predicate — those are code-only features. streamingProfile is supported in JSON fixture files.

Load with mock.loadFixtureFile("./fixtures/greetings.json") or mock.loadFixtureDir("./fixtures/").

API Endpoints

All providers share the same fixture pool — write fixtures once, they work for any endpoint.
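
A sketch of one fixture answering two provider endpoints (raw fetch calls; the request bodies are the providers' own standard shapes, not anything aimock-specific):

mock.onMessage("hello", { content: "Hi from one fixture!" });

// OpenAI-style request
await fetch(`${mock.url}/v1/chat/completions`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "gpt-4o-mini", messages: [{ role: "user", content: "hello" }] }),
});

// Anthropic-style request: the same fixture matches
await fetch(`${mock.url}/v1/messages`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "claude-3-5-sonnet",
    max_tokens: 100,
    messages: [{ role: "user", content: "hello" }],
  }),
});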

| Endpoint | Provider | Protocol |
| --- | --- | --- |
| POST /v1/chat/completions | OpenAI | HTTP |
| POST /v1/responses | OpenAI | HTTP + WS |
| POST /v1/embeddings | OpenAI | HTTP |
| GET /v1/models | OpenAI-compat | HTTP |
| POST /v1/images/generations | OpenAI | HTTP |
| POST /v1/audio/speech | OpenAI | HTTP |
| POST /v1/audio/transcriptions | OpenAI | HTTP |
| POST /v1/videos | OpenAI | HTTP |
| GET /v1/videos/{id} | OpenAI | HTTP |
| WS /v1/responses | OpenAI | WebSocket |
| WS /v1/realtime | OpenAI | WebSocket |
| POST /v1/messages | Anthropic | HTTP |
| POST /v1beta/models/{model}:{method} | Google Gemini | HTTP |
| POST /v1beta/models/{model}:predict | Gemini Imagen | HTTP |
| WS /ws/google.ai...BidiGenerateContent | Gemini Live | WebSocket |
| POST /v1/projects/{p}/locations/{l}/publishers/google/models/{m}:generateContent | Vertex AI | HTTP |
| POST /v1/projects/{p}/locations/{l}/publishers/google/models/{m}:streamGenerateContent | Vertex AI | HTTP |
| POST /model/{modelId}/invoke | AWS Bedrock | HTTP |
| POST /model/{modelId}/invoke-with-response-stream | AWS Bedrock | HTTP |
| POST /model/{modelId}/converse | AWS Bedrock | HTTP |
| POST /model/{modelId}/converse-stream | AWS Bedrock | HTTP |
| POST /openai/deployments/{id}/chat/completions | Azure OpenAI | HTTP |
| POST /openai/deployments/{id}/embeddings | Azure OpenAI | HTTP |
| POST /api/chat | Ollama | HTTP |
| POST /api/generate | Ollama | HTTP |
| GET /api/tags | Ollama | HTTP |
| POST /v2/chat | Cohere | HTTP |
| GET /health | — | HTTP |
| GET /ready | — | HTTP |
| GET /metrics | — | HTTP |

Critical Gotchas

  1. Order matters — first match wins. Specific fixtures before general ones. Use prependFixture() to force priority.

  2. arguments must be a JSON string — "arguments": "{\"key\":\"value\"}", not "arguments": {"key":"value"}. The type system enforces this but JSON fixtures can get it wrong silently.

  3. Latency is per-chunk, not total — latency: 100 means 100ms between each SSE chunk, not 100ms total response time. Similarly, truncateAfterChunks and disconnectAfterMs are for simulating stream interruptions (added in v1.3.0).

  4. streamingProfile takes precedence over latency — when both are set on a fixture, streamingProfile controls timing. Use one or the other.

  5. Tool result messages don't change the user message — after a tool call, the client sends the same conversation + tool result. Matching on userMessage will hit the SAME fixture again → infinite loop. Always use predicate checking role === "tool" for tool results.

  6. clearFixtures() preserves the array reference — uses .length = 0, not reassignment. The running server reads the same array object.

  7. Journal records everything — including 404 "no match" responses. Use mock.getLastRequest() to debug mismatches.

  8. All providers share fixtures — a fixture matching "hello" works whether the request comes via /v1/chat/completions (OpenAI), /v1/messages (Anthropic), Gemini, Bedrock, or Azure endpoints.

  9. WebSocket uses the same fixture pool — no special setup needed for WebSocket-based APIs (OpenAI Responses WS, Realtime, Gemini Live).

  10. Embeddings auto-generate if no fixture matches — deterministic vectors are generated from the input text hash. You don't need a catch-all for embedding requests.

  11. Sequential response counts are tracked per fixture — counts reset with reset() or resetMatchCounts(). The count increments after each match of that fixture group (all fixtures sharing the same non-sequenceIndex match fields).

  12. Bedrock uses Anthropic Messages format internally — the adapter normalizes Bedrock requests to ChatCompletionRequest, so the same fixtures work. Bedrock supports both non-streaming (/invoke, /converse) and streaming (/invoke-with-response-stream, /converse-stream) endpoints.

  13. Azure OpenAI routes through the same handlers — /openai/deployments/{id}/chat/completions maps to the completions handler, /openai/deployments/{id}/embeddings maps to the embeddings handler. Fixtures work unchanged.

  14. Ollama defaults to streaming — opposite of OpenAI. Set stream: false explicitly in the request for non-streaming responses.

  15. Ollama tool call arguments is an object, not a JSON string — unlike OpenAI where arguments is a JSON string, Ollama sends and expects a plain object.

  16. Bedrock streaming uses binary Event Stream format — not SSE. The invoke-with-response-stream and converse-stream endpoints use AWS Event Stream binary encoding.

  17. Vertex AI routes to the same handler as consumer Gemini — the same fixtures work for both Vertex AI (/v1/projects/.../models/{m}:generateContent) and consumer Gemini (/v1beta/models/{model}:generateContent).

  18. Cohere requires model field — returns 400 if model is missing from the request body.
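
To illustrate gotchas 14 and 15, a sketch of a non-streaming Ollama request (field shapes follow Ollama's /api/chat format, not anything aimock-specific):

const res = await fetch(`${mock.url}/api/chat`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "llama3.1",
    stream: false, // Ollama streams by default; opt out explicitly
    messages: [{ role: "user", content: "weather" }],
  }),
});
const data = await res.json();
// Ollama tool calls carry arguments as a plain object, not a JSON string:
// data.message.tool_calls?.[0]?.function.arguments → { city: "SF" }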

Mount & Composition

mount() API

Mount additional mock services onto a running LLMock server. All services share one port, one health endpoint, and one request journal.

const llm = new LLMock({ port: 5555 });
llm.mount("/mcp", mcpMock); // MCP tools at /mcp
llm.mount("/a2a", a2aMock); // A2A agents at /a2a
llm.mount("/vector", vectorMock); // Vector DB at /vector
await llm.start();

Any object implementing the Mountable interface (a handleRequest method that returns boolean) can be mounted. Path prefixes are stripped before the service sees the request — /mcp/tools/list arrives as /tools/list.
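
A hedged sketch of a custom mountable service. The handleRequest signature shown here (Node req/res, with the prefix already stripped from req.url) is an assumption; the text above only guarantees a handleRequest method that returns a boolean:

import type { IncomingMessage, ServerResponse } from "node:http";

// Assumed shape: (req, res) => boolean, where true means "handled"
const fakeWebhook = {
  handleRequest(req: IncomingMessage, res: ServerResponse): boolean {
    if (req.url === "/ping") {
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify({ ok: true }));
      return true; // handled
    }
    return false; // not handled
  },
};

llm.mount("/hooks", fakeWebhook); // /hooks/ping arrives here as /ping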

createMockSuite()

Unified lifecycle for LLMock + mounted services:

import { createMockSuite } from "@copilotkit/aimock";

const suite = createMockSuite({
  port: 0,
  fixtures: "./fixtures",
  services: { "/mcp": mcpMock, "/a2a": a2aMock },
});

await suite.start();
// suite.llm — the LLMock instance
// suite.url — base URL

afterEach(() => suite.reset()); // resets everything
afterAll(() => suite.stop());

aimock CLI config file

The aimock CLI reads a JSON config and serves all services on one port:

aimock --config aimock.json --port 4010

Config format:

{
  "llm": {
    "fixtures": "./fixtures",
    "latency": 0,
    "metrics": true
  },
  "services": {
    "/mcp": { "type": "mcp", "tools": "./mcp-tools.json" },
    "/a2a": { "type": "a2a", "agents": "./a2a-agents.json" }
  }
}

VectorMock

Mock vector database server for testing RAG pipelines. Supports Pinecone, Qdrant, and ChromaDB API formats.

import { VectorMock } from "@copilotkit/aimock";

const vector = new VectorMock();

// Create a collection and register query results
vector.addCollection("docs", { dimension: 1536 });
vector.onQuery("docs", [
  { id: "doc-1", score: 0.95, metadata: { title: "Getting Started" } },
  { id: "doc-2", score: 0.87, metadata: { title: "API Reference" } },
]);

// Upsert vectors
vector.upsert("docs", [
  { id: "v1", values: [0.1, 0.2, ...], metadata: { title: "Intro" } },
]);

// Dynamic query handler
vector.onQuery("docs", (query) => {
  return [{ id: "result", score: 1.0, metadata: { topK: query.topK } }];
});

// Standalone or mounted
const url = await vector.start();
// Or: llm.mount("/vector", vector);

VectorMock endpoints

| Provider | Endpoints |
| --- | --- |
| Pinecone | POST /query, POST /vectors/upsert, POST /vectors/delete, GET /describe-index-stats |
| Qdrant | POST /collections/{name}/points/search, PUT /collections/{name}/points, POST /collections/{name}/points/delete |
| ChromaDB | POST /api/v1/collections/{id}/query, POST /api/v1/collections/{id}/add, GET /api/v1/collections, DELETE /api/v1/collections/{id} |

Service Mocks (Search / Rerank / Moderation)

Built-in mocks for common AI-adjacent services. Registered on the LLMock instance directly — no separate server needed.

Search (Tavily-compatible)

// POST /search — matches request `query` field
mock.onSearch("weather", [
  { title: "Weather Report", url: "https://example.com", content: "Sunny today" },
]);
mock.onSearch(/stock\s+price/i, [
  { title: "ACME Stock", url: "https://example.com", content: "$42", score: 0.95 },
]);

Rerank (Cohere-compatible)

// POST /v2/rerank — matches request `query` field
mock.onRerank("machine learning", [
  { index: 0, relevance_score: 0.99 },
  { index: 2, relevance_score: 0.85 },
]);

Moderation (OpenAI-compatible)

// POST /v1/moderations — matches request `input` field
mock.onModerate("violent", {
  flagged: true,
  categories: { violence: true, hate: false },
  category_scores: { violence: 0.95, hate: 0.01 },
});

// Catch-all — everything passes
mock.onModerate(/.*/, { flagged: false, categories: {} });

Pattern matching

All three services use the same matching logic:

  • String patterns — case-insensitive substring match
  • RegExp patterns — full regex test
  • First match wins — register specific patterns before catch-alls

Debugging Fixture Mismatches

When a fixture doesn't match:

  1. Inspect what the server received: mock.getLastRequest() → check body.messages array
  2. Check fixture order: mock.getFixtures() returns fixtures in registration order
  3. For userMessage: match is against the LAST role: "user" message only, substring match (not exact)
  4. Check the journal: mock.getRequests() shows all requests including which fixture matched (or null for 404)
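
In a failing test, that usually looks like:

// What did the server actually receive?
const last = mock.getLastRequest();
console.dir(last?.body?.messages, { depth: null });

// Which fixtures are registered, and in what order will the router try them?
console.log(mock.getFixtures().map((f) => f.match));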

E2E Test Setup Pattern

import { LLMock } from "@copilotkit/aimock";

// Setup — port: 0 picks a random available port
const mock = new LLMock({ port: 0 });
mock.loadFixtureDir("./fixtures");
await mock.start();
process.env.OPENAI_BASE_URL = `${mock.url}/v1`;

// Per-test cleanup
afterEach(() => mock.reset()); // clears fixtures AND journal

// Teardown
afterAll(async () => await mock.stop());

Static factory shorthand

const mock = await LLMock.create({ port: 0 }); // creates + starts in one call

API Quick Reference

| Method | Purpose |
| --- | --- |
| addFixture(f) | Append fixture (last priority) |
| addFixtures(f[]) | Append multiple |
| prependFixture(f) | Insert at front (highest priority) |
| clearFixtures() | Remove all fixtures |
| getFixtures() | Read current fixture list |
| on(match, response, opts?) | Shorthand for addFixture |
| onMessage(pattern, response, opts?) | Match by user message |
| onEmbedding(pattern, response, opts?) | Match by embedding input text |
| onJsonOutput(pattern, json, opts?) | Match by user message with responseFormat |
| onToolCall(name, response, opts?) | Match by tool name in tools[] |
| onToolResult(id, response, opts?) | Match by tool_call_id |
| nextRequestError(status, body?) | One-shot error, auto-removes |
| loadFixtureFile(path) | Load JSON fixture file |
| loadFixtureDir(path) | Load all JSON files in directory |
| start() | Start server, returns URL |
| stop() | Stop server |
| reset() | Clear fixtures + journal + match counts |
| resetMatchCounts() | Clear sequence match counts only |
| getRequests() | All journal entries |
| getLastRequest() | Most recent journal entry |
| clearRequests() | Clear journal only |
| setChaos(opts) | Set server-level chaos rates |
| clearChaos() | Remove server-level chaos |
| onSearch(pattern, results) | Match search requests by query |
| onRerank(pattern, results) | Match rerank requests by query |
| onModerate(pattern, result) | Match moderation requests by input |
| onImage(pattern, response) | Match image generation by prompt |
| onSpeech(pattern, response) | Match TTS by input text |
| onTranscription(match, response) | Match audio transcription |
| onVideo(pattern, response) | Match video generation by prompt |
| mount(path, handler) | Mount a Mountable (VectorMock, etc.) |
| url / baseUrl | Server URL (throws if not started) |
| port | Server port number |

Sequential responses use on() with sequenceIndex in the match — there is no dedicated convenience method.

Record-and-Replay (VCR Mode)

aimock supports a VCR-style record-and-replay workflow for ALL endpoints including multimedia (image, TTS, transcription, video): unmatched requests are proxied to real provider APIs, and the responses are saved as standard aimock fixture files for deterministic replay. Binary TTS responses are base64-encoded with format derived from Content-Type. Multimedia fixtures automatically include endpoint in their match criteria for correct routing on replay.

CLI usage

# Record mode: proxy unmatched requests to real OpenAI and Anthropic APIs
aimock --record \
  --provider-openai https://api.openai.com \
  --provider-anthropic https://api.anthropic.com \
  -f ./fixtures

# Strict mode: fail on unmatched requests (no proxying, no catch-all 404)
aimock --strict -f ./fixtures

  • --record enables proxy-on-miss. Requires at least one --provider-* flag.
  • --strict returns a 503 error when no fixture matches AND no proxy is configured (or the proxy attempt fails), instead of silently returning a 404. The proxy is still tried first when --record is set. Use this in CI to prevent unmatched requests from slipping through as silent 404s.
  • Provider flags: --provider-openai, --provider-anthropic, --provider-gemini, --provider-vertexai, --provider-bedrock, --provider-azure, --provider-ollama, --provider-cohere.

How it works

  1. Existing fixtures are served first — the router checks all loaded fixtures before considering the proxy.
  2. Misses are proxied — if no fixture matches and recording is enabled, the request is forwarded to the real provider API. Upstream URL path prefixes are preserved (e.g., https://gateway.company.com/llm/v1 correctly proxies to /llm/v1/chat/completions).
  3. All request headers are forwarded (auth headers NOT saved) — all client request headers are passed through to the upstream provider, except hop-by-hop headers and host/content-length/cookie/accept-encoding. Auth headers (Authorization, x-api-key, api-key) are forwarded but stripped from the recorded fixture.
  4. Responses are saved as standard fixtures — recorded files land in {fixturePath}/recorded/ and use the same JSON format as hand-written fixtures. Nothing special about them.
  5. Streaming responses are collapsed — SSE streams are collapsed into a single text or tool-call response for the fixture. The original streaming format is preserved in the live proxy response.
  6. Base64 embedding decoding — when the upstream returns base64-encoded embeddings (the default encoding_format in Python's openai SDK), the recorder decodes them into float arrays so fixtures contain readable numeric data instead of opaque base64 strings.
  7. Loud logging — every proxy hit logs at warn level so you can see exactly which requests are being forwarded.

Programmatic API

const mock = new LLMock({ port: 0 });
await mock.start();

// Enable recording at runtime
mock.enableRecording({
  providers: {
    openai: "https://api.openai.com",
    anthropic: "https://api.anthropic.com",
  },
  fixturePath: "./fixtures/recorded",
});

// ... run tests that hit real APIs for uncovered cases ...

// Disable recording (back to fixture-only mode)
mock.disableRecording();

Workflow

  1. Bootstrap: Run your test suite with --record and provider URLs. All requests that don't match existing fixtures are proxied and recorded.
  2. Review: Check the recorded fixtures in {fixturePath}/recorded/. Edit or reorganize as needed.
  3. Lock down: Run your test suite with --strict to ensure every request hits a fixture. No network calls escape.
  4. Maintain: When APIs change, delete stale fixtures and re-record.