gcp-agent-eval-engine-runner

This skill provides the "engine" of an automated evaluation pipeline. Based on evaluation_blog.md, it runs hundreds of requests in parallel against a shadow revision of your agent and, for each request, captures the full reasoning trace (the agent's "Thinking Process") rather than only the final answer.

Usage

Ask Antigravity to:

"Create an evaluation runner script for my agent"
"Implement parallel inference for my golden dataset"
"Capture SSE traces for tool trajectory evaluation"

Engine Pattern

Parallel Inference: Uses asyncio.Semaphore to cap the number of concurrent requests, so the evaluation run does not overwhelm (effectively DDoS) the shadow service.
SSE Capture: Calls the ADK POST /run_sse endpoint and streams the agent's intermediate events as Server-Sent Events (SSE).
Dataset Enrichment: Adds the final response and the captured intermediate_events to each row of the input dataset.
Vertex AI Integration: Submits the enriched dataset to the create_evaluation_run API.
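A minimal sketch of the Parallel Inference step, assuming a hypothetical send_request coroutine that posts one dataset row to the shadow revision and returns its result. asyncio.Semaphore limits how many requests are in flight at once, and asyncio.gather returns results in input order so they can be matched back to their rows. The function names and the default limit of 10 are illustrative and are not taken from the boilerplate script.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical signature: a coroutine that sends one dataset row to the
# agent's shadow revision and returns whatever the agent produced.
RequestFn = Callable[[dict[str, Any]], Awaitable[Any]]


async def run_parallel_inference(
    rows: list[dict[str, Any]],
    send_request: RequestFn,
    max_concurrency: int = 10,
) -> list[Any]:
    """Call send_request for every row, with at most max_concurrency in flight."""
    # Created inside the running coroutine so it belongs to the active loop.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def throttled(row: dict[str, Any]) -> Any:
        # Only max_concurrency tasks can hold the semaphore at a time; the
        # rest wait here instead of hitting the shadow service.
        async with semaphore:
            return await send_request(row)

    # gather preserves input order, so results[i] corresponds to rows[i].
    return await asyncio.gather(*(throttled(row) for row in rows))


async def _demo() -> None:
    in_flight = 0
    peak = 0

    async def fake_request(row: dict[str, Any]) -> str:
        # Offline stand-in for an HTTP call to the shadow revision.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return f"answer to {row['prompt']}"

    rows = [{"prompt": f"q{i}"} for i in range(50)]
    results = await run_parallel_inference(rows, fake_request, max_concurrency=5)
    print(f"{len(results)} results, peak concurrency {peak}")


if __name__ == "__main__":
    asyncio.run(_demo())
```

The demo replaces the network call with asyncio.sleep, so it runs offline and shows that no more than max_concurrency requests overlap. Creating the semaphore inside the coroutine also avoids "attached to a different loop" errors on Python versions before 3.10.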

Python Boilerplate

Refer to scripts/evaluate_agent_boilerplate.py for the core implementation.
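The script itself is not reproduced here. As a rough, self-contained sketch of the SSE Capture and Dataset Enrichment steps (not the script's actual code), the helpers below parse a text/event-stream body into JSON events and attach response and intermediate_events to a dataset row. The function names are hypothetical, and so is the event shape (text under content.parts[].text); check it against what your ADK server actually returns. The input can be any iterable of lines, such as those yielded by requests' Response.iter_lines(decode_unicode=True).

```python
import json
from typing import Any, Iterable, Iterator, Optional


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield the JSON payload of each event in a text/event-stream body."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event: dispatch what was buffered.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored.
    # Per the SSE format, an event not terminated by a blank line is discarded.


def event_text(event: dict[str, Any]) -> str:
    # ASSUMPTION: text lives under content.parts[].text; adjust if your
    # server emits a different payload.
    parts = (event.get("content") or {}).get("parts") or []
    return "".join(p.get("text", "") for p in parts if isinstance(p, dict))


def enrich_row(row: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of row with response and intermediate_events added."""
    # Treat the last event that carries text as the final answer (an assumption).
    final_idx: Optional[int] = max(
        (i for i, e in enumerate(events) if event_text(e)), default=None
    )
    return {
        **row,
        "response": event_text(events[final_idx]) if final_idx is not None else "",
        # Everything else (tool calls, tool results, partial text) is the trace.
        "intermediate_events": [e for i, e in enumerate(events) if i != final_idx],
    }
```

Keeping every non-final event, rather than only tool calls, leaves the choice of which events matter to the evaluation metrics downstream.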

