# gcp-agent-eval-engine-runner
This skill provides the "engine" for your automated evaluation pipeline. Grounded in `evaluation_blog.md`, it handles the complexity of running hundreds of parallel requests against a shadow revision while capturing the full "Thinking Process" (Reasoning Trace).
## Usage
Ask Antigravity to:
- "Create an evaluation runner script for my agent"
- "Implement parallel inference for my golden dataset"
- "Capture SSE traces for tool trajectory evaluation"
## Engine Pattern
- Parallel Inference: Uses `asyncio.Semaphore` to throttle requests (preventing a DDoS of the shadow service).
- SSE Capture: Connects to the ADK `POST /run_sse` endpoint to stream intermediate events.
- Dataset Enrichment: Appends `response` and `intermediate_events` to the input dataset.
- Vertex AI Integration: Submits the enriched dataset to the `create_evaluation_run` API.
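The throttling and enrichment steps above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `run_one` stands in for the streaming call to the ADK `POST /run_sse` endpoint (a real version would use an async HTTP client and parse SSE events), and the row schema (`prompt`, `response`, `intermediate_events`) is an assumption for the example.

```python
import asyncio

MAX_CONCURRENCY = 8  # assumed cap; tune to the shadow service's capacity

async def run_one(sem: asyncio.Semaphore, row: dict) -> dict:
    """Run a single dataset row against the shadow agent, throttled by `sem`."""
    async with sem:
        # Placeholder for the SSE streaming request; a real implementation
        # would collect intermediate events from the `POST /run_sse` stream.
        await asyncio.sleep(0)  # simulate I/O
        return {
            **row,
            "response": f"echo:{row['prompt']}",  # stand-in for the final answer
            "intermediate_events": [],            # stand-in for the captured trace
        }

async def run_dataset(rows: list[dict]) -> list[dict]:
    """Fan out all rows in parallel, with at most MAX_CONCURRENCY in flight."""
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    return list(await asyncio.gather(*(run_one(sem, r) for r in rows)))

enriched = asyncio.run(run_dataset([{"prompt": "hi"}, {"prompt": "bye"}]))
```

The enriched rows (original input plus `response` and `intermediate_events`) are what would then be submitted to the `create_evaluation_run` API.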
## Python Boilerplate
Refer to `scripts/evaluate_agent_boilerplate.py` for the core implementation.