gcp-agent-tool-trajectory-evaluator
This skill provides the specialized Python logic needed to evaluate how an agent uses its tools. Grounded in evaluation_blog.md, it moves beyond "Did the tool run?" to "Were the tools used correctly and efficiently?"
Usage
Ask Antigravity to:
- "Implement Trajectory Precision and Recall metrics"
- "Set up an Order Match metric for my multi-step agent"
- "Add a custom trajectory scorer to my Vertex AI evaluation"
Metric Definitions
- Trajectory Precision: Measures what percentage of called tools were actually specified in the reference.
- Trajectory Recall: Measures what percentage of required tools were successfully called by the agent.
- In-Order Match: Checks if the required tools were called in the correct sequence (even if other non-essential tools were called in between).
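The three definitions above can be sketched in plain Python as follows. This is a minimal illustration, not the contents of scripts/trajectory_metrics.py: the function names, the tool names in the usage example, the representation of a trajectory as a list of tool-name strings, the multiset handling of repeated calls, and the choice to return 0.0 on an empty denominator are all assumptions made for the example.

```python
from collections import Counter
from typing import Sequence


def _matched_calls(predicted: Sequence[str], reference: Sequence[str]) -> int:
    """Count tool calls shared by both trajectories (multiset intersection).

    Assumption: a tool called twice only matches twice if the reference
    also lists it twice.
    """
    return sum((Counter(predicted) & Counter(reference)).values())


def trajectory_precision(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of the agent's tool calls that appear in the reference."""
    if not predicted:
        return 0.0  # assumption: no calls made scores zero
    return _matched_calls(predicted, reference) / len(predicted)


def trajectory_recall(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of the reference tool calls that the agent actually made."""
    if not reference:
        return 0.0  # assumption: an empty reference scores zero
    return _matched_calls(predicted, reference) / len(reference)


def in_order_match(predicted: Sequence[str], reference: Sequence[str]) -> bool:
    """True if reference is a subsequence of predicted (extra calls allowed)."""
    remaining = iter(predicted)
    # `tool in remaining` advances the iterator until it finds the tool,
    # so each reference step must occur after the previous one.
    return all(tool in remaining for tool in reference)


# Hypothetical tool names, for illustration only.
reference = ["search_flights", "book_flight"]
predicted = ["search_flights", "get_weather", "book_flight"]

precision = trajectory_precision(predicted, reference)
recall = trajectory_recall(predicted, reference)
ordered = in_order_match(predicted, reference)
reversed_ordered = in_order_match(["book_flight", "search_flights"], reference)
```

In this example the extra get_weather call lowers precision but leaves recall and the in-order check unaffected, while calling the same two tools in reverse order fails the in-order check.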
Implementation Pattern
Refer to scripts/trajectory_metrics.py. These functions are designed to be serialized and passed to Vertex AI via CustomCodeExecutionSpec.