behavioral-evals

Installation

SKILL.md

Behavioral Evals

Overview

Behavioral evaluations (evals) are tests that validate the agent's decision-making (e.g., tool choice) rather than pure functionality. They are critical for verifying prompt changes, debugging steerability, and preventing regressions.

[!NOTE] Single Source of Truth: For core concepts, policies, running tests, and general best practices, always refer to evals/README.md.

🔄 Workflow Decision Tree

Installs

219

Repository

google-gemini/gemini-cli

GitHub Stars

105.7K

First Seen

Mar 24, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykWarn

behavioral-evals — google-gemini/gemini-cli