The Agent Skills Directory

[COMMAND_EXECUTION]: Runs both the SQL generated by the Genie API and the ground-truth SQL defined in workspace YAML files through spark.sql(). This is the core functionality needed to compare result sets and validate syntax.
[COMMAND_EXECUTION]: Orchestrates Databricks Jobs and polls for status using subprocess.run to invoke the official Databricks CLI. This is a standard automation pattern for the platform.
[EXTERNAL_DOWNLOADS]: Uses well-known, trusted libraries, including mlflow, databricks-sdk, and pyyaml. No unverified third-party packages or remote scripts are downloaded or executed.
[PROMPT_INJECTION]: Employs structured evaluation prompts managed via an MLflow Prompt Registry. These prompts are designed for scoring and do not contain directives to bypass AI safety guardrails.
[DATA_EXFILTRATION]: Network activity is restricted to authenticated Databricks services, including MLflow tracking and Model Serving endpoints for judge execution. No data is sent to untrusted external domains.
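
The two COMMAND_EXECUTION patterns noted above (comparing result sets, and polling a job through the CLI with subprocess.run) can be illustrated with a minimal sketch. This is not the skill's actual code: the function names `results_match` and `poll_run_state` are hypothetical, the comparison is assumed to be order-insensitive, and the CLI is assumed to print the run as JSON with a `state.life_cycle_state` field, as the Databricks Jobs API does. The exact CLI invocation is left to the caller.

```python
import json
import subprocess
import time
from collections import Counter


def results_match(generated_rows, expected_rows):
    """Order-insensitive comparison of two result sets (hypothetical helper).

    Rows are converted to tuples and counted, so duplicate rows matter.
    Assumes every cell value is hashable.
    """
    return Counter(map(tuple, generated_rows)) == Counter(map(tuple, expected_rows))


# Run states treated as final (assumed from the Jobs API life_cycle_state values).
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def poll_run_state(command, interval_s=10.0, timeout_s=600.0):
    """Re-run `command` until its JSON output reports a terminal run state.

    `command` is an argument list for subprocess.run; it is assumed to print
    a JSON object containing state.life_cycle_state on stdout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        proc = subprocess.run(command, capture_output=True, text=True, check=True)
        state = json.loads(proc.stdout).get("state", {}).get("life_cycle_state")
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {state!r} after {timeout_s}s")
        time.sleep(interval_s)
```

Inside a Spark session, the comparison would presumably be fed with collected rows, for example `results_match(spark.sql(genie_sql).collect(), spark.sql(ground_truth_sql).collect())`, where `genie_sql` and `ground_truth_sql` are placeholder variable names. Passing the command as an argument list rather than a shell string avoids shell interpolation of the run identifier.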

genie-benchmark-evaluator