agent-evaluation
Pass
Audited by Gen Agent Trust Hub on Mar 31, 2026
Risk Level: SAFE
Full Analysis
- [COMMAND_EXECUTION]: The skill uses `subprocess.run()` across several utility scripts (e.g., `create_dataset_template.py`, `setup_mlflow.py`, `validate_environment.py`) to interface with the MLflow CLI and the Databricks CLI, and to perform environment diagnostics. These operations are scoped to local configuration discovery and are necessary for integration with the user's infrastructure.
- [REMOTE_CODE_EXECUTION]: The skill uses `importlib.import_module()` in `validate_tracing_runtime.py` and `run_evaluation_template.py` to dynamically load and test the agent's entry-point function. This is an expected pattern for evaluation frameworks that must interact with user-defined code at runtime.
- [COMMAND_EXECUTION]: The skill employs a templating approach, generating customized Python scripts and writing them to the local filesystem (e.g., `run_agent_evaluation.py`). These scripts are intended to be reviewed and executed by the user to automate the evaluation workflow.
- [DATA_EXPOSURE]: The documentation and validation scripts manage required credentials such as `MLFLOW_TRACKING_URI`, `DATABRICKS_TOKEN`, and `OPENAI_API_KEY`. The skill promotes safe practices by recommending `.env` files for secret management and includes validation scripts (`validate_auth.py`) to verify credential validity locally before performing operations.
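The `subprocess.run()` diagnostics described in the first finding follow a common pattern: resolve the CLI binary, then query it with output captured locally. A minimal sketch of such a check — the helper name, flags, and timeout are illustrative assumptions, not taken from the skill's actual scripts:

```python
import shutil
import subprocess
from typing import Optional

def check_cli_available(cli_name: str) -> Optional[str]:
    """Return the CLI's version string, or None if the tool is unavailable."""
    # Resolve the binary on PATH first, so a missing tool never raises.
    if shutil.which(cli_name) is None:
        return None
    result = subprocess.run(
        [cli_name, "--version"],
        capture_output=True,
        text=True,
        timeout=30,
        check=False,  # non-zero exit is reported as None, not raised
    )
    return result.stdout.strip() if result.returncode == 0 else None
```

Because the command is passed as an argument list rather than a shell string, this pattern avoids shell injection, consistent with the audit's "scoped to local configuration discovery" assessment.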
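The dynamic loading flagged under [REMOTE_CODE_EXECUTION] typically looks like the sketch below: the module and function names come from the user's own configuration, and the harness imports and binds to them at runtime. The helper name and error handling here are hypothetical:

```python
import importlib

def load_entry_point(module_name: str, function_name: str):
    """Import a user-specified module and return its entry-point callable."""
    # This is the standard mechanism by which an evaluation harness
    # binds to user-defined agent code at runtime.
    module = importlib.import_module(module_name)
    entry_point = getattr(module, function_name, None)
    if not callable(entry_point):
        raise TypeError(f"{module_name}.{function_name} is not callable")
    return entry_point

# Demonstration with a standard-library function, loaded the same way
# an agent entry point would be.
fn = load_entry_point("json", "dumps")
```

The risk profile is exactly as the audit states: the code executed is whatever the user points the harness at, which is expected for this class of framework.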
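The script-templating behavior in the third finding can be sketched with the standard library's `string.Template`; the template contents and function name below are purely illustrative, since the audit does not reproduce the skill's generated scripts:

```python
from pathlib import Path
from string import Template

# Hypothetical runner template -- the generated file is meant to be
# reviewed by the user before it is ever executed.
SCRIPT_TEMPLATE = Template('''\
# Auto-generated evaluation runner. Review before executing.
ENTRY_POINT = "$entry_point"
EXPERIMENT = "$experiment"

if __name__ == "__main__":
    print(f"Evaluating {ENTRY_POINT} against {EXPERIMENT}")
''')

def render_runner(entry_point: str, experiment: str, out_path: str) -> str:
    """Render the runner script and write it to the local filesystem."""
    script = SCRIPT_TEMPLATE.substitute(entry_point=entry_point, experiment=experiment)
    Path(out_path).write_text(script)
    return script
```

Writing the script to disk rather than executing it directly is what keeps the human review step, noted in the finding, in the loop.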
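The [DATA_EXPOSURE] finding's recommended practice — load secrets from a local `.env` file and report only which variables are missing, never their values — can be sketched as follows. The loader and reporting function are minimal assumptions, not the skill's actual `validate_auth.py`:

```python
import os
from pathlib import Path

REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "DATABRICKS_TOKEN", "OPENAI_API_KEY")

def load_dotenv(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments and blanks ignored."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault: real environment variables win over file values.
        os.environ.setdefault(key.strip(), value.strip())

def missing_credentials() -> list:
    """Name the unset required variables without ever printing secret values."""
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]
```

Checking presence locally before any remote call matches the audit's note that credential validity is verified "before performing operations."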
Audit Metadata