
agent-evaluation

Pass

Audited by Gen Agent Trust Hub on Mar 31, 2026

Risk Level: SAFE
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses subprocess.run() in several utility scripts (e.g., create_dataset_template.py, setup_mlflow.py, validate_environment.py) to invoke the MLflow CLI and Databricks CLI and to run environment diagnostics. These calls are scoped to local configuration discovery and are necessary for integration with the user's infrastructure.
  • [REMOTE_CODE_EXECUTION]: The skill uses importlib.import_module() in validate_tracing_runtime.py and run_evaluation_template.py to dynamically load and test the agent's entry point function. This is an expected pattern for evaluation frameworks that must interact with user-defined code at runtime.
  • [COMMAND_EXECUTION]: The skill employs a templating approach where it generates and writes customized Python scripts to the local filesystem (e.g., run_agent_evaluation.py). These scripts are intended to be reviewed and executed by the user to automate the evaluation workflow.
  • [DATA_EXPOSURE]: The documentation and validation scripts handle credentials such as MLFLOW_TRACKING_URI, DATABRICKS_TOKEN, and OPENAI_API_KEY. The skill encourages safe practice by recommending .env files for secret management and includes a validation script (validate_auth.py) that verifies credentials locally before any remote operation is attempted.
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: Mar 31, 2026, 11:53 PM