arize-evaluator
Pass
Audited by Gen Agent Trust Hub on Apr 20, 2026
Risk Level: SAFEDATA_EXFILTRATIONCOMMAND_EXECUTIONPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- [DATA_EXFILTRATION]: The skill manages sensitive information like Arize, OpenAI, and Anthropic API keys. It adheres to security best practices by instructing the agent to never request keys in cleartext chat and to use shell environment variables (e.g., $ARIZE_API_KEY) and .env files for configuration to prevent unauthorized exposure.
- [COMMAND_EXECUTION]: The skill facilitates the execution of the ax CLI tool for managing AI integrations, evaluators, and tasks. It also describes how to persist configuration by modifying shell profile files (~/.zshrc, ~/.bashrc). These actions are necessary for the skill's stated purpose of providing an Arize workflow.
- [PROMPT_INJECTION]: The skill defines an 'LLM-as-judge' workflow which inherently processes external data (spans and experiment runs) that could contain untrusted input. Ingestion points: Data is imported from Arize projects and datasets using ax spans export and ax experiments export. Boundary markers: The skill recommends using clear instructional templates for the judge LLM to minimize ambiguity. Capability inventory: Execution of ax CLI commands to interact with Arize APIs and use local Python interpreters for data processing. Sanitization: No explicit programmatic sanitization of the untrusted evaluation data is described beyond standard template processing.
- [EXTERNAL_DOWNLOADS]: The skill documentation provides instructions for installing the arize-ax-cli from the official Python Package Index (PyPI) using tools like pip, pipx, or uv. This is a standard and expected procedure for using the vendor's official tooling.
Audit Metadata