eval-driven-dev

Pass

Audited by Gen Agent Trust Hub on Mar 13, 2026

Risk Level: SAFE
Categories: EXTERNAL_DOWNLOADS, COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill requires installing the pixie-qa package from the Python Package Index (PyPI), the standard repository for Python software. Use of this package is consistent with the skill's primary function.
  • [COMMAND_EXECUTION]: The skill guides the agent to execute shell commands for environment configuration and testing, specifically pip install for dependency management and pixie-test for running test suites.
  • [REMOTE_CODE_EXECUTION]: The pixie-test utility dynamically executes Python files (test scripts) from the local project directory to run quality evaluations on the instrumented application code. This execution is local and essential to the skill's testing capabilities.
  • [PROMPT_INJECTION]: The skill's workflow reads and executes user-provided code and data (e.g., project source files and JSON datasets), which is an indirect prompt injection surface. The audit identifies this surface in the ingestion of project files such as chatbot.py and qa-golden-set.json. Although the capability inventory includes code execution via pixie-test, this is an inherent property of tools designed for codebase evaluation, and the risk is confined to the user's project environment.
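To make the REMOTE_CODE_EXECUTION finding concrete, the sketch below shows one way a test runner could dynamically execute Python files while staying inside the local project directory. It is illustrative only and is not pixie-test's actual implementation: the function name run_local_tests, the test_*.py file pattern, and the pass/fail convention are all assumptions made for this example.

```python
import runpy
from pathlib import Path


def run_local_tests(project_dir: Path, pattern: str = "test_*.py") -> dict[str, bool]:
    """Execute each matching file under project_dir and record pass/fail.

    Hypothetical helper for illustration; not part of pixie-qa.
    """
    root = project_dir.resolve()
    results: dict[str, bool] = {}
    for path in sorted(root.rglob(pattern)):
        resolved = path.resolve()
        # Skip anything that resolves outside the project (e.g. via symlinks),
        # so execution stays confined to the local project directory.
        if not resolved.is_relative_to(root):
            continue
        name = resolved.relative_to(root).as_posix()
        try:
            # Runs the file's top-level code in a fresh namespace.
            runpy.run_path(str(resolved), run_name="__main__")
            results[name] = True
        except Exception:
            # Any uncaught exception (including AssertionError) counts as a failure.
            results[name] = False
    return results
```

Calling run_local_tests(Path(".")) returns a mapping from each matching file's relative path to whether it ran without raising. The path check is what keeps the execution "local" in the sense the audit describes; the executed files still run with the full privileges of the calling process, which is why the finding is reported at all. Requires Python 3.9 or later for Path.is_relative_to.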
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 13, 2026, 02:11 PM