eval-driven-dev

Pass

Audited by Gen Agent Trust Hub on Mar 16, 2026

Risk Level: SAFE
Flags: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: Executes shell commands to install packages (pip install pixie-qa) and run tests (pixie test).
  • [EXTERNAL_DOWNLOADS]: Fetches the 'pixie-qa' package from the Python Package Index (PyPI) during the setup phase.
  • [DATA_EXFILTRATION]: Reads sensitive API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY) from environment variables to verify the environment configuration.
  • [PROMPT_INJECTION]: Vulnerable to indirect prompt injection because untrusted application outputs are ingested for evaluation. Ingestion points: pixie_qa/datasets/ (entries containing eval_output). Boundary markers: none. Capability inventory: pixie test executes Python test files. Sanitization: none.
  • [COMMAND_EXECUTION]: Dynamically generates and executes Python scripts (build_dataset.py, test_*.py) for the evaluation workflow.
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 16, 2026, 01:02 AM