bench

Pass

Audited by Gen Agent Trust Hub on Mar 7, 2026

Risk Level: SAFE
Flagged categories: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill executes a local shell script at ./scripts/bench.sh and uses the jq utility to process files, allowing for arbitrary command execution within the context of the benchmark environment.
  • [EXTERNAL_DOWNLOADS]: The skill uses uv sync to install Python dependencies from external package registries at runtime.
  • [PROMPT_INJECTION]: The skill processes untrusted data from external JSON files, which is a potential surface for indirect prompt injection.
      • Ingestion points: reads data from tests/benchmark/prediction/opendataloader/evaluation.json and tests/benchmark/thresholds.json.
      • Boundary markers: None detected. The skill does not use delimiters to isolate processed data from agent instructions.
      • Capability inventory: Includes shell script execution (./scripts/bench.sh) and system command calls via jq.
      • Sanitization: No sanitization or validation of the JSON content is performed before the data is processed or used to generate output summaries.
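The boundary-marker and sanitization findings above could be addressed along the following lines. This is a minimal sketch, not the skill's actual code: it assumes (hypothetically) that each JSON file is a flat object mapping metric names to numbers, and the names validate_metrics, wrap_untrusted, and the UNTRUSTED_DATA delimiter are illustrative inventions, not part of the audited skill.

```python
import json
import math

# Hypothetical assumption: the benchmark files are flat JSON objects of
# the form {"metric_name": number, ...}. The real schema is not stated
# in the audit, so adjust the checks to match the actual files.
MAX_KEY_LENGTH = 64


def validate_metrics(raw_text):
    """Parse JSON text and accept only finite numbers under plain keys."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    clean = {}
    for key, value in data.items():
        # Allow only short ASCII identifiers made of letters, digits, "_".
        plain = key.isascii() and key.replace("_", "").isalnum()
        if not plain or len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"unexpected metric name: {key!r}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric value for {key!r}")
        try:
            number = float(value)
        except OverflowError as exc:
            raise ValueError(f"value out of range for {key!r}") from exc
        if not math.isfinite(number):
            raise ValueError(f"non-finite value for {key!r}")
        clean[key] = number
    return clean


def wrap_untrusted(clean):
    """Serialize validated data between explicit boundary markers."""
    body = json.dumps(clean, sort_keys=True)
    return "<<<UNTRUSTED_DATA\n" + body + "\nUNTRUSTED_DATA>>>"
```

Because free-text strings are rejected outright, an instruction embedded in a JSON value never reaches the summary; the delimiters then mark whatever does pass validation as data rather than as agent instructions. If the real files legitimately contain string fields, the validation would need an allow-list for those fields instead.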
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 7, 2026, 01:29 PM