bench
Pass
Audited by Gen Agent Trust Hub on Mar 7, 2026
Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill executes a local shell script at
./scripts/bench.shand uses thejqutility to process files, allowing for arbitrary command execution within the context of the benchmark environment. - [EXTERNAL_DOWNLOADS]: The skill uses
uv syncto install Python dependencies from external package registries at runtime. - [PROMPT_INJECTION]: The skill processes untrusted data from external JSON files which serves as a potential surface for indirect prompt injection.
- Ingestion points: reads data from
tests/benchmark/prediction/opendataloader/evaluation.jsonandtests/benchmark/thresholds.json. - Boundary markers: None detected. The skill does not use delimiters to isolate processed data from agent instructions.
- Capability inventory: Includes shell script execution (
./scripts/bench.sh) and system command calls viajq. - Sanitization: No sanitization or validation of the JSON content is performed before the data is processed or used to generate output summaries.
Audit Metadata