NYC

performance-benchmark-specialist

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION] (SAFE): The skill executes shell commands in order to time them. This is its primary, stated purpose, and no suspicious or hidden execution patterns were detected.
  • [DATA_EXFILTRATION] (SAFE): No unauthorized data access or network communication patterns were identified. Results are stored in local CSV files, as described in README.md.
  • [PROMPT_INJECTION] (LOW): The skill presents an indirect prompt injection surface, because the benchmarking framework executes arbitrary commands without visible sanitization.
      • Ingestion points: Command strings and scale parameters passed to the bench_run and bench_create_workspace functions (README.md).
      • Boundary markers: Absent. There are no delimiters, and no instructions to ignore commands embedded in the data being processed.
      • Capability inventory: The skill can execute shell commands (bench_run) and modify the filesystem to create workspaces (bench_create_workspace).
      • Sanitization: The documentation does not describe any method for validating, escaping, or filtering the input given to the benchmarking utilities.
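The sanitization gap noted above could be narrowed by validating inputs before they reach the benchmarking utilities. The following is a minimal sketch in Python, not the skill's actual implementation. The names validate_command and validate_scale, the program allowlist, the metacharacter set, and the MAX_SCALE limit are all hypothetical, and the sketch assumes the resulting argument list is later executed without a shell.

```python
import shlex

# Hypothetical policy values; the audited skill does not document any of these.
ALLOWED_PROGRAMS = frozenset({"ls", "grep", "find", "sort", "wc"})
SHELL_METACHARACTERS = frozenset(";|&$`<>\n")
MAX_SCALE = 1_000_000


def validate_command(command: str) -> list[str]:
    """Return the command as an argv list, or raise ValueError if it is unsafe."""
    # Reject characters that a shell would treat as control operators.
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise ValueError("shell metacharacters are not allowed")
    # shlex.split raises ValueError on unbalanced quotes.
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_PROGRAMS:
        raise ValueError(f"program not allowed: {argv[0]!r}")
    return argv


def validate_scale(scale: str) -> int:
    """Return the scale parameter as an int within 1..MAX_SCALE, or raise ValueError."""
    if not (scale.isascii() and scale.isdigit()):
        raise ValueError("scale must be a positive decimal integer")
    value = int(scale)
    if not 1 <= value <= MAX_SCALE:
        raise ValueError("scale out of range")
    return value
```

A caller would pass the validated list to something like subprocess.run(argv, shell=False), so the arguments are never reinterpreted by a shell. An allowlist is only one possible policy. A benchmarking tool that must time arbitrary programs may instead rely on sandboxing or explicit user confirmation.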
Audit Metadata
Risk Level: SAFE
Analyzed: Feb 17, 2026, 04:33 PM