agent-benchmark-suite

Pass

Audited by Gen Agent Trust Hub on Mar 1, 2026

Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill invokes the claude-flow utility through shell commands to perform benchmarking, comparison, and analysis.
  • [EXTERNAL_DOWNLOADS]: The use of npx results in the download and execution of the claude-flow package from the public NPM registry at runtime.
  • [PROMPT_INJECTION]: The skill exhibits a surface for indirect prompt injection.
  • Ingestion points: The skill ingests data from external files specified by the --results, --criteria, and --logs arguments.
  • Boundary markers: There are no explicit delimiters or safety instructions provided to differentiate data from instructions within the processed files.
  • Capability inventory: The skill has the capability to execute external commands via the shell using npx.
  • Sanitization: There is no evidence of content validation or sanitization for the files being analyzed.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 1, 2026, 04:32 PM