local-benchmarks

Pass

Audited by Gen Agent Trust Hub on May 5, 2026

Risk Level: SAFE (EXTERNAL_DOWNLOADS, COMMAND_EXECUTION, DATA_EXFILTRATION)
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill makes extensive use of npx whatcanirun, which downloads and executes the benchmarking tool directly from the NPM registry.
  • [COMMAND_EXECUTION]: Shell commands including ls, find, unzip, and python3 are used to discover local models, inspect metadata, and parse benchmark result files.
  • [DATA_EXFILTRATION]: The skill provides an explicit workflow for users to 'submit' their benchmark results to the external service whatcani.run. This includes hardware specifications, model metadata, and performance metrics.
  • [REMOTE_CODE_EXECUTION]: Although npx executes code fetched from a remote registry, here it runs the vendor's primary tool, which is the skill's stated purpose.
  • [INDIRECT_PROMPT_INJECTION]: The skill ingests untrusted data in the form of local file paths and JSON metadata from ZIP bundles.
      • Ingestion points: SKILL.md (via find and ls on model directories and reading manifest.json from ZIP bundles).
      • Boundary markers: None identified in the provided instructions.
      • Capability inventory: Shell command execution (npx, unzip, find, python3).
      • Sanitization: Not explicitly implemented in the provided shell scripts.
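The missing sanitization and boundary markers noted above could be addressed along the following lines. This is only a minimal sketch, not the skill's actual code: the function names, the size and length caps, the marker strings, and the manifest field names used in the usage note are all hypothetical, since the audit does not describe the manifest.json schema.

```python
import json
import zipfile

MAX_MANIFEST_BYTES = 64 * 1024  # hypothetical cap on the manifest's declared size
MAX_FIELD_CHARS = 200           # hypothetical cap on any single string value


def clean_text(value):
    """Drop non-printable characters (newlines, escapes) and truncate long strings."""
    printable = "".join(ch for ch in value if ch.isprintable())
    return printable[:MAX_FIELD_CHARS]


def read_manifest(bundle, allowed_keys):
    """Return only allow-listed scalar fields from manifest.json in a ZIP bundle.

    `bundle` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(bundle) as zf:
        info = zf.getinfo("manifest.json")  # raises KeyError if the entry is absent
        if info.file_size > MAX_MANIFEST_BYTES:
            raise ValueError("manifest.json is larger than expected")
        raw = zf.read(info)

    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("manifest.json must contain a JSON object")

    cleaned = {}
    for key in allowed_keys:
        value = data.get(key)
        if isinstance(value, str):
            cleaned[key] = clean_text(value)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            cleaned[key] = value
        # Nested objects, lists, booleans, and unknown keys are dropped.
    return cleaned


def wrap_untrusted(fields):
    """Wrap cleaned fields in explicit boundary markers (marker text is illustrative)."""
    body = json.dumps(fields, sort_keys=True)
    return "<<<UNTRUSTED_MANIFEST\n" + body + "\n>>>END_UNTRUSTED_MANIFEST"
```

A caller might use `wrap_untrusted(read_manifest("bundle.zip", ["name", "params"]))`, where the path and both keys are placeholders. The allow-list keeps unexpected manifest fields out of whatever the agent reads next, and the markers make it explicit which text came from an untrusted file.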
Audit Metadata
Risk Level: SAFE
Analyzed: May 5, 2026, 08:48 AM