evidence-heavy-evaluator

Fail

Audited by Gen Agent Trust Hub on Mar 9, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONDATA_EXFILTRATION
Full Analysis
  • [COMMAND_EXECUTION]: The scripts/collect_evidence.sh script executes arbitrary lifecycle scripts (lint, typecheck, test, build, check) found in the package.json of the repository being evaluated. This execution is triggered when the --execute-checks flag is used.
  • [REMOTE_CODE_EXECUTION]: The collector script invokes uv run on a Python file (render_report.py) located within the path of the target repository. This allows an attacker who controls the target repository to execute arbitrary Python code on the system by modifying the contents of that script.
  • [DATA_EXFILTRATION]: The skill performs an exhaustive scan of the target directory using rg --files -uu, which explicitly includes hidden and ignored files. The resulting file inventory and collected evidence are stored in evaluation artifacts that may contain sensitive secrets or private information present in the repo.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 9, 2026, 10:06 PM