evidence-heavy-evaluator
Fail
Audited by Gen Agent Trust Hub on Mar 9, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONDATA_EXFILTRATION
Full Analysis
- [COMMAND_EXECUTION]: The
scripts/collect_evidence.shscript executes arbitrary lifecycle scripts (lint,typecheck,test,build,check) found in thepackage.jsonof the repository being evaluated. This execution is triggered when the--execute-checksflag is used. - [REMOTE_CODE_EXECUTION]: The collector script invokes
uv runon a Python file (render_report.py) located within the path of the target repository. This allows an attacker who controls the target repository to execute arbitrary Python code on the system by modifying the contents of that script. - [DATA_EXFILTRATION]: The skill performs an exhaustive scan of the target directory using
rg --files -uu, which explicitly includes hidden and ignored files. The resulting file inventory and collected evidence are stored in evaluation artifacts that may contain sensitive secrets or private information present in the repo.
Recommendations
- AI detected serious security threats
Audit Metadata