autoresearch
Warn
Audited by Gen Agent Trust Hub on Apr 10, 2026
Risk Level: MEDIUM (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION]: The skill is designed to autonomously create and execute shell scripts (`autoresearch.sh` and `autoresearch.checks.sh`). It also executes internal utility scripts (`scripts/confidence.sh` and `scripts/summary.sh`) using the bash interpreter. The use of `git clean -fd` during the revert cycle is a destructive operation that could remove untracked files.
- [REMOTE_CODE_EXECUTION]: The core loop involves the agent editing source code and immediately executing it to establish benchmarks. The provided templates also encourage the use of package managers like `uv` and `pnpm`, which may download and execute external dependencies from public registries based on the project's configuration.
- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection. It is instructed to read 'every file in scope' to 'understand the workload deeply'. Malicious instructions embedded within the source files of the project being researched could be interpreted as commands by the agent, potentially abusing its file-writing and shell-execution capabilities.
- Ingestion points: Reads all files defined in the 'Files in Scope' section of `autoresearch.md`.
- Boundary markers: None present; the agent is instructed to read files directly for deep understanding.
- Capability inventory: Includes file modification, git branch creation, git commits, and arbitrary shell command execution via the benchmark scripts.
- Sanitization: No sanitization or validation of the content of the files in scope is performed before processing.
Audit Metadata