codex-autoresearch-loop

Fail

Audited by Gen Agent Trust Hub on Mar 21, 2026

Risk Level: HIGH
COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill's primary function is to execute arbitrary shell commands (e.g., 'pytest', 'npm test', 'bash scripts/...') inferred from the project or provided by the user. These commands run in an unattended loop without human verification between iterations.
  • [REMOTE_CODE_EXECUTION]: Because the skill autonomously generates and runs commands inferred from the codebase, malicious project files could steer the agent into executing attacker-controlled code.
  • [EXTERNAL_DOWNLOADS]: The documentation instructs users to install the skill via 'git clone' from a third-party repository ('github.com/leo-lilinxiao/codex-autoresearch.git') not associated with the declared author or any trusted organization.
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it reads untrusted repository data to 'infer everything' needed for its execution loop.
      • Ingestion points: The skill scans the entire project directory (src/**/*.ts, etc.) to establish metrics and commands.
      • Boundary markers: No boundary markers or instructions to ignore embedded commands are present in the scanning logic.
      • Capability inventory: The agent can modify local files, commit to git, and execute arbitrary shell commands via its 'Verify' and 'Guard' phases.
      • Sanitization: There is no evidence of command sanitization or validation before the inferred shell commands are passed to the system shell.
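
The missing sanitization step noted above could look roughly like the following. This is a minimal sketch, not the skill's actual code: the names `ALLOWED_COMMANDS`, `validate_command`, and `run_validated` are hypothetical, and the allowlist contents are assumptions chosen only to mirror the example commands named in the analysis ('pytest', 'npm test'). The idea is to reject shell metacharacters, check the program against an explicit allowlist, and run the result without a system shell.

```python
import shlex
import subprocess

# Hypothetical allowlist (an assumption for illustration): program name mapped
# to the set of permitted first arguments, or None if any arguments are accepted.
ALLOWED_COMMANDS = {
    "pytest": None,
    "npm": {"test"},
}

# Characters that would let an inferred command chain, pipe, or substitute.
SHELL_METACHARACTERS = set(";&|`$<>\n")


def validate_command(command: str) -> list[str]:
    """Return the parsed argv if the command passes the allowlist, else raise ValueError."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise ValueError("shell metacharacters are not allowed")
    argv = shlex.split(command)  # raises ValueError on unbalanced quotes
    if not argv:
        raise ValueError("empty command")
    program, args = argv[0], argv[1:]
    if program not in ALLOWED_COMMANDS:
        raise ValueError(f"program not on allowlist: {program}")
    allowed_first_args = ALLOWED_COMMANDS[program]
    if allowed_first_args is not None and (not args or args[0] not in allowed_first_args):
        raise ValueError(f"subcommand not permitted for {program}")
    return argv


def run_validated(command: str, timeout_seconds: float = 300.0) -> subprocess.CompletedProcess:
    """Run a validated command as an argv list, never through the system shell."""
    argv = validate_command(command)
    return subprocess.run(argv, shell=False, capture_output=True, text=True, timeout=timeout_seconds)
```

Under these assumptions, `validate_command("npm test")` returns `["npm", "test"]`, while `"bash scripts/run.sh"` or `"npm install; rm -rf /"` raise `ValueError`. An allowlist of this kind narrows what an injected instruction can trigger, but it does not by itself make unattended execution safe, since an allowed test runner still executes project-controlled code.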
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level: HIGH
Analyzed: Mar 21, 2026, 05:24 AM