codex-readiness-integration-test
Warn
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: MEDIUM (COMMAND_EXECUTION, PROMPT_INJECTION, DATA_EXFILTRATION)
Full Analysis
- [COMMAND_EXECUTION] (MEDIUM): The `scripts/run_plan.py` utility executes shell commands derived from a JSON plan using `subprocess.Popen(shell=True)`. These commands are generated by an LLM based on the repository context.
  - Evidence: `scripts/run_plan.py` (lines 112-121) executes arbitrary strings in a shell environment.
  - Mitigation: A regex-based denylist (`DENYLIST_PATTERNS`) blocks destructive commands such as `rm -rf` and `mkfs`. Additionally, `SKILL.md` specifies a workflow requirement under which users must manually approve the prompt and plan before execution.
- [PROMPT_INJECTION] (LOW): The skill is susceptible to indirect prompt injection. It ingests data from the local repository (such as `AGENTS.md` and log files), which LLM evaluators then process to determine test success or failure.
  - Mandatory Evidence Chain (Category 8):
    - Ingestion points: `scripts/collect_evidence.py` reads `AGENTS.md` and `logs/*.log` from the current working directory.
    - Boundary markers: No explicit delimiters or instructions to ignore embedded commands are present in the evaluator prompts (`references/agentic_loop_eval.md`, `references/change_quality.md`).
    - Capability inventory: The skill can execute shell commands via `scripts/run_plan.py` and the `codex` CLI, and it can read and write files within the repository scope.
    - Sanitization: Command execution is restricted by a basic denylist in `scripts/run_plan.py`, but the content ingested into the LLM prompts is not sanitized.
- [DATA_EXFILTRATION] (LOW): The skill collects comprehensive repository state information, including git diffs, untracked files, directory structures, and logs, into a single `evidence.json` file. While no direct network exfiltration was found in the scripts, this file aggregates sensitive local data for LLM processing.
  - Evidence: `scripts/collect_evidence.py` uses `git diff` and `git ls-files --others --exclude-standard` to gather repository content.
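To illustrate the COMMAND_EXECUTION finding, here is a minimal sketch of how a regex denylist check in front of `subprocess.Popen(shell=True)` might look. It is not the code from `scripts/run_plan.py`: the audit does not show the contents of `DENYLIST_PATTERNS`, so the patterns and the names `EXAMPLE_DENYLIST_PATTERNS`, `is_denied`, and `run_step` are assumptions made up for this example.

```python
import re
import subprocess

# Hypothetical patterns for illustration only; the real DENYLIST_PATTERNS
# in scripts/run_plan.py is not reproduced in the audit.
EXAMPLE_DENYLIST_PATTERNS = [
    # "rm" with a flag group containing both r and f (e.g. -rf, -fr)
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"),
    # any mkfs invocation, including variants such as mkfs.ext4
    re.compile(r"\bmkfs\b"),
]


def is_denied(command: str) -> bool:
    """Return True if the command matches any example denylist pattern."""
    return any(p.search(command) for p in EXAMPLE_DENYLIST_PATTERNS)


def run_step(command: str) -> int:
    """Run one plan step in a shell after the denylist check (sketch)."""
    if is_denied(command):
        raise ValueError(f"blocked by denylist: {command!r}")
    # shell=True passes the string to the shell unparsed, so anything
    # the denylist does not anticipate is still executed.
    proc = subprocess.Popen(command, shell=True)
    return proc.wait()


if __name__ == "__main__":
    for cmd in ["rm -rf /tmp/x", "mkfs.ext4 /dev/sda1", "echo hello", "find . -delete"]:
        print(cmd, "->", "denied" if is_denied(cmd) else "allowed")
```

With these example patterns, `find . -delete` is allowed even though it is destructive. A pattern denylist only catches the phrasings it was written for, which is consistent with the audit describing the denylist as basic and relying on manual plan approval as a second control.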
Audit Metadata