codex-readiness-unit-test

Warn

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: MEDIUM (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • COMMAND_EXECUTION (MEDIUM): The skill's 'Execute' mode is designed to run arbitrary shell commands found in AGENTS.md and PLANS.md. Although SKILL.md mandates a confirmation step and mentions a command denylist, the underlying mechanism executes untrusted input from the filesystem.
  • PROMPT_INJECTION (LOW): The skill is highly susceptible to Indirect Prompt Injection (Category 8). It ingests untrusted markdown files from the repository and uses them to influence both the LLM's evaluation results and the generated execution plan.
      • Ingestion points: AGENTS.md, PLANS.md, and any SKILL.md files referenced via $SkillName or path patterns (processed in scripts/collect_evidence.py).
      • Boundary markers: None. The evaluation prompts (e.g., references/commands.md, references/loop_quality.md) interpolate the raw EVIDENCE_JSON directly into the system instructions.
      • Capability inventory: The skill can read files in the current directory and in the user's home directory (~/.codex/skills), and can execute arbitrary shell commands via the run_plan.py script, which is referenced but missing from the source.
      • Sanitization: None found. The ingested text is not sanitized or escaped before it is used in prompts or execution plans.
  • DATA_EXFILTRATION (LOW): The scripts/collect_evidence.py script accesses paths outside the current working directory, specifically ~/.codex/skills, to resolve skill references. While likely intended for tool configuration, this grants the skill access to data in the user's home directory.
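
The missing boundary markers and the command denylist noted in the findings above could be approached along the following lines. This is a minimal sketch and not the skill's actual code: the marker strings, the `DENYLIST` contents, and the helper names `wrap_evidence` and `is_denied` are all hypothetical, since the audit shows neither the real denylist nor run_plan.py.

```python
import json
import shlex

# Hypothetical marker strings; the audited skill defines no boundary markers.
BEGIN = "<<<UNTRUSTED_EVIDENCE"
END = "UNTRUSTED_EVIDENCE>>>"

# Hypothetical denylist; the list mentioned in SKILL.md is not shown in the audit.
DENYLIST = {"rm", "curl", "wget", "sudo"}


def wrap_evidence(evidence: dict) -> str:
    """Serialize evidence and fence it so a prompt can present it as data."""
    payload = json.dumps(evidence, ensure_ascii=True)
    # Break up marker look-alikes so ingested text cannot close the fence early.
    payload = payload.replace("<<<", "< < <").replace(">>>", "> > >")
    return f"{BEGIN}\n{payload}\n{END}"


def is_denied(command: str) -> bool:
    """Return True if the command is unparseable, empty, or starts with a denylisted program."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return True
    if not tokens:
        return True
    # Compare only the program's base name, so /bin/rm is treated like rm.
    name = tokens[0].rsplit("/", 1)[-1]
    return name in DENYLIST
```

A first-token denylist like this is easy to bypass (for example through `sh -c` or an interpreter one-liner), so it would only complement the confirmation step rather than replace it. Likewise, fencing the evidence lowers the chance that ingested text is read as instructions but does not remove it.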
Audit Metadata
Risk Level: MEDIUM
Analyzed: Feb 17, 2026, 05:05 PM