ablation-planner
Pass
Audited by Gen Agent Trust Hub on Apr 7, 2026
Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill requests
Bash(*)permissions to implement and run ablation experiments. It involves dynamically generating and executing shell scripts based on a plan produced by an LLM, which is a powerful capability that requires user oversight.\n- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection. It reads various project files and incorporates their raw content into a prompt for a reasoning model (Codex) to design experiments.\n - Ingestion points: Reads
docs/research_contract.md,EXPERIMENT_LOG.md,EXPERIMENT_TRACKER.md, and other project-specific documentation.\n - Boundary markers: The Codex prompt uses specific headers (e.g.,
Method:,Components:,Claims:) to separate input fields, providing minimal structural separation.\n - Capability inventory: The skill has access to tools like
Bash(*),Write, andEdit, enabling it to execute commands or modify the project environment.\n - Sanitization: There is no evidence of sanitization, validation, or filtering for the data read from project files before it is processed.
Audit Metadata