ablation-planner

Pass

Audited by Gen Agent Trust Hub on Apr 7, 2026

Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill requests Bash(*) permissions to implement and run ablation experiments. It involves dynamically generating and executing shell scripts based on a plan produced by an LLM, which is a powerful capability that requires user oversight.\n- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection. It reads various project files and incorporates their raw content into a prompt for a reasoning model (Codex) to design experiments.\n
  • Ingestion points: Reads docs/research_contract.md, EXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, and other project-specific documentation.\n
  • Boundary markers: The Codex prompt uses specific headers (e.g., Method:, Components:, Claims:) to separate input fields, providing minimal structural separation.\n
  • Capability inventory: The skill has access to tools like Bash(*), Write, and Edit, enabling it to execute commands or modify the project environment.\n
  • Sanitization: There is no evidence of sanitization, validation, or filtering for the data read from project files before it is processed.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 7, 2026, 12:25 PM