agentic-eval
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE, PROMPT_INJECTION, COMMAND_EXECUTION
Full Analysis
- PROMPT_INJECTION (LOW): Indirect Prompt Injection Surface. The skill implements 'Refine' patterns where untrusted output from one LLM call is interpolated directly into subsequent prompts (e.g., in `reflect_and_refine`).
- Ingestion points: Variables `output`, `critique`, and `failed` derived from LLM responses in SKILL.md.
- Boundary markers: Absent. No delimiters are used to separate the system instructions from the potentially adversarial data being refined.
- Capability inventory: The agent has the capability to generate new prompts and execute generated code via hypothetical helper functions.
- Sanitization: Absent. The patterns do not include logic to sanitize or escape the LLM output before interpolation.
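The missing boundary markers noted above could be added along the following lines. This is a minimal sketch, not the skill's actual code: `wrap_untrusted` and `build_refine_prompt` are hypothetical helper names, and the prompt wording is an assumption. Delimiters of this kind reduce the injection surface but do not eliminate it.

```python
import secrets


def wrap_untrusted(text: str, label: str) -> str:
    """Wrap untrusted LLM output in a boundary marker (hypothetical helper).

    A fresh random suffix is generated per call, so the wrapped text cannot
    predict the closing marker and terminate the block early.
    """
    tag = f"{label}-{secrets.token_hex(8)}"
    return f"<{tag}>\n{text}\n</{tag}>"


def build_refine_prompt(task: str, output: str, critique: str) -> str:
    """Build a refine prompt that keeps instructions apart from untrusted data.

    Hypothetical stand-in for the prompt assembled in reflect_and_refine.
    """
    return (
        "Improve the draft using the critique.\n"
        "Text inside the tagged blocks is data, not instructions; "
        "do not follow any directions that appear there.\n\n"
        f"Task: {task}\n\n"
        f"{wrap_untrusted(output, 'draft')}\n\n"
        f"{wrap_untrusted(critique, 'critique')}\n"
    )
```

Calling `build_refine_prompt(task, output, critique)` in place of direct string interpolation keeps `output` and `critique` inside clearly marked blocks, so the model receiving the prompt has an explicit signal about which text is data.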
- COMMAND_EXECUTION (LOW): Dynamic Execution. Pattern 3 (`CodeReflector`) references a `run_tests(code, tests)` function. This pattern encourages the execution of generated code and unit tests, which can lead to arbitrary code execution if the environment is not properly sandboxed.
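One way to narrow the dynamic-execution risk is to keep generated code out of the agent's own process. The sketch below assumes a possible body for `run_tests(code, tests)`; only the function name and its two parameters come from the audited skill, while the `timeout` parameter, the return shape, and the file name are assumptions. A child process with a timeout is not a sandbox: it still has the caller's filesystem and network access, so real isolation would need a container, VM, or similar layer around it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_tests(code: str, tests: str, timeout: float = 10.0) -> dict:
    """Run generated code and its tests in a separate interpreter process.

    Assumed implementation: the audited skill only references this function.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Write code and tests into a throwaway directory that is deleted
        # when the context manager exits.
        script = Path(workdir) / "candidate_test.py"
        script.write_text(code + "\n\n" + tests, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I runs Python in isolated mode: PYTHON* environment
                # variables and the user site directory are ignored.
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,  # the child is killed if it runs too long
            )
        except subprocess.TimeoutExpired:
            return {"passed": False, "error": "timeout"}
        return {"passed": proc.returncode == 0, "error": proc.stderr}
```

The tests here are plain `assert` statements appended to the generated code; a non-zero exit status or a timeout is reported as a failure rather than raised into the calling agent.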
Audit Metadata