agentic-eval
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHREMOTE_CODE_EXECUTIONPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
- [REMOTE_CODE_EXECUTION] (HIGH): The 'Code-Specific Reflection' pattern (Pattern 3) generates Python code and unit tests from an untrusted 'spec' and immediately executes them.
- Evidence: File
SKILL.mdcontainsresult = run_tests(code, tests). - Risk: If the input specification contains malicious instructions, the LLM may generate code that performs unauthorized system operations (e.g., file deletion or network access) which are then executed on the host system.
- [PROMPT_INJECTION] (HIGH): All provided prompt templates directly interpolate external, untrusted data into the LLM context without any boundary markers or sanitization.
- Evidence: Templates like
llm(f"Complete this task:\n{task}")andllm(f"Write Python code for: {spec}")allow user-provided strings to take control of the LLM's instructions. - Risk: An attacker can use 'ignore previous instructions' techniques within the
taskorspecvariables to bypass intended evaluation logic or force the generation of malicious payloads. - [INDIRECT PROMPT INJECTION] (HIGH): This skill is a primary target for Category 8 attacks because it combines untrusted data ingestion with high-privilege execution capabilities.
- Ingestion points: The variables
task,spec, andoutputin all three patterns. - Boundary markers: None. No delimiters (like XML tags or triple quotes) are used to separate instructions from data.
- Capability inventory: The skill possesses the ability to execute code via the
run_testsfunction and parse structured data viajson.loads. - Sanitization: None. Content is passed directly to the LLM and the execution environment.
- [COMMAND_EXECUTION] (MEDIUM): The patterns facilitate the execution of arbitrary commands by treating LLM-generated strings as executable code logic.
- Evidence: The
CodeReflectorclass automate a loop of writing, testing, and fixing code based on error messages, which can be manipulated into a persistent exploit loop.
Recommendations
- AI detected serious security threats
Audit Metadata