thought-based-reasoning
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGH (REMOTE_CODE_EXECUTION, COMMAND_EXECUTION)
Full Analysis
- [Remote Code Execution] (HIGH): The 'Program-Aided Language (PAL)' technique explicitly instructs the agent to 'Generate code, execute it'. This creates a direct path to arbitrary code execution if a user-provided problem statement is crafted to inject malicious instructions into the generated code block.
- [Indirect Prompt Injection] (HIGH): The skill's core function is to process external 'Problem statements' with high-privilege capabilities such as code execution and tool actions. It has no explicit boundary markers or sanitization logic, so an attacker can embed malicious instructions within a complex math or logic puzzle and have the agent act on them.
- [Command Execution] (MEDIUM): The 'ReAct' pattern involves taking 'Actions' like 'Search/Lookup'. Depending on the tools available to the agent, this can lead to unauthorized external communication or local command execution triggered by malicious input data.
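The findings above point to two missing controls: boundary markers around untrusted problem statements, and a check on generated code before it is executed. The Python sketch below illustrates one possible shape for both. It is not part of the audited skill; the names `wrap_untrusted`, `is_safe_arithmetic`, `run_generated_code`, the marker strings, and the `ALLOWED_NODES` allowlist are all hypothetical, and the allowlist assumes PAL output is limited to simple arithmetic assignments.

```python
import ast
import re

# Hypothetical allowlist: only plain arithmetic and assignments are accepted.
# Calls, attribute access, imports, and exponentiation are deliberately absent.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.UnaryOp,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod,
    ast.USub, ast.UAdd,
)


def is_safe_arithmetic(code: str) -> bool:
    """Return True only if every AST node in `code` is on the allowlist."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))


def run_generated_code(code: str) -> dict:
    """Execute allowlisted code with no builtins and return its variables.

    Assumption: this is a narrow filter, not a sandbox. Real deployments
    would still need process-level isolation and resource limits.
    """
    if not is_safe_arithmetic(code):
        raise ValueError("generated code rejected by allowlist")
    namespace = {"__builtins__": {}}
    exec(compile(code, "<pal>", "exec"), namespace)
    namespace.pop("__builtins__", None)
    return namespace


def wrap_untrusted(problem: str) -> str:
    """Wrap a problem statement in boundary markers (hypothetical format).

    Runs of three or more angle brackets are stripped first so the input
    cannot forge an opening or closing marker.
    """
    cleaned = re.sub(r"[<>]{3,}", "", problem)
    return "<<<UNTRUSTED_PROBLEM\n" + cleaned + "\n>>>END_UNTRUSTED_PROBLEM"
```

An allowlist is used here rather than a blocklist because it fails closed: any construct not explicitly permitted, including ones the author did not anticipate, is rejected before execution.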
Recommendations
- AI detected serious security threats
Audit Metadata