langgraph
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGHREMOTE_CODE_EXECUTIONCOMMAND_EXECUTION
Full Analysis
- REMOTE_CODE_EXECUTION (HIGH): The example code for the calculator tool uses Python's
eval()function on a string that is intended to be generated by an LLM based on user input. - Evidence: In
SKILL.md, the code block shows:@tool def calculator(expression: str) -> str: """Evaluate a math expression.""" return str(eval(expression)) - Risk: An attacker can use prompt injection (direct or indirect) to force the LLM to generate a malicious string such as
__import__('os').system('rm -rf /')or credential exfiltration commands, whicheval()will execute with the privileges of the agent process. - INDIRECT_PROMPT_INJECTION (LOW): The skill describes a pattern where an agent uses a
searchtool to fetch web content and then processes that content in the next loop, creating an attack surface for indirect injection. - Ingestion points:
searchtool output (inagentnode withinSKILL.md). - Boundary markers: Absent. The state just appends messages using
add_messageswithout delimiters or 'ignore' instructions. - Capability inventory: The agent has access to
eval()via the calculator tool and likely filesystem/network access depending on the environment. - Sanitization: No evidence of input validation or sanitization before passing tool outputs back to the LLM.
Recommendations
- AI detected serious security threats
Audit Metadata