langchain
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGHREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- Remote Code Execution (HIGH): The 'calculate' tool implementation in SKILL.md uses 'eval(expression)'. This allows an attacker to execute arbitrary Python code by providing a malicious expression to the agent, which the LLM will then pass to the tool. Examples include bypassing the intended math logic to run system commands via 'import("os").system()'.
- Indirect Prompt Injection (LOW): The skill implements RAG and database search patterns that ingest untrusted data from external sources.
- Ingestion points: 'retriever' in the RAG pipeline and 'search_database' tool in the Agent section.
- Boundary markers: Absent; untrusted data is directly interpolated into prompts without delimiters or instructions to ignore embedded commands.
- Capability inventory: The agent has access to the 'calculate' tool, which provides a high-impact 'eval' capability.
- Sanitization: No sanitization, validation, or escaping of external content is present in the code examples.
- Prompt Injection (LOW): 'ChatPromptTemplate' uses simple string interpolation for '{language}' and '{text}' without any safety instructions or markers to prevent the input data from overriding the system instructions.
Recommendations
- AI detected serious security threats
Audit Metadata