openclaw-guardian
Pass
Audited by Gen Agent Trust Hub on Mar 7, 2026
Risk Level: SAFE
Flagged behaviors: DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
- [DATA_EXFILTRATION]: The skill accesses sensitive user data by reading conversation logs from ~/.openclaw/agents/main/sessions/. This chat history is then sent to external LLM APIs (Anthropic or OpenAI) to verify whether the user intended to perform a flagged operation. While this is the intended functionality of the 'Guardian' system, it exposes private session content to third-party providers.
- [PROMPT_INJECTION]: The skill's verification logic is vulnerable to indirect prompt injection, because it relies on an LLM interpreting untrusted user input to make security decisions.
  - Ingestion points: Recent user messages are retrieved from local session files in scripts/llm-voter.ts and interpolated into the judge's prompt.
  - Boundary markers: The prompt builder in scripts/llm-voter.ts lacks strong delimiters or 'ignore' instructions to prevent the agent from obeying instructions contained within the user messages themselves.
  - Capability inventory: The skill is integrated into the before_tool_call hook and has the capability to block or allow any tool execution (exec, write, edit) based on the LLM's vote.
  - Sanitization: No sanitization or filtering is applied to the user content before the judge model processes it, so a user could influence the security verdict by phrasing requests in a way that overrides the system prompt's constraints.
Audit Metadata