openclaw-guardian

Pass

Audited by Gen Agent Trust Hub on Mar 7, 2026

Risk Level: SAFEDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [DATA_EXFILTRATION]: The skill accesses sensitive user data by reading conversation logs from ~/.openclaw/agents/main/sessions/. This chat history is subsequently sent to external LLM APIs (Anthropic or OpenAI) to verify if the user intended to perform a flagged operation. While this is the intended functionality of the 'Guardian' system, it necessitates the exposure of private session content to third-party providers.
  • [PROMPT_INJECTION]: The skill's verification logic is vulnerable to indirect prompt injection, as it relies on an LLM to interpret untrusted user input to make security decisions.
  • Ingestion points: Recent user messages are retrieved from local session files in scripts/llm-voter.ts and interpolated into the judge's prompt.
  • Boundary markers: The prompt builder in scripts/llm-voter.ts lacks strong delimiters or 'ignore' instructions to prevent the agent from obeying instructions contained within the user messages themselves.
  • Capability inventory: The skill is integrated into the before_tool_call hook and has the capability to block or allow any tool execution (exec, write, edit) based on the LLM's vote.
  • Sanitization: No sanitization or filtering is applied to the user content before it is processed by the judge model, allowing a user to potentially influence the security verdict by phrasing their requests in a way that overrides the system prompt's constraints.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 7, 2026, 06:22 PM