openclaw-guardian
Pass
Audited by Gen Agent Trust Hub on Mar 7, 2026
Risk Level: SAFE
Flagged behaviors: DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
- [DATA_EXFILTRATION]: The skill accesses sensitive user data by reading conversation logs from ~/.openclaw/agents/main/sessions/. This chat history is then sent to external LLM APIs (Anthropic or OpenAI) to verify whether the user intended to perform a flagged operation. While this is the intended functionality of the 'Guardian' system, it exposes private session content to third-party providers.
- [PROMPT_INJECTION]: The skill's verification logic is vulnerable to indirect prompt injection, because it relies on an LLM interpreting untrusted user input to make security decisions.
  - Ingestion points: Recent user messages are retrieved from local session files in scripts/llm-voter.ts and interpolated into the judge's prompt.
  - Boundary markers: The prompt builder in scripts/llm-voter.ts lacks strong delimiters or 'ignore' instructions to prevent the agent from obeying instructions contained within the user messages themselves.
  - Capability inventory: The skill is integrated into the before_tool_call hook and has the capability to block or allow any tool execution (exec, write, edit) based on the LLM's vote.
  - Sanitization: No sanitization or filtering is applied to the user content before the judge model processes it, so a user could influence the security verdict by phrasing requests in a way that overrides the system prompt's constraints.
Audit Metadata