prompt-injection-guard
Pass
Audited by Gen Agent Trust Hub on Mar 21, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: No malicious patterns detected. The skill is a defensive utility for the agent.
- [PROMPT_INJECTION]: The skill manages indirect prompt injection risks by scanning external data (web pages, emails, and shared documents) for malicious instructions. Ingestion points: Browser output, emails, file contents, shared documents, and tool call results as specified in
SKILL.md. Boundary markers: Step 1 of the detection protocol classifies content astrustedoruntrusted. Capability inventory: Local script execution (guard.py) and file writing to a local state directory (~/.openclaw/). Sanitization: Regex-based scanning against theSIGNALSlist defined inguard.pyis performed before content is processed by the agent. - [DATA_EXFILTRATION]: The skill logs scan results and 200-character content excerpts to a local file at
~/.openclaw/skill-state/prompt-injection-guard/state.yaml. While used for auditing and reputation tracking, this results in fragments of external data being stored on the local filesystem. - [COMMAND_EXECUTION]: The skill instructions and examples direct the agent to execute a local Python script (
guard.py) to analyze content, which is an intended and documented part of its defensive functionality.
Audit Metadata