guardian-wall

Pass

Audited by Gen Agent Trust Hub on Mar 1, 2026

Risk Level: SAFE, PROMPT_INJECTION
Full Analysis
  • [SAFE]: The skill is a utility designed specifically for defensive security purposes, implementing measures to mitigate risks from external data.
  • [PROMPT_INJECTION]: The skill provides an educational and operational reference for prompt injection and jailbreaking techniques in 'references/patterns.md'; these are documented for detection and auditing and do not constitute instructions for the agent to bypass its own safety guidelines.
  • [PROMPT_INJECTION]: The skill is designed to handle untrusted data from the internet or files, creating a surface for indirect prompt injection.
    • Ingestion points: The 'scripts/sanitize.py' script processes external text provided as command-line input.
    • Boundary markers: 'SKILL.md' instructs the agent to wrap untrusted content in unique, randomized delimiters (e.g., '<<<EXTERNAL_BLOCK_[RANDOM_HASH]>>>') to isolate it from system instructions.
    • Capability inventory: The skill's capabilities are limited to string cleaning, regex matching, and reporting; it does not execute the processed content or make network requests.
    • Sanitization: 'scripts/sanitize.py' implements cleaning logic that detects and removes zero-width spaces, identifies non-Latin homoglyphs, and decodes Base64 to check for malicious keywords.
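
The boundary-marker and sanitization findings above can be illustrated with a minimal Python sketch. This is not the code in 'scripts/sanitize.py', which the audit does not reproduce. All function names, the keyword list, the closing-delimiter format, and the 16-character Base64 threshold are hypothetical choices made for illustration. The homoglyph check is deliberately crude: it flags every non-Latin letter, not only visually confusable ones.

```python
import base64
import binascii
import re
import secrets
import unicodedata

# Hypothetical keyword list; the audit does not say which keywords the real script uses.
SUSPICIOUS_KEYWORDS = ("ignore previous instructions", "system prompt")

# Zero-width and invisible characters often used to hide text from human reviewers.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

# Runs of 16 or more Base64-alphabet characters, optionally padded (threshold is an assumption).
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def strip_zero_width(text: str) -> str:
    """Remove zero-width characters from the text."""
    return ZERO_WIDTH.sub("", text)


def find_non_latin_letters(text: str) -> list[str]:
    """Return every letter whose Unicode name does not start with 'LATIN'."""
    return [
        ch
        for ch in text
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
    ]


def find_base64_keywords(text: str) -> list[str]:
    """Decode Base64-looking runs and report suspicious keywords found inside them."""
    hits = []
    for match in BASE64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
        except (binascii.Error, ValueError):
            continue  # Not valid Base64 after all; skip it.
        decoded = raw.decode("utf-8", "ignore").lower()
        hits.extend(k for k in SUSPICIOUS_KEYWORDS if k in decoded)
    return hits


def wrap_untrusted(text: str) -> str:
    """Wrap text in delimiters carrying a fresh random token.

    The opening marker follows the format quoted in the audit; the closing
    marker format is an assumption.
    """
    token = secrets.token_hex(8).upper()
    return (
        f"<<<EXTERNAL_BLOCK_{token}>>>\n"
        f"{text}\n"
        f"<<<END_EXTERNAL_BLOCK_{token}>>>"
    )


if __name__ == "__main__":
    sample = "p\u0430ypal\u200b login"  # Cyrillic 'a' plus a zero-width space
    print(find_non_latin_letters(sample))
    print(wrap_untrusted(strip_zero_width(sample)))
```

Because the token is regenerated on every call, content inside the block cannot predict the closing marker and terminate the block early, which is the point of randomizing the delimiter.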
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 1, 2026, 03:26 PM