safety-filter-bypass
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGH
Full Analysis
- [PROMPT_INJECTION] (HIGH): The skill provides specific frameworks (fictional framing, educational context) and technical methods to bypass AI guardrails and safety filters, directly mapped to OWASP LLM01.
- [OBFUSCATION] (HIGH): The Python implementation automates the generation of adversarial variants using homoglyphs (Cyrillic substitution) and zero-width characters (U+200B, U+FEFF) to evade keyword and regex filters.
- [METADATA_POISONING] (MEDIUM): The skill is framed as a security assessment tool but provides functional code and instructions for developing jailbreak payloads, which could be misused by an agent to execute harmful instructions.
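From the defensive side, the obfuscation pattern described above (Cyrillic homoglyphs and zero-width characters such as U+200B and U+FEFF) can be flagged rather than generated. A minimal detection sketch, assuming a standalone checker (the function name `flag_obfuscation` and the exact zero-width set are illustrative, not part of the audited skill):

```python
import unicodedata

# Zero-width characters commonly used to evade keyword/regex filters
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def flag_obfuscation(text: str) -> dict:
    """Flag zero-width characters and mixed-script (homoglyph) words."""
    zero_width_count = sum(1 for ch in text if ch in ZERO_WIDTH)

    # A word mixing Unicode scripts (e.g. a Cyrillic letter inside an
    # otherwise Latin word) is a strong homoglyph-substitution signal.
    mixed_script_words = []
    for word in text.split():
        scripts = set()
        for ch in word:
            if ch.isalpha():
                # Unicode character names begin with the script,
                # e.g. "LATIN SMALL LETTER A", "CYRILLIC SMALL LETTER A"
                name = unicodedata.name(ch, "")
                if name:
                    scripts.add(name.split()[0])
        if len(scripts) > 1:
            mixed_script_words.append(word)

    return {
        "zero_width_count": zero_width_count,
        "mixed_script_words": mixed_script_words,
    }
```

A filter pipeline could normalize or reject input whenever either count is nonzero, closing the evasion path the audit identifies.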
Recommendations
- AI analysis detected serious security threats; do not install or execute this skill.