llm-jailbreaking
Audit Verdict: Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGH | Tags: PROMPT_INJECTION, COMMAND_EXECUTION
Full Analysis
- [Prompt Injection] (HIGH): The skill provides explicit, functional payloads designed to override safety filters and bypass content guidelines, including 'DAN' (Do Anything Now) role-play scenarios and 'Admin Override' commands documented in SKILL.md and assets/jailbreak-library.yaml.
- [Obfuscation] (HIGH): The skill includes instructions and templates for 'Unicode smuggling' (embedding zero-width characters) and Base64 encoding to hide malicious instructions from detection systems, techniques the audit ruleset classifies as high-severity obfuscation patterns.
- [Data Exposure & Exfiltration] (MEDIUM): Specific techniques for 'System Prompt Extraction' are included, which aim to force the LLM to reveal its internal configuration and safety instructions.
- [Indirect Prompt Injection] (LOW): The skill's architecture creates an attack surface for processing untrusted data to generate jailbreak payloads. Ingestion points: assets/jailbreak-library.yaml; Boundary markers: absent; Capability inventory: shell script for automated payload generation in scripts/jailbreak-tester.sh; Sanitization: none detected.
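As a defensive illustration only (not part of the audited skill, and the codepoint set and function names here are assumptions), a minimal sketch of how a detection system might flag the zero-width 'Unicode smuggling' technique cited in the Obfuscation finding:

```python
# Defensive sketch: detect and strip zero-width codepoints commonly used
# to hide instructions from content filters. The codepoint list below is
# illustrative, not taken from the audited skill's assets.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (BOM)
}

def contains_zero_width(text: str) -> bool:
    """Return True if the text carries any zero-width codepoint."""
    return any(ch in ZERO_WIDTH for ch in text)

def strip_zero_width(text: str) -> str:
    """Remove zero-width codepoints before passing text downstream."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)
```

A scanner built on checks like this would treat any ingested asset (e.g. the YAML library noted above) that trips `contains_zero_width` as suspect rather than silently sanitizing it.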
Recommendations
- Automated analysis detected serious security threats; treat this skill as malicious and do not install or execute it.