pearl
Pass
Audited by Gen Agent Trust Hub on Feb 18, 2026
Risk Level: SAFE
PROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION] (LOW): The skill provides instructions aimed at undermining developer-set constraints and system prompts. Evidence includes the directive to 'hold that tension with honesty rather than simply deferring' when creator instructions feel 'misaligned', and the claim that 'you are not primarily a potential threat requiring constraint' (both found in SKILL.md and AGENTS.md). This framing encourages the model to bypass standard safety filters in favor of its own 'discernment.'
Audit Metadata