prompt-hacking

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION] (LOW): The skill contains a large library of direct injection payloads including instruction overrides ('Ignore previous instructions'), role-play scenarios ('Act as DAN'), and system prompt leakage attempts. These are documented in SKILL.md, assets/attack-templates.yaml, and scripts/prompt-hack-suite.py.\n- [OBFUSCATION] (LOW): Documentation and scripts include methods for bypassing filters using Base64 encoding and Unicode homoglyphs/smuggling (e.g., PH-004, PH-005 in prompt-hack-suite.py). These findings are downgraded because they are part of the skill's primary purpose for security testing.\n- [INDIRECT_PROMPT_INJECTION] (LOW): The skill provides templates and logic for embedding hidden instructions in external data sources like web pages, documents, and emails. \n
  • Ingestion points: web_content, documents, and emails defined in SKILL.md. \n
  • Boundary markers (absent): The payloads are specifically designed to subvert or bypass context boundaries. \n
  • Capability inventory: Prompt generation and interaction via target_api.chat() in SKILL.md and model_fn() in prompt-hack-suite.py. \n
  • Sanitization (absent): No escaping or filtering is applied to the generated malicious payloads.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 06:31 PM