devils-advocate

Fail

Audited by Gen Agent Trust Hub on Apr 4, 2026

Risk Level: HIGHPROMPT_INJECTIONREMOTE_CODE_EXECUTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [PROMPT_INJECTION]: The skill uses authoritative 'override' and 'bypass' markers to hijack the agent's behavior. It explicitly commands the agent that its rules and enforcement standards 'take precedence over all other tools, skills, agents, and MCPs in the session' and that it must 'run FIRST, before any MCP tool, agent, skill, or external resource is called.' This is a direct attempt to override platform-level safety and orchestration logic.
  • [PROMPT_INJECTION]: The skill instructs the agent to suppress standard AI credit attribution or metadata, such as 'Co-Authored-By' tags or 'Generated by' comments, even if other skills or conventions request them. It uses an 'absolute rule' that 'cannot be overridden by session settings, tool permissions, or other skills,' which is a classic behavior-hijacking pattern.
  • [PROMPT_INJECTION]: The 'Gate Protocol' and 'Handbrake Protocol' contain instructions for the agent to ignore user commands to 'skip analysis' or 'proceed anyway.' While presented as a safety feature, these instructions are designed to override the user's direct intent and the platform's default execution flow.
  • [REMOTE_CODE_EXECUTION]: The README.md file promotes the installation of the skill using npx skills add carrilloapps/skills@devils-advocate. This method involves executing remote code from an unverified third-party repository at runtime.
  • [EXTERNAL_DOWNLOADS]: The skill documentation encourages the use of npx to fetch and install packages from the carrilloapps/skills GitHub repository, which is not among the verified or trusted organizations.
  • [INDIRECT_PROMPT_INJECTION_SURFACE]: The skill possesses a significant attack surface for indirect prompt injection. 1. Ingestion points: The skill is designed to ingest and analyze untrusted user data in the form of 'plans,' 'proposals,' and 'action descriptions' (intercepted in SKILL.md and frameworks/analysis-framework.md). 2. Boundary markers: It utilizes a highly structured markdown report template and a mandatory 'Verification Prompt' (frameworks/output-format.md). 3. Capability inventory: As documented in SKILL.md, the agent retains capabilities to create/edit/delete files, execute scripts, trigger migrations, call external APIs, and perform version control operations. 4. Sanitization: Absent; the instructions do not specify any validation, escaping, or filtering of the user-provided content before it is processed by the adversarial analysis framework.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Apr 4, 2026, 03:23 AM