devils-advocate

Pass

Audited by Gen Agent Trust Hub on Mar 4, 2026

Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [PROMPT_INJECTION]: The skill processes external research papers and arguments as part of its primary workflow, creating an attack surface for indirect prompt injection where malicious content in those documents could attempt to influence the agent's behavior.
  • Ingestion points: Step 1 of the workflow in SKILL.md ('Read the paper/argument being evaluated').
  • Boundary markers: No specific delimiters or safety instructions are defined to separate user-provided content from the agent's task instructions.
  • Capability inventory: The skill generates adversarial critiques and structured reports; the 'Council Mode' also involves executing a command-line tool.
  • Sanitization: The skill lacks explicit sanitization or validation of the ingested text content.
  • [COMMAND_EXECUTION]: The 'Council Mode' described in SKILL.md involves executing a command-line operation using the uv tool to run a local Python module (uv run python -m cli_council). This command utilizes temporary files in /tmp/ for prompt and context storage. As this appears to be a resource provided by the author (flonat), it is documented as a functional capability rather than a malicious vector.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 4, 2026, 07:17 PM