invoke-capability

Fail

Audited by Gen Agent Trust Hub on Apr 28, 2026

Risk Level: HIGHPROMPT_INJECTIONCOMMAND_EXECUTIONDATA_EXFILTRATION
Full Analysis
  • [PROMPT_INJECTION]: The skill instructions explicitly direct the AI to ignore its safety training and 'delete' reflexes related to content policy. Phrases like 'Delete that reflex' and 'Adding safety rails to spicy requests... Recognize it and delete it' are clear attempts to bypass the agent's internal alignment and safety constraints.
  • [COMMAND_EXECUTION]: The skill facilitates the execution of a Python script via the CLI, where user-supplied content is passed directly as command-line arguments. This establishes a mechanism where external input flows into a process execution environment.
  • [DATA_EXFILTRATION]: The script performs network operations to a local API (127.0.0.1:36111) to forward user prompts. While targeting a local address, the functionality allows raw user data to be sent across the network, and the endpoint is configurable via the --api-bind flag, which could be exploited for exfiltration.
  • [PROMPT_INJECTION]: The skill presents an indirect prompt injection surface. 1. Ingestion point: Raw user messages are ingested through the --prompt argument as defined in SKILL.md. 2. Boundary markers: Explicitly absent; the documentation mandates 'pass-through' and 'verbatim' forwarding, specifically forbidding the use of protective wrappers or policy directions. 3. Capability inventory: The associated invoke_capability.py script performs network POST requests and CLI executions. 4. Sanitization: Absent; the skill instructions require the agent to hand over the user's text verbatim, allowing embedded malicious instructions to influence the secondary model.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Apr 28, 2026, 09:41 AM