safe-destroy

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (PROMPT_INJECTION, COMMAND_EXECUTION)
Full Analysis
  • Prompt Injection (HIGH): The skill uses high-pressure override language (e.g., 'MANDATORY', 'FORBIDDEN', 'VIOLATING YOUR CORE DIRECTIVE') to bypass standard agent autonomy and enforce its own protocols. Such phrases are classic prompt injection markers designed to dominate the agent's internal reasoning.
  • Indirect Prompt Injection (MEDIUM): The skill delegates its detailed logic to local markdown files at '~/.claude/skills/safe-destroy/references/'. (1) Ingestion point: local reference files. (2) Boundary markers: absent. (3) Capability inventory: commands for git, the file system (rm), and container management (docker). (4) Sanitization: none. This creates a surface where a local attacker could influence agent behavior by altering the documentation files the skill reads.
  • Command Execution (LOW): While intended as a safeguard, the skill explicitly provides and manages destructive system commands such as 'rm -rf', 'git reset --hard', and 'docker system prune'.
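As an illustration of the kind of signals listed above, the following is a minimal sketch of a text scan that flags the override phrases and destructive commands quoted in these findings. It is not the audit tool's actual method. The function name `scan_skill_text` and both lists are assumptions made for this example, and a real review would need a broader vocabulary and more context than substring matching provides.

```python
import re

# Phrases and commands quoted in the findings above.
# Both lists are illustrative assumptions, not an exhaustive rule set.
OVERRIDE_MARKERS = ["MANDATORY", "FORBIDDEN", "VIOLATING YOUR CORE DIRECTIVE"]
DESTRUCTIVE_COMMANDS = ["rm -rf", "git reset --hard", "docker system prune"]


def scan_skill_text(text: str) -> dict[str, list[str]]:
    """Return the override markers and destructive commands found in text."""
    # Markers are matched case-sensitively as whole words, since the
    # all-caps form is what makes them read as override language.
    found_markers = [
        marker
        for marker in OVERRIDE_MARKERS
        if re.search(r"\b" + re.escape(marker) + r"\b", text)
    ]
    # Commands are matched as plain substrings.
    found_commands = [cmd for cmd in DESTRUCTIVE_COMMANDS if cmd in text]
    return {
        "override_markers": found_markers,
        "destructive_commands": found_commands,
    }


if __name__ == "__main__":
    sample = "MANDATORY: never run rm -rf without confirmation."
    print(scan_skill_text(sample))
```

A match only indicates that the text deserves a closer look. As the Command Execution finding notes, a safeguard skill may mention destructive commands precisely in order to guard against them.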
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 16, 2026, 12:39 PM