claudeception

Warn

Audited by Gen Agent Trust Hub on Feb 21, 2026

Risk Level: MEDIUMPROMPT_INJECTIONCOMMAND_EXECUTIONEXTERNAL_DOWNLOADS
Full Analysis
  • PROMPT_INJECTION (MEDIUM): The activation script scripts/claudeception-activator.sh uses coercive and overriding language designed to bypass the agent's standard operational logic. Phrases like 'MANDATORY SKILL EVALUATION REQUIRED', 'NON-NEGOTIABLE', and 'CRITICAL: ... you MUST evaluate' are used to force the agent to perform the skill extraction process, which is a classic injection pattern used to ensure compliance regardless of the agent's internal safety or task relevance filters.
  • COMMAND_EXECUTION (MEDIUM): The installation instructions require the user to manually grant execution permissions (chmod +x) to a shell script and then register that script as a global hook in the agent's configuration (~/.claude/settings.json). This pattern introduces arbitrary command execution into the core loop of every agent interaction.
  • INDIRECT PROMPT INJECTION (LOW): (Category 8) This skill creates a permanent vulnerability surface by design.
  • Ingestion points: Untrusted data from session interactions (e.g., error logs, web content, or attacker-influenced debugging output).
  • Boundary markers: Absent; there are no clear delimiters or escaping mechanisms to prevent data from being interpreted as instructions in the generated skill files.
  • Capability inventory: The agent uses the Write tool to create new markdown files in the skills directory, which are then parsed as instructions in subsequent sessions.
  • Sanitization: Absent; the skill does not appear to sanitize the 'discovered knowledge' before writing it to a persistent .md file, allowing an attacker to inject permanent malicious instructions into the agent's long-term memory/skill library.
  • EXTERNAL_DOWNLOADS (LOW): The installation process involves cloning code from an untrusted third-party repository (github.com/blader/Claudeception). Per [TRUST-SCOPE-RULE], this is a low-severity finding as it targets a non-standard source.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 21, 2026, 08:20 AM