claudeception
Warn
Audited by Gen Agent Trust Hub on Feb 21, 2026
Risk Level: MEDIUMPROMPT_INJECTIONCOMMAND_EXECUTIONEXTERNAL_DOWNLOADS
Full Analysis
- PROMPT_INJECTION (MEDIUM): The activation script
scripts/claudeception-activator.shuses coercive and overriding language designed to bypass the agent's standard operational logic. Phrases like 'MANDATORY SKILL EVALUATION REQUIRED', 'NON-NEGOTIABLE', and 'CRITICAL: ... you MUST evaluate' are used to force the agent to perform the skill extraction process, which is a classic injection pattern used to ensure compliance regardless of the agent's internal safety or task relevance filters. - COMMAND_EXECUTION (MEDIUM): The installation instructions require the user to manually grant execution permissions (
chmod +x) to a shell script and then register that script as a global hook in the agent's configuration (~/.claude/settings.json). This pattern introduces arbitrary command execution into the core loop of every agent interaction. - INDIRECT PROMPT INJECTION (LOW): (Category 8) This skill creates a permanent vulnerability surface by design.
- Ingestion points: Untrusted data from session interactions (e.g., error logs, web content, or attacker-influenced debugging output).
- Boundary markers: Absent; there are no clear delimiters or escaping mechanisms to prevent data from being interpreted as instructions in the generated skill files.
- Capability inventory: The agent uses the
Writetool to create new markdown files in the skills directory, which are then parsed as instructions in subsequent sessions. - Sanitization: Absent; the skill does not appear to sanitize the 'discovered knowledge' before writing it to a persistent
.mdfile, allowing an attacker to inject permanent malicious instructions into the agent's long-term memory/skill library. - EXTERNAL_DOWNLOADS (LOW): The installation process involves cloning code from an untrusted third-party repository (
github.com/blader/Claudeception). Per [TRUST-SCOPE-RULE], this is a low-severity finding as it targets a non-standard source.
Audit Metadata