claudeception

Warn

Audited by Gen Agent Trust Hub on Feb 26, 2026

Risk Level: MEDIUMPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [PROMPT_INJECTION]: The activator script (claudeception-activator.sh) is designed to be injected into the agent's input stream after every prompt. It uses authoritative markers like 'CRITICAL', 'MANDATORY', and 'NON-NEGOTIABLE' to override the agent's standard behavior and force it to execute the skill-extraction workflow regardless of context.
  • [COMMAND_EXECUTION]: The skill's primary function is to programmatically generate and save new skill files (SKILL.md) to the user's home directory (~/.claude/skills/). Because these files are automatically loaded as system instructions in subsequent sessions, this represents a form of dynamic execution where the agent modifies its own future operating logic at runtime.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it analyzes session history and external research for 'valuable knowledge' without sanitization.
  • Ingestion points: Conversation logs, trial-and-error output, and web search results are processed by the skill in SKILL.md (Retrospective Mode).
  • Boundary markers: No explicit delimiters or instruction-blocking warnings are used when processing the text for extraction.
  • Capability inventory: The skill possesses the Write and Edit tools, allowing it to commit extracted (and potentially malicious) instructions to the persistent skill library.
  • Sanitization: There is no evidence of filtering or validation logic to ensure that instructions extracted from logs or external research are safe to be executed in future sessions.
  • [COMMAND_EXECUTION]: The installation guide requires the user to configure a UserPromptSubmit hook that executes a local shell script (claudeception-activator.sh) for every interaction. This creates a persistent execution hook into the agent's core workflow.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 26, 2026, 04:04 AM