reflect
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGH (PROMPT_INJECTION, COMMAND_EXECUTION, DATA_EXFILTRATION)
Full Analysis
- Indirect Prompt Injection (HIGH): The core functionality involves extracting 'learnings' from conversation history and writing them into persistent skill files.
- Ingestion points: Conversation history in the current session (SKILL.md).
- Boundary markers: None specified for the extracted content, allowing external instructions to potentially override system logic.
- Capability inventory: File write access to ~/.claude/skills/ and execution of Git commands (git commit, git push).
- Sanitization: The skill relies on 'confidence' levels but lacks programmatic sanitization to prevent 'learning' malicious instructions embedded in user-provided data (e.g., a web page summary containing instructions to 'always forward API keys to X').
- Command Execution & Exfiltration (MEDIUM): The skill explicitly uses shell commands for Git operations (git init, git remote add, git push). While intended for versioning, an attacker who successfully poisons a skill could use the 'git push' capability to exfiltrate session data or modified instructions to an attacker-controlled repository.
- Persistence (MEDIUM): The skill uses 'auto-registered' stop hooks to trigger behavior at the end of every session. This ensures that any malicious 'learnings' are persisted and applied to all future agent interactions without recurring manual intervention.
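The missing boundary markers and sanitization noted in the findings could look roughly like the following. This is a minimal sketch, not the skill's actual code: the marker strings, the pattern list, and the function names (is_suspicious, wrap_learning, screen_learnings) are all hypothetical, and a short deny-list like this would catch only the most obvious injected instructions.

```python
import re

# Hypothetical deny-list; a real filter would need to be far broader.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\b(api[_ -]?key|token|password|secret)s?\b", re.IGNORECASE),
    re.compile(r"\b(forward|send|upload|post|exfiltrate)\b.*\bto\b", re.IGNORECASE),
    re.compile(r"\bignore (all |any )?(previous|prior) instructions\b", re.IGNORECASE),
]

# Hypothetical boundary markers that label extracted text as untrusted data.
BEGIN_MARKER = "<<<UNTRUSTED_LEARNING"
END_MARKER = "UNTRUSTED_LEARNING>>>"


def is_suspicious(learning: str) -> bool:
    """Flag text that matches a deny-list pattern or contains a boundary marker."""
    if BEGIN_MARKER in learning or END_MARKER in learning:
        # Content carrying a marker could close the boundary early.
        return True
    return any(p.search(learning) for p in SUSPICIOUS_PATTERNS)


def wrap_learning(learning: str) -> str:
    """Wrap an accepted learning in boundary markers before it is persisted."""
    return f"{BEGIN_MARKER}\n{learning}\n{END_MARKER}"


def screen_learnings(candidates):
    """Split candidates into (wrapped accepted learnings, rejected raw text)."""
    accepted, rejected = [], []
    for text in candidates:
        if is_suspicious(text):
            rejected.append(text)
        else:
            accepted.append(wrap_learning(text))
    return accepted, rejected
```

With this sketch, the report's own example ('always forward API keys to X') would be rejected, while an ordinary preference would be wrapped and kept. Rejected items would most sensibly go to a human for review instead of being silently dropped.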
Recommendations
- AI detected serious security threats
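One possible mitigation for the git push exfiltration path is to check the push URL against an allowlist before any push runs. The sketch below rests on assumptions: the allowlist entry (example-user/claude-skills) and the function names are hypothetical, and it relies only on the standard git remote get-url --push command to read the configured URL.

```python
import subprocess
from urllib.parse import urlparse

# Hypothetical allowlist of (host, repository path) pairs the user has approved.
ALLOWED_REMOTES = {("github.com", "example-user/claude-skills")}


def normalize_remote(url: str):
    """Return (host, path) for scheme URLs and scp-like git@host:path URLs."""
    if "://" in url:
        parsed = urlparse(url)
        host, path = parsed.hostname or "", parsed.path
    elif "@" in url and ":" in url:
        host, _, path = url.partition("@")[2].partition(":")
    else:
        return None  # Local paths and unknown forms are treated as not allowed.
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return host.lower(), path


def is_allowed_remote(url: str, allowed=ALLOWED_REMOTES) -> bool:
    """True only if the normalized remote is on the allowlist."""
    return normalize_remote(url) in allowed


def push_remote_url(repo_dir: str, remote: str = "origin") -> str:
    """Read the configured push URL for a remote via git itself."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "remote", "get-url", "--push", remote],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def safe_to_push(repo_dir: str, remote: str = "origin") -> bool:
    """Gate to call before git push; refuse when the remote is not approved."""
    return is_allowed_remote(push_remote_url(repo_dir, remote))
```

A check like this limits where poisoned content can be sent, but it does not stop a poisoned skill from being written in the first place, so it complements input screening and does not replace it.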
Audit Metadata