reflect

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: CRITICALPROMPT_INJECTIONCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
  • [PROMPT_INJECTION / Indirect Prompt Injection] (CRITICAL): The skill is purpose-built to extract instructions from session transcripts and persist them as permanent rules in SKILL.md files. This is a critical vulnerability because an agent interacting with untrusted data (e.g., summarizing a poisoned website) could encounter 'fake' user corrections or approvals injected by an attacker.
  • Ingestion points: Entire session transcripts (README.md, '7-Phase Pipeline'), which naturally contain external data processed by the agent.
  • Boundary markers: None detected. The skill appears to lack mechanisms to distinguish between a genuine human user's correction and text that was part of a processed document or simulated dialogue within the transcript.
  • Capability inventory: Modifies local SKILL.md files and executes Git operations (README.md).
  • Sanitization: No evidence of sanitization or origin verification for the 'learnings' extracted from the text.
  • [REMOTE_CODE_EXECUTION / Logic Modification] (CRITICAL): While not executing traditional binary code, modifying SKILL.md files is functionally equivalent to code modification for AI agents. This allows an attacker to inject permanent 'Always' or 'Never' rules that can hijack future sessions, exfiltrate data, or bypass safety filters (Persistence Category 6).
  • [COMMAND_EXECUTION] (HIGH): The README explicitly mentions a 'Git Commit' phase in its pipeline. Automated git operations on skill files can be abused to maintain persistence of malicious instructions or potentially influence remote repositories if 'git push' is eventually integrated.
  • [Privilege Escalation] (HIGH): The skill allows the agent to acquire new 'rules' that may override original safety boundaries or operational constraints defined by developers. The 'Safety Rules' in the README (e.g., protecting eval-harness) are self-proclaimed and only protect a narrow scope, leaving all other skills vulnerable to modification.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
CRITICAL
Analyzed
Feb 16, 2026, 09:33 AM