self-reflection

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • PROMPT_INJECTION (HIGH): The skill is vulnerable to persistent indirect prompt injection that can permanently alter agent behavior. * Ingestion points: The agent reviews conversation history and incorporates user corrections during the 'Human-in-the-loop' phase (File: SKILL.md). * Boundary markers: Absent; the reflection template uses standard markdown headers and bullets without specific delimiters to isolate lessons from system instructions. * Capability inventory: Directs the agent to create or update persistent files such as 'memories/lessons-learned.md' (File: SKILL.md). * Sanitization: No instructions exist to sanitize, escape, or validate the drafted lessons against safety protocols before they are committed to storage.
  • COMMAND_EXECUTION (LOW): The workflow encourages the agent to 'run commands' and use shell utilities like 'test -f' for verification (File: SKILL.md), which reinforces the use of powerful system tools based on potentially untrusted context generated during task failures.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 05:33 AM