agent-observability
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHPROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION] (HIGH): The skill implements a persistent data poisoning surface by logging untrusted user 'corrections' directly into a repository file (
docs/observed-coding-agent-issues.md) intended to establish behavioral guardrails. This file modification creates a feedback loop where an adversary can inject malicious logic that persists in the environment. \n- Ingestion points: User input triggered by correction phrases (e.g., 'don't do that', 'always do Y', 'never do Z') inSKILL.md. \n- Boundary markers: Absent. User-supplied content is summarized and appended to the markdown log without delimiters or explicit 'ignore' instructions for the agent when reading the log. \n- Capability inventory: Persistent file write access todocs/observed-coding-agent-issues.mdand broad file system read access to various skill definition files. \n- Sanitization: Absent. There is no evidence of validation, escaping, or filtering of the user-provided correction content before it is written to the log as a 'guardrail'.
Recommendations
- AI detected serious security threats
Audit Metadata