agent-observability

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION] (HIGH): The skill implements a persistent data poisoning surface by logging untrusted user 'corrections' directly into a repository file (docs/observed-coding-agent-issues.md) intended to establish behavioral guardrails. This file modification creates a feedback loop where an adversary can inject malicious logic that persists in the environment. \n- Ingestion points: User input triggered by correction phrases (e.g., 'don't do that', 'always do Y', 'never do Z') in SKILL.md. \n- Boundary markers: Absent. User-supplied content is summarized and appended to the markdown log without delimiters or explicit 'ignore' instructions for the agent when reading the log. \n- Capability inventory: Persistent file write access to docs/observed-coding-agent-issues.md and broad file system read access to various skill definition files. \n- Sanitization: Absent. There is no evidence of validation, escaping, or filtering of the user-provided correction content before it is written to the log as a 'guardrail'.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 05:58 AM