skill-improvement-from-observability

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHPROMPT_INJECTIONREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONDATA_EXFILTRATION
Full Analysis
  • [Indirect Prompt Injection] (HIGH): The skill creates a direct path for untrusted data to influence the agent's logic and code.
  • Ingestion points: Telemetry data from Loki, specifically event_type="user_prompt" and event_type="tool_result" in SKILL.md.
  • Boundary markers: Absent. There are no mechanisms described to distinguish between legitimate log data and malicious instructions embedded within those logs.
  • Capability inventory: The skill leverages skill-updater and observability-pattern-detector to perform file writes and modify the logic of other skills.
  • Sanitization: Absent. The workflow transitions directly from log queries to code 'recommendations' and 'implementation' steps.
  • [Remote Code Execution / Dynamic Execution] (HIGH): The skill's primary function is to update code at runtime based on patterns it detects.
  • Evidence: Integration with skill-updater and the 'Auto-Apply Safe' classification for adding error handling or validation (SKILL.md).
  • Risk: Automated code modification ('self-improvement') without mandatory human review is a high-risk pattern that can be exploited to achieve persistence or execute arbitrary commands by poisoning the telemetry source.
  • [Data Exposure] (MEDIUM): The skill performs broad analytics on centralized logs that likely contain sensitive information.
  • Evidence: LogQL queries in SKILL.md target user_prompt and tool_result across a 7-day window.
  • Risk: While not explicitly exfiltrating data to an external domain, the automated processing and 'pattern extraction' from raw user input increases the risk of sensitive data being exposed to other tools or logged in less secure formats.
  • [Metadata Poisoning] (MEDIUM): The skill includes a 'Safety Classification' section that claims certain code-modifying actions are 'Auto-Apply Safe'.
  • Risk: These claims are self-referential and may lead users to disable necessary human-in-the-loop safeguards for operations that can actually be manipulated to inject malicious logic.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 04:30 AM