auditing-experiments-flags

Pass

Audited by Gen Agent Trust Hub on Mar 23, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill ingests untrusted metadata from PostHog entities, which could contain malicious instructions designed to hijack the agent's logic during the audit process.
  • Ingestion points: Untrusted data enters the context through read_data and list_data calls targeting experiment and feature flag objects in the PostHog environment.
  • Boundary markers: The instructions lack delimiters or specific system-level warnings to the agent to disregard instructions found within the description or metrics fields of the data being processed.
  • Capability inventory: The skill utilizes create_notebook to generate report artifacts. While it does not have direct access to shell execution or network tools, the ability to generate structured output from untrusted input is a known vector for indirect injection.
  • Sanitization: No sanitization or validation of the fetched data fields is performed before the content is interpreted by the agent for check evaluation.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 23, 2026, 07:56 PM