auditing-experiments-flags
Pass
Audited by Gen Agent Trust Hub on Mar 23, 2026
Risk Level: SAFE
PROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The skill ingests untrusted metadata from PostHog entities, which could contain malicious instructions designed to hijack the agent's logic during the audit process.
- Ingestion points: Untrusted data enters the context through read_data and list_data calls targeting experiment and feature flag objects in the PostHog environment.
- Boundary markers: The instructions lack delimiters or specific system-level warnings to the agent to disregard instructions found within the description or metrics fields of the data being processed.
- Capability inventory: The skill utilizes create_notebook to generate report artifacts. While it does not have direct access to shell execution or network tools, the ability to generate structured output from untrusted input is a known vector for indirect injection.
- Sanitization: No sanitization or validation of the fetched data fields is performed before the content is interpreted by the agent for check evaluation.
Audit Metadata