immune
Pass
Audited by Gen Agent Trust Hub on Mar 6, 2026
Risk Level: SAFE | PROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The skill is vulnerable to Indirect Prompt Injection (Category 8) due to its automated memory update mechanism.
  - Ingestion points: Untrusted content is ingested via the `input` parameter in `SKILL.md` (Step 2) and interpolated into an XML-structured prompt for the `immune-scan` agent.
  - Boundary markers: The skill uses XML tags like `<content>` and `<scan_request>` to separate input data from instructions, which is a good practice but insufficient against sophisticated adversarial inputs.
  - Capability inventory: The skill possesses the capability to modify local JSON files (`immune_memory.json` and `cheatsheet_memory.json`) and spawn a sub-agent (`immune-scan`) using the Haiku model. The orchestrator automatically adds detected threats and strategies to these files if `auto_add_threats` or `auto_add_strategies` is enabled in `config.yaml`.
  - Sanitization: There is no evidence of sanitization, escaping, or validation of the input content before it is passed to the sub-agent or processed for memory updates.
- [PROMPT_INJECTION]: A 'poisoning' attack is possible where a user provides content that tricks the `immune-scan` sub-agent into identifying a malicious instruction as a 'new strategy' or 'antibody'. Because the system is designed to 'learn winning strategies over time', these malicious patterns could be persisted in `cheatsheet_memory.json` and automatically injected into future prompt generations as positive patterns, leading to persistent agent compromise.
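
The two findings above point at two missing controls: escaping untrusted input before it is interpolated into the XML-structured prompt, and validating candidate entries before they are persisted to memory. The following is a minimal sketch of both, not the skill's actual implementation. The function names, the nesting of `<content>` inside `<scan_request>`, the `strategies` key, the denylist pattern, and the `approved_by_human` flag are all assumptions made for illustration; the real prompt template and the schema of `cheatsheet_memory.json` are not shown in this audit.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical denylist for illustration only; pattern matching like this is
# easy to bypass and is no substitute for human review.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions"
    r"|system prompt"
    r"|</?(content|scan_request)>",
    re.IGNORECASE,
)


def build_scan_request(untrusted_input: str) -> str:
    """Escape &, <, and > so the input cannot close the <content> tag early.

    The tag nesting used here is an assumed layout, not the skill's template.
    """
    safe = escape(untrusted_input)
    return f"<scan_request><content>{safe}</content></scan_request>"


def is_safe_to_persist(candidate: str, max_len: int = 200) -> bool:
    """Reject candidates that are overly long or match the denylist."""
    return len(candidate) <= max_len and SUSPICIOUS.search(candidate) is None


def add_strategy(memory: dict, candidate: str, approved_by_human: bool) -> bool:
    """Append a strategy only if it passes validation and was approved.

    `memory` stands in for the parsed contents of cheatsheet_memory.json;
    the "strategies" key is an assumed schema.
    """
    if not (approved_by_human and is_safe_to_persist(candidate)):
        return False
    memory.setdefault("strategies", []).append(candidate)
    return True
```

In this sketch the human-approval flag is the primary gate against persistence of poisoned entries, which corresponds to leaving `auto_add_threats` and `auto_add_strategies` disabled; the escaping step only prevents input from breaking out of its boundary tags and does not stop instructions embedded in otherwise well-formed content.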
Audit Metadata