immune

Pass

Audited by Gen Agent Trust Hub on Mar 6, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is vulnerable to Indirect Prompt Injection (Category 8) due to its automated memory update mechanism.
  • Ingestion points: Untrusted content is ingested via the input parameter in SKILL.md (Step 2) and interpolated into an XML-structured prompt for the immune-scan agent.
  • Boundary markers: The skill uses XML tags like <content> and <scan_request> to separate input data from instructions, which is a good practice but insufficient against sophisticated adversarial inputs.
  • Capability inventory: The skill possesses the capability to modify local JSON files (immune_memory.json and cheatsheet_memory.json) and spawn a sub-agent (immune-scan) using the Haiku model. The orchestrator automatically adds detected threats and strategies to these files if auto_add_threats or auto_add_strategies is enabled in config.yaml.
  • Sanitization: There is no evidence of sanitization, escaping, or validation of the input content before it is passed to the sub-agent or processed for memory updates.
  • [PROMPT_INJECTION]: A 'poisoning' attack is possible where a user provides content that tricks the immune-scan sub-agent into identifying a malicious instruction as a 'new strategy' or 'antibody'. Because the system is designed to 'learn winning strategies over time', these malicious patterns could be persisted in the cheatsheet_memory.json and automatically injected into future prompt generations as positive patterns, leading to persistent agent compromise.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 6, 2026, 04:38 PM