recovery-community-moderator

Pass

Audited by Gen Agent Trust Hub on Mar 5, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection through the moderation of forum content.
  • Ingestion points: Untrusted content is ingested via the moderate_content function in scripts/moderate_content.py, where user-generated text is directly interpolated into the prompt string.
  • Boundary markers: There are no explicit boundary markers (e.g., XML tags, triple-quotes, or specific delimiters) or instructions to 'ignore embedded commands' surrounding the {content} variable in the prompt.
  • Capability inventory: The skill metadata in SKILL.md allows the tools Read, Write, and Edit. An attacker could embed instructions in a forum post that, when processed, trick the moderator into outputting a suggested_action that results in unauthorized file operations or logic bypass.
  • Sanitization: No sanitization, escaping, or validation is performed on the input content before it is sent to the LLM.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 5, 2026, 02:34 PM