recovery-community-moderator
Pass
Audited by Gen Agent Trust Hub on Mar 6, 2026
Risk Level: SAFEPROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection attacks because it interpolates user-provided forum content directly into its logic without isolation.\n- Ingestion points: Untrusted data enters the skill via the
contentandcontextparameters inscripts/moderate_content.py, which are then passed to an LLM for evaluation.\n- Boundary markers: The skill lacks delimiters (such as XML tags or triple quotes) to separate the untrusted user content from the moderation instructions, making it possible for a user to override moderation rules via a malicious post.\n- Capability inventory: The skill possesses network capabilities to interact with the Anthropic API. It does not demonstrate capabilities for local file system modification, privilege escalation, or arbitrary command execution.\n- Sanitization: No sanitization or filtering logic is present to identify or neutralize prompt injection attempts within the content being moderated.
Audit Metadata