recovery-community-moderator
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- Prompt Injection (LOW): The skill is vulnerable to indirect prompt injection. The moderation script interpolates raw user content directly into the system prompt without boundary markers, allowing a malicious user to potentially override moderation logic.
- Ingestion points: scripts/moderate_content.py (line 74) via the
contentparameter. - Boundary markers: Absent. No delimiters are used to isolate the
contentvariable from the instructions. - Capability inventory: The skill is granted 'Read', 'Write', and 'Edit' tools in SKILL.md, and the Python script has network access via the Anthropic API.
- Sanitization: Absent. No filtering or escaping is performed on the user content before interpolation.
- External Downloads (LOW): The script depends on the external
anthropiclibrary. While this is a standard library for model access, it is an external dependency that must be installed in the environment. - Evidence: scripts/moderate_content.py (lines 16-19) contains a check for the package and instructions to install it via pip.
Audit Metadata