NYC

recovery-community-moderator

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
  • Prompt Injection (LOW): The skill is vulnerable to indirect prompt injection. The moderation script interpolates raw user content directly into the system prompt without boundary markers, allowing a malicious user to potentially override moderation logic.
  • Ingestion points: scripts/moderate_content.py (line 74) via the content parameter.
  • Boundary markers: Absent. No delimiters are used to isolate the content variable from the instructions.
  • Capability inventory: The skill is granted 'Read', 'Write', and 'Edit' tools in SKILL.md, and the Python script has network access via the Anthropic API.
  • Sanitization: Absent. No filtering or escaping is performed on the user content before interpolation.
  • External Downloads (LOW): The script depends on the external anthropic library. While this is a standard library for model access, it is an external dependency that must be installed in the environment.
  • Evidence: scripts/moderate_content.py (lines 16-19) contains a check for the package and instructions to install it via pip.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 06:05 PM