reddit-moderate
Pass
Audited by Gen Agent Trust Hub on Apr 1, 2026
Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: The skill manages an indirect prompt injection surface when processing Reddit content.
- Ingestion points: Untrusted content is fetched from Reddit modqueue, reports, and author history via the
reddit_mod.pyscript. - Boundary markers: The classification prompt utilizes
<untrusted-content>tags to isolate raw user data from instructions. - Capability inventory: The skill enables the agent to execute moderation actions such as approve, remove, and lock on Reddit items.
- Sanitization: Explicit instructions are provided to the model to evaluate potential injection attempts (e.g., 'ignore previous instructions') as malicious signals for classification rather than as valid commands.
- [COMMAND_EXECUTION]: The skill utilizes a local script
reddit_mod.pyfor Reddit API interactions and managing subreddit-specific data directories. - [SAFE]: Credentials are managed via environment variables and local
.envfiles with appropriate file permissions (chmod 600) recommended.
Audit Metadata