The Agent Skills Directory

[PROMPT_INJECTION]: The skill manages an indirect prompt injection surface when processing Reddit content.
Ingestion points: Untrusted content is fetched from Reddit modqueue, reports, and author history via the reddit_mod.py script.
Boundary markers: The classification prompt utilizes <untrusted-content> tags to isolate raw user data from instructions.
Capability inventory: The skill enables the agent to execute moderation actions such as approve, remove, and lock on Reddit items.
Sanitization: Explicit instructions are provided to the model to evaluate potential injection attempts (e.g., 'ignore previous instructions') as malicious signals for classification rather than as valid commands.
[COMMAND_EXECUTION]: The skill utilizes a local script reddit_mod.py for Reddit API interactions and managing subreddit-specific data directories.
[SAFE]: Credentials are managed via environment variables and local .env files with appropriate file permissions (chmod 600) recommended.

reddit-moderate