ai-moderating-content
Auto-Moderate What Users Post
Guide the user through building AI content moderation: classify user-generated content, score its severity, and route each item to a decision (auto-approve, human-review, or auto-reject). The pattern is classify, score, route.
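The classify, score, route pattern can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Verdict` type, the threshold values, and the keyword-based `classify_and_score` stub are all hypothetical stand-ins for whatever model call and policy the user ends up with.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them against your own labeled data.
APPROVE_BELOW = 0.2
REJECT_AT_OR_ABOVE = 0.8


@dataclass
class Verdict:
    category: str    # e.g. "spam", "harassment", or "none"
    severity: float  # 0.0 (benign) to 1.0 (severe)


def classify_and_score(text: str) -> Verdict:
    """Placeholder for the model call; swap in a real classifier."""
    # Toy keyword rule so the sketch runs without a model.
    if "buy now" in text.lower():
        return Verdict("spam", 0.9)
    return Verdict("none", 0.0)


def route(verdict: Verdict) -> str:
    """Map a severity score to one of the three decisions."""
    if verdict.severity >= REJECT_AT_OR_ABOVE:
        return "auto-reject"
    if verdict.severity < APPROVE_BELOW:
        return "auto-approve"
    return "human-review"  # the uncertain middle band goes to a person


decision = route(classify_and_score("BUY NOW, limited offer"))
```

Keeping `route` separate from the classifier means the thresholds can be adjusted without touching the model call.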
When NOT to use AI moderation
- Low-volume content — if a human can review everything in under an hour per day, skip AI. The complexity of maintaining a moderation pipeline is not worth it.
- Exact-match violations only — if your policy is just a blocklist of words or regex patterns (SSNs, emails, phone numbers), use pattern matching directly. No LM needed.
- Legal-grade decisions — AI moderation is a first pass, not a legal ruling. If a wrong moderation decision has legal consequences (DMCA takedowns, defamation claims), always route to human review.
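For the exact-match case above, plain pattern matching might look like the following sketch. The regular expressions are simplified illustrations (US-style SSN and phone formats, a loose email shape) and the `find_violations` helper is a hypothetical name; real policies will likely need broader patterns.

```python
import re

# Illustrative patterns only; real-world formats vary more than these cover.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_violations(text: str) -> list[str]:
    """Return the names of all patterns that match the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]
```

A non-empty result from `find_violations` is enough to block or redact the post, with no model call involved.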
Consider /ai-sorting instead if you just need classification without severity scoring or routing logic.
Step 1: Define your moderation policy
Ask the user:
- What content do you need to catch? (hate speech, spam, NSFW, harassment, self-harm, illegal activity, PII)
- What are the severity levels? (warning, remove, ban)
- What is the tolerance for false positives? (over-moderating frustrates users)
- Is human review in the loop? (auto-only vs. auto + human escalation)
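One way to record the answers to these questions is a small policy object that later steps can read from. The sketch below is an assumption about shape, not a required schema: the `ModerationPolicy` class, its field names, and the default false-positive tolerance of 0.05 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModerationPolicy:
    # Content types to catch, e.g. ["hate speech", "spam", "PII"].
    categories: list[str]
    # Severity levels, ordered from least to most severe.
    severity_levels: list[str] = field(
        default_factory=lambda: ["warning", "remove", "ban"]
    )
    # Hypothetical default: tolerate at most 5% false positives.
    max_false_positive_rate: float = 0.05
    # True for auto + human escalation, False for auto-only.
    human_review: bool = True

    def validate(self) -> None:
        """Raise ValueError if the policy is incomplete or inconsistent."""
        if not self.categories:
            raise ValueError("policy needs at least one content category")
        if not self.severity_levels:
            raise ValueError("policy needs at least one severity level")
        if not 0.0 <= self.max_false_positive_rate <= 1.0:
            raise ValueError("false-positive rate must be between 0 and 1")


policy = ModerationPolicy(categories=["hate speech", "spam", "PII"])
policy.validate()
```

Writing the policy down as data before building anything makes it easier to spot unanswered questions early.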