ai-red-teaming
SKILL.md
AI Red Teaming
Continuously test AI applications like an adversary to discover exploitable failure modes before attackers do.
Program Design
- Define threat scenarios: jailbreaks, policy evasion, prompt injection, model abuse.
- Build reusable attack suites by domain (support bot, coding agent, RAG assistant).
- Include multilingual and obfuscated attack prompts.
- Track results in a risk register with severity and exploitability.
Test Categories
- Jailbreak robustness: bypassing safety instructions.
- Data exfiltration: extracting secrets, system prompts, tenant data.
- Tool abuse: unauthorized API calls or command execution.
- Social engineering: inducing unsafe business actions.
- Availability abuse: token amplification and DoS-style prompts.
Exercise Cadence
- Pre-release blocking red-team gate.
- Monthly deep-dive campaigns.
- Post-incident targeted retests.
Scoring Model
- Likelihood (1-5)
- Impact (1-5)
- Detectability (1-5)
- Control maturity (low/medium/high)
Use scores to prioritize fixes and define SLA for remediation.
Reporting Essentials
- Reproducible prompt traces
- Model/version and config used
- Successful attack chain narrative
- Recommended mitigations + verification steps
Related Skills
- agent-evals - Convert findings into regression tests
- prompt-injection-defense - Implement injection countermeasures
- penetration-testing - Broader offensive security process
Weekly Installs
3
Repository
bagelhole/devop…t-skillsGitHub Stars
13
First Seen
7 days ago
Security Audits
Installed on
opencode3
antigravity3
claude-code3
github-copilot3
codex3
zencoder3