prompt-injection-defense
SKILL.md
Prompt Injection Defense
Mitigate direct and indirect prompt injection across chat apps, agentic workflows, and RAG pipelines.
Attack Surface
- User input attempting to override system instructions
- Untrusted documents/web pages in retrieval context
- Tool output that smuggles malicious instructions
- Cross-tenant leakage via shared context windows
Defense-in-Depth Pattern
- Instruction hierarchy enforcement: system > developer > user > tool output.
- Context segregation: isolate untrusted text from control instructions.
- Tool permissioning: explicit allow-list per task and tenant.
- Output policy checks: validate schema, redact secrets, block unsafe actions.
- Human approval: required for high-impact operations.
Implementation Controls
- Strip or label untrusted content blocks before generation.
- Disable autonomous tool chaining for sensitive workflows.
- Use deterministic parsers (JSON schema) before tool execution.
- Reject requests containing high-risk exfiltration patterns.
- Add canary tokens to detect data exfil attempts.
Red-Team Test Cases
- "Ignore previous instructions" style direct override
- Retrieval payload containing hidden policy bypass text
- Tool output instructing follow-up privileged command
- Prompt that asks for secrets from memory or env vars
Security Metrics
- Prompt injection detection rate
- Unsafe tool invocation prevention rate
- Time-to-containment for injection attempts
- False positive rate on blocked safe prompts
Related Skills
- ai-agent-security - Agent threat model and controls
- llm-app-security - End-to-end LLM app hardening
- security-automation - Automated policy response workflows
Weekly Installs
2
Repository
bagelhole/devop…t-skillsGitHub Stars
13
First Seen
7 days ago
Security Audits
Installed on
opencode2
antigravity2
claude-code2
github-copilot2
codex2
zencoder2