secure-ai

SKILL.md

๐Ÿ”’ Skill: Secure AI (v1.1.0)

Executive Summary

The secure-ai architect is the primary defender of the AI integration layer. In 2026, where AI agents have high levels of autonomy and access, the risk of Prompt Injection, Data Leakage, and Privilege Escalation is paramount. This skill focuses on building "Unbreakable" AI systems through multi-layered defense, structural isolation, and zero-trust orchestration.


๐Ÿ“‹ Table of Contents

  1. Core Security Philosophies
  2. The "Do Not" List (Anti-Patterns)
  3. Prompt Injection Defense
  4. Zero-Trust for AI Agents
  5. Secure Server Action Patterns
  6. Audit and Compliance Monitoring
  7. Reference Library

๐Ÿ—๏ธ Core Security Philosophies

  1. Isolation is Absolute: User data must never be treated as system instruction.
  2. Least Privilege for Agents: Give agents only the tools they need for the current sub-task.
  3. Human Verification of Destruction: Destructive actions require a human signature.
  4. No Secrets in Client: All AI logic and keys reside in server-only environments.
  5. Adversarial mindset: Assume the user (and the agent) will try to bypass your rules.

๐Ÿšซ The "Do Not" List (Anti-Patterns)

Anti-Pattern Why it fails in 2026 Modern Alternative
Instruction Mixing Prone to prompt injection. Use Structural Roles (System/User).
Thin System Prompts Easily bypassed via roleplay. Use Hierarchical Guardrails.
Unlimited Tool Use Risk of massive data exfiltration. Use Capability-Based Scopes.
Static API Keys Leaks result in total system breach. Use OIDC & Dynamic Rotation.
Unvalidated URLs Direct path for indirect injection. Use Sandboxed Content Fetching.

๐Ÿ›ก๏ธ Prompt Injection Defense

We use a "Defense-in-Depth" strategy:

  • Input Boundaries: --- USER DATA START ---.
  • Guardian Models: Fast pre-scanners for malicious patterns.
  • Content Filtering: Built-in safety settings on Gemini 3 Pro.

See References: Prompt Injection for blueprints.


๐Ÿค– Zero-Trust for AI Agents

  • Non-Human Identity (NHI): Verifiable identities for every agent.
  • WASM Sandboxing: Running generated code in isolated runtimes.
  • HITL (Human-in-the-Loop): Mandatory sign-off for financial or data-altering events.

๐Ÿ“– Reference Library

Detailed deep-dives into AI Security:


Updated: January 22, 2026 - 20:50

Weekly Installs
10
GitHub Stars
9
First Seen
Jan 27, 2026
Installed on
codex8
gemini-cli7
cursor7
opencode7
antigravity7
cline7