secure-ai-agent-coding

Installation
SKILL.md

secure-ai-agent-coding

Build or review AI agents and LLM applications so their blast radius stays small, their behavior is auditable, and high-impact actions require explicit control.

Decision Tree

What is the user asking for?

  • Build a new AI agent or LLM feature: Read references/implementation-patterns.md, then apply references/controls.md before writing integration code.
  • Review an existing codebase or design: Run scripts/scan_patterns.py /path/to/project --json if source code is available, then read references/review-workflow.md.
  • Add tools, system calls, code execution, database writes, API calls, or email/message sending: Treat it as a high-risk action surface. Read references/implementation-patterns.md and require allowlisted tools, per-action authorization, rate limits, rollback, and approval gates.
  • Handle user data, production data, documents, web pages, email, vector stores, embeddings, or fine-tuning data: Read references/governance.md and references/threat-model.md before designing prompts or retrieval flows.
  • Debug a safety incident, unexpected model behavior, prompt injection, data leak, or harmful automation: Read references/review-workflow.md and references/gotchas.md, preserve logs, stop autonomous actions, and recover from a known safe state.
  • The request is only generic web app security with no AI, model, RAG, tool-call, or agentic workflow: Do not use this skill unless the AI-specific attack surface is part of the task.

Quick Reference

Task Action
Find likely dangerous patterns Run python3 scripts/scan_patterns.py /path/to/project
Classify risk Tier each agent action by impact, reversibility, data sensitivity, and external side effects
Protect prompts Separate instructions from untrusted data; validate, delimit, and minimize context
Protect tool calls Use explicit allowlists, scoped credentials, per-action authorization, and rate limits
Protect users and data Classify data, minimize disclosure, redact logs, verify consent, and avoid raw production data in test systems
Protect downstream systems Validate structured model output before using it in code, SQL, shell, APIs, or UI rendering
Protect production Add monitoring, anomaly alerts, safety regression tests, rollback plans, and incident response steps
Review exceptions Document why a control does not apply, who accepted the risk, and when it expires

Core Workflow

  1. Map the AI surface: model calls, prompts, retrieved content, tools, state, credentials, data stores, and external side effects.
  2. Classify every input as untrusted unless it is generated by a trusted server-side component and still validate it.
  3. Classify every action the agent can take. Default to reversible, low-privilege actions and escalate high-impact actions to human approval.
  4. Design controls before implementation: schemas, allowlists, server-side authorization, idempotency, rate limits, locks, rollback, and audit logs.
  5. Keep sensitive data out of prompts and logs unless the model truly needs it. Prefer redaction, tokenization, anonymization, or synthetic data.
  6. Validate AI output before it reaches downstream interpreters, renderers, databases, APIs, file systems, or shell commands.
  7. Test safety controls in CI and again after model, prompt, framework, dependency, retrieval, or tool changes.

Control Router

Area Load
End-to-end review process, severity, report format references/review-workflow.md
Control catalog and evidence checklist references/controls.md
Copyable implementation patterns references/implementation-patterns.md
AI-specific threat scenarios and risk tiers references/threat-model.md
Inventory, consent, model updates, lifecycle, and operations references/governance.md
Common failure modes and reviewer traps references/gotchas.md
Source conversion notes and scope exclusions references/source-policy.md

Operational Scripts

Use scripts/scan_patterns.py as a first-pass heuristic scanner, not as proof of safety.

python3 scripts/scan_patterns.py /path/to/project
python3 scripts/scan_patterns.py /path/to/project --json --fail-on high

The scanner intentionally favors review prompts over automated verdicts. Treat findings as places to inspect.

Source Scope

This skill adapts the engineering-applicable controls from Galdren's "Secure AI & Agent Coding Policy" into an agent skill. It keeps the parts that can drive code review, architecture decisions, local checks, and production hardening. It leaves legal judgment, organization-specific approvals, and physical security as prompts to involve the right team rather than pretending a coding skill can resolve them.

Gotchas

  1. Prompt-only defenses are not enough. Enforce authorization, validation, and tool limits outside the model.
  2. A model used as a validator is still an AI input surface. Validate its inputs and outputs like any other component.
  3. "Authenticated user" is not the same as "authorized agent action." Check each action separately.
  4. Logs can become a data leak. Record enough to investigate, but redact secrets, personal data, and sensitive context.
  5. Model or framework upgrades can change behavior without a code diff. Retest safety and output contracts after every change.
  6. Broad tool access turns small prompt failures into production incidents. Keep the agent footprint minimal at every step.
Related skills

More from jpcaparas/skills

Installs
3
GitHub Stars
13
First Seen
6 days ago