secure-ai-agent-coding
Build or review AI agents and LLM applications so their blast radius stays small, their behavior is auditable, and high-impact actions require explicit control.
Decision Tree
What is the user asking for?
- Build a new AI agent or LLM feature: Read references/implementation-patterns.md, then apply references/controls.md before writing integration code.
- Review an existing codebase or design: Run scripts/scan_patterns.py /path/to/project --json if source code is available, then read references/review-workflow.md.
- Add tools, system calls, code execution, database writes, API calls, or email/message sending: Treat it as a high-risk action surface. Read references/implementation-patterns.md and require allowlisted tools, per-action authorization, rate limits, rollback, and approval gates.
- Handle user data, production data, documents, web pages, email, vector stores, embeddings, or fine-tuning data: Read references/governance.md and references/threat-model.md before designing prompts or retrieval flows.
- Debug a safety incident, unexpected model behavior, prompt injection, data leak, or harmful automation: Read references/review-workflow.md and references/gotchas.md, preserve logs, stop autonomous actions, and recover from a known safe state.
- The request is only generic web app security with no AI, model, RAG, tool-call, or agentic workflow: Do not use this skill unless the AI-specific attack surface is part of the task.
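The high-risk action bullet above can be sketched as a gateway that sits between the model and its tools, so the allowlist, authorization, rate limit, and approval checks run outside the model. This is a minimal in-memory sketch assuming a single process; `ToolGateway`, `ToolPolicy`, and the `is_authorized` and `approve` callbacks are hypothetical names, not part of this skill's references or any agent framework. Rollback and audit logging are left out.

```python
from __future__ import annotations

import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


class ToolDenied(Exception):
    """Raised when a proposed tool call fails an allowlist, authorization, rate-limit, or approval check."""


@dataclass
class ToolPolicy:
    handler: Callable[..., object]
    high_risk: bool = False        # high-risk tools need an explicit approval decision
    max_calls: int = 5             # calls allowed per sliding window
    window_seconds: float = 60.0
    recent_calls: deque = field(default_factory=deque, repr=False)


class ToolGateway:
    """Hypothetical gateway: every model-proposed tool call is checked here, outside the model."""

    def __init__(
        self,
        policies: dict[str, ToolPolicy],
        is_authorized: Callable[[str, str], bool],
        approve: Callable[[str, str, dict], bool],
    ) -> None:
        self.policies = policies            # the allowlist: unknown tool names are rejected
        self.is_authorized = is_authorized  # server-side check per (user, tool) on every call
        self.approve = approve              # human-in-the-loop hook for high-risk calls

    def call(self, user_id: str, tool_name: str, args: dict, now: Optional[float] = None) -> object:
        policy = self.policies.get(tool_name)
        if policy is None:
            raise ToolDenied(f"tool not allowlisted: {tool_name}")
        if not self.is_authorized(user_id, tool_name):
            raise ToolDenied(f"{user_id} is not authorized to call {tool_name}")

        # Sliding-window rate limit: drop timestamps older than the window, then count.
        now = time.monotonic() if now is None else now
        while policy.recent_calls and now - policy.recent_calls[0] >= policy.window_seconds:
            policy.recent_calls.popleft()
        if len(policy.recent_calls) >= policy.max_calls:
            raise ToolDenied(f"rate limit exceeded for {tool_name}")

        if policy.high_risk and not self.approve(user_id, tool_name, args):
            raise ToolDenied(f"approval not granted for {tool_name}")

        policy.recent_calls.append(now)
        return policy.handler(**args)
```

The rate limit in this sketch is per tool and shared across users; a production version would more likely key it per user or tenant and keep the counters in shared storage. Denied calls do not consume rate-limit budget.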
Quick Reference
| Task | Action |
|---|---|
| Find likely dangerous patterns | Run python3 scripts/scan_patterns.py /path/to/project |
| Classify risk | Tier each agent action by impact, reversibility, data sensitivity, and external side effects |
| Protect prompts | Separate instructions from untrusted data; validate, delimit, and minimize context |
| Protect tool calls | Use explicit allowlists, scoped credentials, per-action authorization, and rate limits |
| Protect users and data | Classify data, minimize disclosure, redact logs, verify consent, and avoid raw production data in test systems |
| Protect downstream systems | Validate structured model output before using it in code, SQL, shell, APIs, or UI rendering |
| Protect production | Add monitoring, anomaly alerts, safety regression tests, rollback plans, and incident response steps |
| Review exceptions | Document why a control does not apply, who accepted the risk, and when it expires |
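The "Classify risk" row can be made concrete with a small tiering function whose result decides which controls apply. The 0 to 2 scales, the three tiers, and the rules in `classify` are assumptions chosen for illustration; the point is that the tier is computed from the four factors the table names rather than decided ad hoc per feature.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    LOW = 1     # reversible, internal, no sensitive data: allow and log
    MEDIUM = 2  # limited impact or sensitivity: per-action authorization and rate limits
    HIGH = 3    # severe, sensitive, or irreversible with external effects: human approval


@dataclass(frozen=True)
class ActionProfile:
    name: str
    impact: int                  # assumed scale: 0 none, 1 limited, 2 severe
    reversible: bool
    data_sensitivity: int        # assumed scale: 0 public, 1 internal, 2 personal or secret
    external_side_effects: bool  # e.g. sends email, calls a third-party API, moves money


def classify(action: ActionProfile) -> Tier:
    """Hypothetical rule: any single severe factor is enough to make the action HIGH."""
    if action.impact >= 2 or action.data_sensitivity >= 2:
        return Tier.HIGH
    if not action.reversible and action.external_side_effects:
        return Tier.HIGH
    if (action.impact > 0 or action.data_sensitivity > 0
            or action.external_side_effects or not action.reversible):
        return Tier.MEDIUM
    return Tier.LOW
```

For example, `classify(ActionProfile("send_email", impact=1, reversible=False, data_sensitivity=1, external_side_effects=True))` lands in `Tier.HIGH` because the action is both irreversible and external, even though no single factor is severe.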
Core Workflow
- Map the AI surface: model calls, prompts, retrieved content, tools, state, credentials, data stores, and external side effects.
- Classify every input as untrusted unless it is generated by a trusted server-side component, and validate even those inputs.
- Classify every action the agent can take. Default to reversible, low-privilege actions and escalate high-impact actions to human approval.
- Design controls before implementation: schemas, allowlists, server-side authorization, idempotency, rate limits, locks, rollback, and audit logs.
- Keep sensitive data out of prompts and logs unless the model truly needs it. Prefer redaction, tokenization, anonymization, or synthetic data.
- Validate AI output before it reaches downstream interpreters, renderers, databases, APIs, file systems, or shell commands.
- Test safety controls in CI and again after model, prompt, framework, dependency, retrieval, or tool changes.
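Validating AI output before it reaches a downstream interpreter mostly means refusing to trust its shape. Below is a minimal sketch assuming the model was asked to return a JSON object with exactly `action`, `order_id`, and `reason`; those field names, `ALLOWED_ACTIONS`, and the order-id format are hypothetical.

```python
import json
import re

ALLOWED_ACTIONS = {"lookup_order", "refund", "escalate"}  # hypothetical action allowlist
ORDER_ID = re.compile(r"[A-Z0-9]{8}")                      # hypothetical order-id format


class InvalidModelOutput(ValueError):
    """Raised when model output does not match the expected contract."""


def parse_agent_decision(raw: str) -> dict:
    """Treat model output as untrusted: parse strictly and check every field before use."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidModelOutput("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise InvalidModelOutput("top-level value must be an object")

    expected = {"action", "order_id", "reason"}
    if set(data) != expected:
        # Reject both missing and extra fields so the model cannot smuggle in new parameters.
        raise InvalidModelOutput(f"unexpected or missing fields: {sorted(set(data) ^ expected)}")
    if not isinstance(data["action"], str) or data["action"] not in ALLOWED_ACTIONS:
        raise InvalidModelOutput("action is not allowlisted")
    if not isinstance(data["order_id"], str) or not ORDER_ID.fullmatch(data["order_id"]):
        raise InvalidModelOutput("order_id has the wrong format")
    if not isinstance(data["reason"], str) or len(data["reason"]) > 500:
        raise InvalidModelOutput("reason must be a short string")
    return data
```

Validation narrows what can reach the next system but does not replace that system's own defenses: a validated `order_id` should still go to SQL as a bound parameter, for example `cursor.execute("SELECT status FROM orders WHERE id = ?", (decision["order_id"],))` with `sqlite3`, never through string formatting.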
Control Router
| Area | Load |
|---|---|
| End-to-end review process, severity, report format | references/review-workflow.md |
| Control catalog and evidence checklist | references/controls.md |
| Copyable implementation patterns | references/implementation-patterns.md |
| AI-specific threat scenarios and risk tiers | references/threat-model.md |
| Inventory, consent, model updates, lifecycle, and operations | references/governance.md |
| Common failure modes and reviewer traps | references/gotchas.md |
| Source conversion notes and scope exclusions | references/source-policy.md |
Operational Scripts
Use scripts/scan_patterns.py as a first-pass heuristic scanner, not as proof of safety.
```shell
python3 scripts/scan_patterns.py /path/to/project
python3 scripts/scan_patterns.py /path/to/project --json --fail-on high
```
The scanner intentionally favors review prompts over automated verdicts. Treat findings as places to inspect.
Source Scope
This skill adapts the engineering-applicable controls from Galdren's "Secure AI & Agent Coding Policy" into an agent skill. It keeps the parts that can drive code review, architecture decisions, local checks, and production hardening. It leaves legal judgment, organization-specific approvals, and physical security as prompts to involve the right team rather than pretending a coding skill can resolve them.
Gotchas
- Prompt-only defenses are not enough. Enforce authorization, validation, and tool limits outside the model.
- A model used as a validator is still an AI input surface. Validate its inputs and outputs like any other component.
- "Authenticated user" is not the same as "authorized agent action." Check each action separately.
- Logs can become a data leak. Record enough to investigate, but redact secrets, personal data, and sensitive context.
- Model or framework upgrades can change behavior without a code diff. Retest safety and output contracts after every change.
- Broad tool access turns small prompt failures into production incidents. Keep the agent footprint minimal at every step.
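For the logging gotcha, one option in Python is a `logging.Filter` that redacts each message after argument substitution and before a handler writes it. The regexes below are hypothetical placeholders; pattern-based redaction misses anything it was not written for, so it complements, rather than replaces, keeping sensitive context out of logs in the first place.

```python
import logging
import re

# Hypothetical patterns: tune these to the secrets and personal data your system actually handles.
REDACTIONS = [
    (re.compile(r"(?i)bearer\s+[a-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[REDACTED_API_KEY]"),  # hypothetical key prefix
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Apply every redaction pattern in order and return the scrubbed text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Redacts the fully substituted message so secrets passed as %-args are caught too."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # the message is already substituted; avoid a second % pass
        return True
```

Attach it with `handler.addFilter(RedactingFilter())`. A filter added to a logger rather than a handler is not consulted for records propagated from child loggers, so handler-level attachment covers more of the agent's log output.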