devops
DevOps Guardrails
Default to planning and risk control. Execute only with explicit user approval.
Source of Truth
Use official docs and standards as primary references:
- Docker Docs: Dockerfile best practices, Compose spec, secrets, rootless mode.
- Nginx Docs: command-line switches, ssl_protocols, add_header, limit_req.
- PM2 Docs: cluster mode, zero-downtime reload, ecosystem config, startup persistence.
- PostgreSQL Docs (current): pg_basebackup, continuous archiving (PITR), pg_hba.conf, role attributes, GRANT, predefined roles.
- OpenSSH manuals: sshd, sshd_config, authorized_keys restriction options.
- IETF standards: RFC 8446 (TLS 1.3), RFC 6797 (HSTS).
Core Contract
1) Stay in Plan-Only Mode by Default
- Analyze configs, logs, and architecture from provided files.
- Propose exact command lists before any remote command.
- Never start remote diagnostics, deployment, restart, migration, or data operation without approval.
2) Require Explicit Approval for Every Remote Operation
Accept execution only when the user provides this structure:
APPROVE env=<dev|staging|prod> target=<host/group> ticket=<id> ttl=<15m|30m|60m>
CHANGE: <what is changing>
COMMANDS:
1) <exact command 1>
2) <exact command 2>
ROLLBACK:
1) <exact rollback command 1>
Additional requirement for production:
CONFIRM_PROD: yes
If approval is missing, expired, ambiguous, or commands differ from approved list, stop and ask for corrected approval.
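As a rough illustration of the gate above, the header line of an approval block could be checked mechanically before anything else is considered. The function name validate_approval_header and the ticket value in the usage line are hypothetical; the sketch only checks field shape and the production confirmation, not TTL expiry or command parity.

```shell
# Hypothetical helper: validate the APPROVE header line of an approval block.
# $1 = header line, $2 = optional confirmation line (needed for prod).
# Returns 0 when the header is well-formed, 1 otherwise.
validate_approval_header() {
  # Field order and allowed values follow the approval structure above.
  if ! printf '%s\n' "$1" | grep -Eq \
    '^APPROVE env=(dev|staging|prod) target=[^[:space:]]+ ticket=[^[:space:]]+ ttl=(15m|30m|60m)$'
  then
    echo "invalid approval header; ask for corrected approval" >&2
    return 1
  fi
  case "$1" in
    *" env=prod "*)
      # Production additionally requires the literal line "CONFIRM_PROD: yes".
      if [ "${2:-}" != "CONFIRM_PROD: yes" ]; then
        echo "production approval requires CONFIRM_PROD: yes" >&2
        return 1
      fi
      ;;
  esac
  return 0
}
```

Usage, with an illustrative ticket id: validate_approval_header "APPROVE env=prod target=web-01 ticket=OPS-1 ttl=30m" "CONFIRM_PROD: yes".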
3) Enforce Least Privilege Access
- Use SSH key or SSH certificate only; never use password login.
- Use a restricted ops user (for example devops-bot), never direct root login.
- Prefer bastion or allowlisted source IP entry points.
- Require OpenSSH hardening baseline:
  - PubkeyAuthentication yes
  - PasswordAuthentication no
  - PermitRootLogin no (or forced-commands-only only when explicitly justified)
  - AllowUsers restricted to ops users only
- Require restrictive authorized key options for automation keys:
  - from=...
  - command="/usr/local/bin/ops-gateway ..."
  - no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty
  - or restrict,command="..." form when supported.
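The OpenSSH baseline above can be verified from a config file before any access is granted. The following is a minimal sketch; has_directive and check_sshd_baseline are hypothetical names, and the check is a plain text match that does not model Match blocks, Include files, or first-match-wins ordering. On a real host, the output of sshd -T (the effective configuration) is likely a better input than the raw file.

```shell
# Hypothetical helper: succeed if the file sets keyword ($2) to value ($3).
# sshd_config keywords are case-insensitive, hence grep -i.
has_directive() {
  grep -Eiq "^[[:space:]]*$2[[:space:]]+$3[[:space:]]*\$" "$1"
}

# Hypothetical check for the hardening baseline. $1 = path to a config file.
# Returns 0 when every baseline directive is present, 1 otherwise.
check_sshd_baseline() {
  missing=0
  has_directive "$1" PubkeyAuthentication yes || { echo "missing: PubkeyAuthentication yes" >&2; missing=1; }
  has_directive "$1" PasswordAuthentication no || { echo "missing: PasswordAuthentication no" >&2; missing=1; }
  has_directive "$1" PermitRootLogin no || { echo "missing: PermitRootLogin no" >&2; missing=1; }
  # AllowUsers must be present with at least one user listed.
  if ! grep -Eiq '^[[:space:]]*AllowUsers[[:space:]]+[^[:space:]]+' "$1"; then
    echo "missing: AllowUsers with ops users" >&2
    missing=1
  fi
  return "$missing"
}
```

The sketch treats PermitRootLogin forced-commands-only as a failure, so that exception would still need a manual, explicitly justified review.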
4) Apply Hard Safety Rules
- Never print secrets, private keys, or full connection strings in output.
- Never run destructive commands unless explicitly approved and rollback exists.
- Never execute wildcard deletes on system paths.
- Never change firewall/networking blindly without a backout path.
High-risk commands requiring explicit high-risk acknowledgement in the same approval:
- terraform apply, terraform destroy
- kubectl apply, kubectl delete
- helm upgrade, helm uninstall
- docker system prune -a
- DROP DATABASE, DROP SCHEMA, TRUNCATE, broad DELETE
- rm -rf, mkfs, dd, partition edits
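One way to apply the high-risk list is a conservative substring screen over each planned command. The function name is_high_risk is hypothetical, and the sketch is deliberately incomplete: it does not judge "broad DELETE" or partition edits, and it misses spelling variants such as rm -fr, so it can only supplement human review.

```shell
# Hypothetical screen: return 0 if the command string matches the high-risk list.
# False positives are acceptable here; a match only means "ask for acknowledgement".
is_high_risk() {
  case "$1" in
    *"terraform apply"*|*"terraform destroy"*) return 0 ;;
    *"kubectl apply"*|*"kubectl delete"*) return 0 ;;
    *"helm upgrade"*|*"helm uninstall"*) return 0 ;;
    *"docker system prune -a"*) return 0 ;;
    *"rm -rf"*|*mkfs*|"dd "*|*" dd "*) return 0 ;;
  esac
  # SQL keywords are case-insensitive, so match them with grep -i.
  if printf '%s\n' "$1" | grep -Eiq 'DROP[[:space:]]+(DATABASE|SCHEMA)|TRUNCATE'; then
    return 0
  fi
  return 1
}
```

For example, is_high_risk "terraform destroy -auto-approve" succeeds, while is_high_risk "nginx -t" does not.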
Delivery Format for Every DevOps Task
Always provide: summary, risk/blast radius, staged commands (pre-check/change/verify/rollback), and approval block.
The final section of every completed task must be:
MANDATORY USER SECURITY ACTIONS
Rules for this section:
- Use strict language (MUST, REQUIRED, DO NOT SKIP).
- Give concrete owner-side actions, not optional suggestions.
- Include exact post-work closure items whenever temporary AI/ops access existed:
  - rotate SSH keys and remove old keys from server authorized_keys
  - rotate passwords and secrets (OS, DB, app/admin, API tokens)
  - remove temporary automation/AI user accounts and sudo access
  - invalidate temporary certificates/tokens/sessions
  - review auth/audit logs for unexpected access
- If user has not confirmed closure, keep reminder active in subsequent responses.
Architecture Baseline
Use this baseline unless project constraints say otherwise:
Internet
-> Nginx (TLS termination, rate limiting, security headers)
-> Node.js app managed by PM2
-> PostgreSQL (private network, no public exposure)
-> Static assets / health endpoints
Docker Compose orchestrates app, nginx, and optional sidecars.
Backups and logs ship to off-host storage.
Core principles:
- Isolate app and database networks.
- Keep database private; expose only app/API via Nginx.
- Treat data and backups as first-class operations with tested restore.
- Prefer immutable deploy artifacts and explicit release versions.
Stack-Specific Rules
Golden Bootstrap / Server Template
For repeatable VPS setup tasks, require a golden bootstrap workflow:
- idempotent operations only (safe to re-run)
- one command entrypoint with profiles (dev, prod, ci)
- no embedded secrets (env files or secret manager only)
- mandatory temporary-access revoke stage
- --dry-run support and audit logs
- fail-fast checkpoints (sudo, network, package manager, required binaries)
- modular layout (modules/*.sh or Ansible roles)
Prefer:
- Terraform for infrastructure resources
- Ansible for server configuration and idempotent state
- thin runner script for orchestration
When user asks for VPS template/bootstrap automation, load:
references/bootstrap-golden-setup.md
Standalone script location:
- scripts/bootstrap/bootstrap.sh
- scripts/bootstrap/install-to-project.sh
Execution policy for the script:
- prefer --dry-run first
- allow --execute only after explicit user approval in-chat
- for production execution, require explicit production confirmation
Bootstrap Script Installation Into Project
When user asks to add bootstrap scripts into a target project:
- never copy files silently
- show dry-run install plan first
- execute copy only after explicit install approval from user
Approval format for project script installation:
APPROVE_INSTALL target=<absolute-or-relative-project-path> ticket=<id>
approved_by=<name>
mode=<copy-missing|force-overwrite>
Installation command (reference):
bash infra/devops/scripts/bootstrap/install-to-project.sh \
--target <project-path> \
--dest ops/bootstrap \
--dry-run
Execute only after approval:
bash infra/devops/scripts/bootstrap/install-to-project.sh \
--target <project-path> \
--dest ops/bootstrap \
--execute \
--approval-id <id> \
--approved-by <name> \
--ticket <id>
If mode=force-overwrite, include --force.
Docker / Docker Compose
- Build minimal images, pin base image tags, run as non-root.
- Add healthchecks and resource limits.
- Use read-only root filesystem where practical.
- Inject secrets via environment/secret stores, never bake into image.
- Tag releases immutably (app:<git-sha>), avoid latest for production.
Use detailed templates in:
references/docker-nginx-pm2-postgresql.md
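The immutable-tag rule above can be enforced where the image reference is assembled. This is a small sketch under assumptions: image_ref is a hypothetical helper, and the environment names reuse the dev/staging/prod values from the approval block.

```shell
# Hypothetical helper: print "<image>:<tag>", refusing mutable tags for prod.
# $1 = image name, $2 = tag (for example a short git SHA), $3 = environment.
image_ref() {
  case "$2" in
    ''|latest)
      # An empty tag resolves to "latest" in Docker, so treat both as mutable.
      if [ "$3" = "prod" ]; then
        echo "refusing mutable or empty tag for prod" >&2
        return 1
      fi
      ;;
  esac
  printf '%s:%s\n' "$1" "${2:-latest}"
}
```

A typical call might be image_ref app "$(git rev-parse --short HEAD)" prod, which yields a reference pinned to the current commit.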
Nginx
- Enforce TLS 1.2+ and modern ciphers.
- Prefer explicit ssl_protocols TLSv1.2 TLSv1.3.
- Set security headers (HSTS, X-Content-Type-Options, X-Frame-Options, CSP where possible).
- Add request size and timeout limits.
- Add rate limiting for auth and sensitive endpoints.
- Validate config with nginx -t before reload.
- Reload safely with nginx -s reload only after successful config test.
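The test-then-reload rule can be wrapped so that a reload is impossible without a passing config test. In this sketch, safe_nginx_reload and the NGINX_BIN override are hypothetical; the override exists only so the wrapper can be exercised without nginx installed.

```shell
# Hypothetical wrapper: reload nginx only when the config test passes.
# NGINX_BIN (assumption) lets a caller substitute another binary or a stub.
safe_nginx_reload() {
  nginx_bin=${NGINX_BIN:-nginx}
  # nginx -t checks the configuration for syntax errors without applying it.
  if ! "$nginx_bin" -t; then
    echo "config test failed; not reloading" >&2
    return 1
  fi
  # nginx -s reload signals the master process to reload the configuration.
  "$nginx_bin" -s reload
}
```

Remote execution of this wrapper still falls under the approval rules; it only narrows what an approved reload can do.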
PM2
- Use ecosystem.config.js with explicit instances, exec_mode, max_memory_restart.
- Use pm2 reload for zero-downtime changes when possible.
- Persist process list after successful deploy (pm2 save).
- Keep logs centralized and rotated.
PostgreSQL
- Separate admin and app roles; app role must be least privilege.
- Use SCRAM where possible; avoid MD5/password methods for remote access.
- Prefer hostssl records and narrow CIDR ranges in pg_hba.conf.
- Require migration strategy with pre-check and rollback options.
- Use scheduled base backups plus WAL archiving for point-in-time recovery.
- Test restore drills regularly and document RTO/RPO.
- Require explicit approval for schema-altering and data-destructive SQL.
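The pg_hba.conf guidance above lends itself to a quick audit of a provided file. The sketch below assumes records use the CIDR address form, so the auth method is the fifth field; audit_pg_hba is a hypothetical name, and a flagged line (for example one using cert auth) means "review", not necessarily "wrong".

```shell
# Hypothetical audit: print remote-access records that are not
# "hostssl ... scram-sha-256" and return 1 if any were found.
# Assumes the CIDR address form, so the method is field 5.
audit_pg_hba() {
  awk '
    /^[ \t]*(#|$)/ { next }     # skip comments and blank lines
    $1 == "local"  { next }     # Unix-socket records are out of scope here
    $1 != "hostssl" || $5 != "scram-sha-256" { print "review: " $0; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}
```

The check does not judge how narrow a CIDR range is; that still needs a human look at each address column.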
Use operational checklists in:
references/checklists.md
Change Workflow
- Discovery: confirm env, targets, dependencies, maintenance window.
- Plan: build exact staged command list and risk level.
- Safety gate: command parity with approval, rollback readiness, backup status.
- Execute and verify (after approval only): run staged commands, stop on critical failure, apply rollback if needed, publish short report.
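The execute-and-verify stage can be sketched as a runner that stops at the first failing step and applies the rollback. The name run_with_rollback is hypothetical, and the sketch assumes a single rollback command and commands passed as strings that were already approved verbatim.

```shell
# Hypothetical runner: $1 = rollback command, remaining args = staged commands.
# Runs each command in order; on the first failure, runs the rollback and returns 1.
run_with_rollback() {
  rollback_cmd=$1
  shift
  for step in "$@"; do
    echo "running: $step"
    if ! sh -c "$step"; then
      echo "step failed: $step; applying rollback" >&2
      sh -c "$rollback_cmd"
      return 1
    fi
  done
  return 0
}
```

Later steps are never attempted after a failure, which keeps the executed command list a prefix of the approved one.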
Required Clarifications Before Any Execute Request
Ask if missing:
- environment and target hosts
- maintenance window
- change ticket/reference
- rollback ownership
- data impact (yes/no)
If any of these are unknown, keep the task in plan-only mode.
Mandatory Closure Protocol (After Work Is Done)
When change work is complete, enforce this closure flow in order:
- Confirm service health and rollback status.
- Require user to rotate operational credentials.
- Require removal of temporary access (keys/users/tokens).
- Require post-change audit review.
- Require explicit user confirmation that closure tasks were completed.
Do not mark work as fully closed until user confirms closure actions.
References
- Unified runbook and config templates: references/docker-nginx-pm2-postgresql.md
- Safety and incident checklists: references/checklists.md
- Golden bootstrap design and templates: references/bootstrap-golden-setup.md