docker
Docker & Containers: Production Infrastructure
Write, review, and architect Dockerfiles, Compose stacks, and container workflows - from single-service dev setups to multi-arch production pipelines with image signing and compliance gates. The goal is minimal, secure, reproducible images that a team can maintain and a QSA can audit.
Target versions: May 2026 snapshot. Read references/target-versions.md before
pinning Docker, Compose, BuildKit, containerd, Podman, Buildah, or runc.
This skill covers Dockerfiles, Compose, container hardening, supply chain, registry/CI patterns, and runtime migration across Docker, Podman, Buildah, Skopeo, and containerd.
When to use
- Writing or reviewing Dockerfiles (single or multi-stage)
- Setting up Docker Compose stacks (dev, staging, production)
- Optimizing image size, build speed, or layer caching
- Hardening containers for production or compliance
- Setting up image signing, SBOM generation, or vulnerability scanning
- Containerizing AI/ML workloads (Model Runner, GPU passthrough, model serving)
- Migrating from Docker to Podman or building with Buildah
- Reviewing container security posture for PCI-DSS 4.0 or SOC 2
- Troubleshooting container networking, volume, or build issues
- Using
docker initto scaffold a new project
When NOT to use
- Kubernetes manifests, Helm charts, cluster architecture (use kubernetes)
- CI/CD pipeline design (use ci-cd)
- Security audits of application code (use security-audit)
- Infrastructure provisioning with Terraform (use terraform)
AI Self-Check
AI tools consistently produce the same Docker mistakes. Before returning any generated Dockerfile or Compose file, verify against this list:
- Multi-stage build used when the app has a build step (TypeScript, Go, Rust, Java, C/C++)
- Dependencies copied and installed BEFORE source code (layer caching)
- Final image is slim/distroless/scratch - no build tools, no package caches
-
USERdirective present - container does NOT run as root - No secrets in
ENV,ARG, orCOPY- use--mount=type=secretor runtime injection - Base image pinned to specific version or SHA256 digest (never
:latestexcept Chainguard free tier, never bare:22) -
HEALTHCHECKpresent for production images -
.dockerignoreexists and excludes.git,node_modules,.env,__pycache__, etc. - No
ADDfor local files (useCOPY-ADDauto-extracts and fetches URLs) - Compose: no
version:field (deprecated since Compose v2, removed in spec v5) - Compose:
depends_onusescondition: service_healthy, not bare ordering - Compose: resource limits set on production services
- Package caches cleaned in same layer:
--no-cache(apk),rm -rf /var/lib/apt/lists/*(apt). For pip: use--mount=type=cacheOR--no-cache-dir, not both. - CMD uses exec form (JSON array), not shell form:
CMD ["node", "app.js"]notCMD node app.js - HEALTHCHECK uses available tools: probe command uses a binary present in the final image (wget in Alpine, curl in Debian, none in scratch/distroless - use the app's own health endpoint)
- Current source checked: dated versions, CLI flags, API names, and support windows are verified against primary docs before repeating them
- Hidden state identified: local config, credentials, caches, contexts, branches, cluster targets, or previous runs are made explicit before acting
- Verification is real: final checks exercise the actual runtime, parser, service, or integration point instead of only linting prose or happy paths
- Engine/Compose syntax checked: Dockerfile, Compose, BuildKit, and runtime flags match the installed versions
- Image provenance considered: base images, registries, tags, SBOMs, and signatures are handled where risk warrants
Performance
- Order Dockerfile layers from stable to volatile and use cache mounts for package-manager caches where BuildKit is available.
- Keep build contexts small with
.dockerignore; accidental monorepo contexts dominate build time. - Use multi-stage builds and slim runtime images, but measure startup and debug needs before stripping tools aggressively.
Best Practices
- Pin base image digests for sensitive workloads and track rebuild cadence for security updates.
- Run as non-root and drop capabilities unless the workload genuinely needs them.
- Preview prune and volume-removal commands; persistent data must never be collateral cleanup.
Workflow
Step 1: Determine the domain
Based on the request:
- "Write a Dockerfile" / "containerize this app" -> Dockerfile
- "Set up docker compose" / "multi-service stack" -> Compose
- "Harden this" / "make PCI compliant" / "scan for vulnerabilities" -> Security
- "Sign images" / "generate SBOM" / "CI pipeline" -> Registry & CI
- "Use Podman" / "rootless containers" / "daemonless builds" -> Runtimes
- "Review this Dockerfile/compose" -> Apply production checklist + AI self-check
Step 2: Gather context
Before writing anything, determine:
- Application type: language, framework, build system
- Runtime: Bun, Node.js, Python, Go, Rust, Java - determines base image and build pattern
- Environment: dev (hot reload, debug) vs production (minimal, hardened)
- Base image: Alpine (small, musl) vs Debian-slim (glibc compat) vs distroless (no shell) vs Chainguard (zero-CVE) vs scratch (static binaries)
- Secrets: how are they injected? (env vars, mounted files, Docker secrets, vault)
- Compliance: PCI CDE? Regulated? What scanning/signing is required?
- Target registry: Docker Hub, GHCR, private registry, OCI-compliant?
- AI/ML: GPU workload? Model serving? Docker Model Runner?
Step 3: Build
Follow the domain-specific section below. Always apply the production checklist (Step 4) and AI self-check before finishing.
Step 4: Validate
# Dockerfile
docker build --no-cache -t test-build .
docker history test-build --format "{{.Size}}\t{{.CreatedBy}}" | head -15
docker scout quickview test-build # vulnerability overview
docker scout cves test-build # detailed CVE list
# Compose
docker compose config # validate and render
docker compose --dry-run up # dry-run startup (Compose v5)
# Security
docker scout cves --only-severity critical,high <image>
cosign verify --key <key> <image> # verify signature
syft <image> -o spdx-json # generate SBOM
grype <image> # vulnerability scan (alternative to Scout)
trivy image <image> # use v0.69.3 ONLY (v0.69.4-6 COMPROMISED)
Dockerfile
Read references/dockerfile-patterns.md for complete, production-ready Dockerfile templates (Node.js/Bun, Python, Go, Rust, static site) and BuildKit syntax reference.
Base image selection
- Need a shell or package manager: use slim Debian or Ubuntu bases.
- Need the smallest static runtime: use distroless or
scratch. - Need a hardened minimal userspace: use Chainguard or another verified Wolfi-style base.
- Keep builders and runtimes separate;
golang,rust, and other heavy toolchain images stay in build stages only.
See references/dockerfile-patterns.md for the actual language-by-language base recommendations and templates.
Key patterns
Multi-stage builds - the non-negotiable pattern for any compiled or transpiled language:
# syntax=docker/dockerfile:1
FROM node:22-slim AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN \
npm ci
COPY . .
RUN npm run build && npm prune --omit=dev
FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
COPY /app/dist ./dist
COPY /app/node_modules ./node_modules
USER 1001
EXPOSE 3000
HEALTHCHECK \
CMD ["node", "-e", "fetch('http://localhost:3000/health').then(r=>r.ok||process.exit(1)).catch(()=>process.exit(1))"]
CMD ["dist/index.js"]
BuildKit features (require # syntax=docker/dockerfile:1 or DOCKER_BUILDKIT=1):
- Cache mounts:
RUN --mount=type=cache,target=/root/.npm npm ci- persists package cache across builds, up to 70% faster rebuilds - Secret mounts:
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci- secrets never baked into layers - Heredocs: multi-line scripts without backslash hell
RUN <<EOF
set -e
apt-get update
apt-get install -y --no-install-recommends ca-certificates
rm -rf /var/lib/apt/lists/*
EOF
--checkflag:docker build --check .validates Dockerfile without building (dry-run lint)- SLSA provenance:
docker build --provenance=true --sbom=true .attaches attestations
docker init: scaffolds Dockerfile + compose.yaml + .dockerignore. Good starting point, always review.
What NOT to write
COPY . .before dependency install (busts cache on every source change)ADDfor local files (useCOPY-ADDis only for auto-extracting.tar.gzarchives into the image)MAINTAINER(deprecated - useLABEL maintainer="...")RUN cd /dir && ...(useWORKDIR /dir)- Separate
RUNfor each package install (chain with&&) chmod 777on anything- Secrets in
ARGorENV FROM node:latestorFROM python(unpinned)ENTRYPOINT+CMDtogether unless ENTRYPOINT is the binary and CMD is overridable default args (e.g.,ENTRYPOINT ["/app"]+CMD ["--config", "/etc/app.yaml"])
Compose
Read references/compose-patterns.md for complete Compose v5 templates (web+db, dev override, production hardened, AI/ML stack) and network/volume patterns.
Compose v5
- Do not use the old
version:field. - Expect Bake-based builds,
watch, dry-run validation, and newer model-oriented service wiring. - Keep dev and prod concerns separate; override files are still the sane default.
- Treat healthchecks and dependency readiness as normal Compose design, not optional extras.
Key patterns
Dev/prod separation with override files:
# compose.yaml (base)
services:
app:
image: myapp:1.0.0
healthcheck:
test: ["CMD-SHELL", "wget -qO /dev/null http://localhost:3000/health || exit 1"]
interval: 30s
timeout: 5s
start_period: 10s
retries: 3
# compose.override.yaml (dev - auto-loaded)
services:
app:
build: .
volumes:
- .:/app
- /app/node_modules
environment:
NODE_ENV: development
ports:
- "9229:9229"
command: bun run dev
Secrets: use top-level secrets: with file: or external: true, reference via _FILE env convention (e.g., POSTGRES_PASSWORD_FILE: /run/secrets/db_pass). Never hardcode secrets in environment:. See references/compose-patterns.md for the full template with secret wiring.
Health-gated dependencies: always use depends_on: with condition: service_healthy - bare depends_on is ordering only, no readiness guarantee.
Compose anti-patterns
version: "3.8"- dead field, remove itcontainer_nameon every service (breaksdocker compose up --scale)restart: alwayswithout healthcheck (infinite restart of broken containers)network_mode: hostwhen port mapping works (replace with a user-defined bridge network and explicitports:mapping)depends_onwithoutcondition:(ordering only, no readiness)volumes:mounting entire project dir in production (dev pattern leak)privileged: trueon a compose service instead of the host LXC- 20+ inline
environment:entries (useenv_file:)
Security
Read references/security-and-compliance.md for the full PCI-DSS 4.0 container requirements mapping, CVE reference, runtime hardening patterns, and scanning tool comparison.
Critical vulnerabilities (2025-2026)
| CVE | Component | Severity | Impact | Fixed in |
|---|---|---|---|---|
| CVE-2025-9074 | Docker Desktop | 9.3 Critical | Container escape via unauthenticated Engine API | Desktop 4.44.3 |
| CVE-2025-31133 | runc | High | Container escape via /dev/null symlink race | runc 1.2.8, 1.3.3, 1.4.0-rc.3 |
| CVE-2025-52565 | runc | High | Container escape via /dev/console mount race | runc 1.2.8, 1.3.3, 1.4.0-rc.3 |
| CVE-2025-52881 | runc | High | Host procfs writes via /proc redirect (DoS/escape) | runc 1.2.8, 1.3.3, 1.4.0-rc.3 |
| CVE-2026-33634 | Trivy | Critical | Supply chain - malware in Docker Hub images (v0.69.4-6) | Trivy v0.69.3 (safe) |
| CVE-2026-2664 | Docker Desktop | Medium | gRPC-FUSE kernel module OOB read | Desktop 4.62.0+ |
| CVE-2025-13743 | Docker Desktop | Low | Expired Hub PATs leaked in diagnostics bundles | Desktop 4.54.0 |
| CVE-2026-28400 | Model Runner | 7.5 High | Runtime flag injection - arbitrary file overwrite, container escape | Desktop 4.61.0+ |
| CVE-2026-33747 | BuildKit | High | Malicious frontend file escape outside storage root | BuildKit v0.28.1 |
| CVE-2026-33748 | BuildKit | High | Git URL validation bypass - restricted file access | BuildKit v0.28.1 |
Action items: upgrade runc to >= 1.4.0, BuildKit to >= 0.28.1, Docker Desktop to >= 4.66.1, never pull Trivy v0.69.4/5/6. Pin ALL CI tool images to SHA256 digests.
Hardened Compose baseline
Every production service should start from this, relax only what's needed:
services:
app:
image: myapp:1.0.0 # pinned, never :latest
read_only: true
security_opt:
- no-new-privileges:true
cap_drop:
- ALL
cap_add: [] # add ONLY what's needed (see note below)
tmpfs:
- /tmp
user: "1001:1001"
healthcheck:
test: ["CMD-SHELL", "wget -qO /dev/null http://localhost:8080/health || exit 1"]
interval: 30s
timeout: 5s
start_period: 15s
retries: 3
deploy:
resources:
limits:
memory: 512M
cpus: "1.0"
logging:
driver: json-file
options:
max-size: "10m"
max-file: "3"
cap_drop: ALL warning: many images (LSIO, HOTIO, official redis/postgres, anything using
gosu/setpriv/su-exec) start as root and drop privileges at runtime. They need at minimum
cap_add: ["SETUID", "SETGID"], and images that chown files at startup also need "CHOWN".
Always read the image's entrypoint to determine required capabilities before applying blanket
drops. Blind cap_drop: ALL with empty cap_add causes CrashLoopBackOff. See the
security-audit skill's "No blanket capability drops" rule for detailed guidance.
For a hardened Dockerfile pattern, see references/dockerfile-patterns.md (Language Templates section).
Supply chain security
- Sign images with cosign (Sigstore):
cosign sign --key cosign.key <image>@<digest> - Generate SBOMs at build time:
docker scout sbom <image>orsyft <image> -o spdx-json - Verify at deploy:
cosign verify --key cosign.pub <image>@<digest> - Pin CI tool images to SHA256 digests. Mutable tags are a proven attack vector (Trivy March 2026, tj-actions/reviewdog March 2025).
- Use Docker Scout or Grype for continuous vulnerability monitoring.
- Trivy: safe version is v0.69.3 ONLY. v0.69.4-6 contained credential-stealing malware. If any CI pipeline ran compromised Trivy between March 19-23, 2026, rotate ALL secrets.
PCI-DSS 4.0 container requirements (summary)
PCI-DSS 4.0 is the only active version. Key container-specific requirements:
- Req 1: Network segmentation - use user-defined bridge networks,
internal: truefor backend services, never expose CDE containers on default bridge - Req 2.2: Harden containers - non-root, drop caps, read-only rootfs, one process per container
- Req 4: Encrypt transmissions - TLS between CDE containers in Compose (mount certs, use TLS-enabled images, or front with a TLS-terminating reverse proxy)
- Req 5.2/5.3: Immutable images (deploy by digest), Falco for runtime detection
- Req 6.3: Vulnerability scanning on every image before deployment (Docker Scout, Grype, Trivy v0.69.3)
- Req 6.3.2: SBOM for every production image
- Req 8.6.2: No hardcoded secrets in images, compose files, or env vars
- Req 10: Audit logging - container stdout/stderr to immutable log store
- Req 11.5: Image digest pinning + Falco = FIM for containers
Full mapping in references/security-and-compliance.md.
Registry & CI
CI pipeline pattern
- Build with
docker build --checkfirst, thenbuildxwith provenance and SBOM output. - Scan before deploy, sign by digest, and keep an SBOM artifact even if the registry also stores one.
- Pin CI-side tools and actions to immutable digests, not mutable tags.
Docker Model Runner (AI/ML)
Docker Model Runner (Engine 29.x, Desktop 4.50+) serves AI models via OpenAI/Ollama-compatible APIs:
# Compose integration with Model Runner
services:
model:
provider:
type: model
options:
model: ai/llama3.2:3B-Q4_K_M
agent:
build: .
environment:
OPENAI_BASE_URL: http://model/v1
GPU containers: use deploy.resources.reservations.devices with capabilities: [gpu]. Start shm_size at 16gb for single GPU, 32gb for multi-GPU (vLLM needs shared memory for tensor ops). See references/compose-patterns.md for the full AI/ML stack template.
Production Checklist
Dockerfile
- Multi-stage build separating build and runtime stages
- Dependencies installed before source code (layer caching)
- Base image pinned to specific version or SHA256 digest
- Final image is slim/distroless/Chainguard (no build tools, no caches)
- Non-root
USERdirective (numeric UID preferred for K8s compat) -
HEALTHCHECKpresent -
.dockerignoreexcludes.git,node_modules,.env,__pycache__, etc. - No secrets in
ENV,ARG, or layers (use--mount=type=secret) -
WORKDIRset (not relying on default/) - No
ADDfor local files (useCOPY) - Package manager caches cleaned in same
RUNlayer -
# syntax=docker/dockerfile:1for BuildKit features
Compose
- No
version:field -
depends_onwithcondition: service_healthy - Healthchecks on every service
- Resource limits on production services
- Secrets via
secrets:orenv_file:, not inlineenvironment: - Separate override files for dev/prod
- Logging config with rotation (
max-size,max-file) - No
container_nameunless needed for external references -
restart: unless-stopped(oron-failure) with healthcheck (neveralwayswithout healthcheck) - Images pinned (no
:latest) -
read_only: true+no-new-privileges+cap_drop: ALLon production services
Security
- runc >= 1.4.0 (CVE-2025-31133/52565/52881 patched)
- BuildKit >= 0.28.1 (CVE-2026-33747/33748 patched)
- Docker Desktop >= 4.66.1 (CVE-2025-9074/CVE-2026-28400 patched)
- Trivy v0.69.3 ONLY (v0.69.4-6 COMPROMISED)
- Images signed with cosign, verified at deploy
- SBOM generated for every production image
- Vulnerability scanning in CI (Docker Scout, Grype, or Trivy v0.69.3)
- No
:latesttags in production (pin version or SHA256 digest) - CI tools pinned to SHA256 digests (not mutable tags)
- Base images rebuilt/updated regularly (weekly minimum)
Compliance (PCI-DSS 4.0)
- Containers run as non-root with minimal capabilities (Req 2.2)
- Read-only root filesystem, one process per container (Req 2.2)
- Images scanned for vulnerabilities before deployment (Req 6.3)
- SBOM generated for every CDE image (Req 6.3.2)
- No hardcoded secrets in images or compose files (Req 8.6.2)
- Image digests pinned for immutability (Req 5/11.5)
- Container logs shipped to immutable SIEM (Req 10)
- Runtime detection in place (Falco/Tetragon) (Req 5.2/5.3)
- Registry access audit-logged (Req 10.4.1.1)
- Base images from trusted, verified sources (Req 6.2.1)
Reference Files
references/dockerfile-patterns.md- Dockerfile templates and build patternsreferences/compose-patterns.md- Compose patterns and common stack layoutsreferences/security-and-compliance.md- container hardening, compliance guidance, and safe public custom image publishingreferences/alternative-runtimes.md- Podman, Buildah, Skopeo, and related runtime patternsreferences/target-versions.md- May 2026 version snapshot for Docker, Compose, BuildKit, containerd, Podman, Buildah, and runc
Output Contract
See skills/_shared/output-contract.md for the full contract.
- Skill name: DOCKER
- Deliverable bucket:
audits - Mode: conditional. When invoked to analyze, review, audit, or improve existing repo content, emit the full contract -- boxed inline header, body summary inline plus per-finding detail in the deliverable file, boxed conclusion, conclusion table -- and write the deliverable to
docs/local/audits/docker/<YYYY-MM-DD>-<slug>.md. When invoked to answer a question, teach a concept, build a new artifact, or generate content, respond freely without the contract. - Severity scale:
P0 | P1 | P2 | P3 | info(see shared contract; only used in audit/review mode).
Related Skills
- kubernetes - for deploying containers to K8s clusters. Docker builds the image; kubernetes deploys it. Dockerfile optimization belongs here; K8s manifests belong there.
- ci-cd - for pipeline design that builds and pushes images. Docker skill covers the Dockerfile and Compose patterns; ci-cd covers the pipeline stages around them.
- security-audit - for auditing container images, scanning for CVEs, and supply chain risks. Docker skill covers hardening best practices; security-audit runs the actual audit.
- ansible - can manage containers via
community.docker, but image building and Compose design belong here. - databases - for database containers in Docker Compose. Docker skill owns the Compose pattern; databases skill owns the engine tuning within the container.
- git - for git tags and version control. Docker skill handles container image tagging; git handles git tags and release workflows.
Rules
- No
:latesttags in production. Pin images to a specific version or SHA256 digest. - Multi-stage builds for compiled/transpiled languages. Build tools do not belong in production images.
- Non-root user. Every production container must run as non-root (numeric UID for K8s compat).
- No secrets in layers. Not in
ENV, not inARG, not inCOPY. Use--mount=type=secretor runtime injection. - Deps before source. Copy dependency manifests first, install, then copy source. Layer cache depends on it.
- Healthchecks on everything. Dockerfile
HEALTHCHECKand Composehealthcheck:. - Pin CI tools to SHA256 digests. Mutable tags are compromised supply chain vectors (Trivy CVE-2026-33634 March 2026, tj-actions CVE-2025-30066 (upstream: reviewdog CVE-2025-30154) March 2025).
- Trivy v0.69.3 only. v0.69.4-6 contained credential-stealing malware. If you ran it, rotate secrets.
- Compose: no
version:field. It's deprecated and removed. Just delete it. - Clean apt cache in the same RUN layer.
apt-get update && apt-get install -y ... && rm -rf /var/lib/apt/lists/*- all oneRUN. .dockerignoreis not optional..git,node_modules,.env, secrets, test fixtures, docs - all excluded.- Resource limits on production containers. Memory and CPU limits prevent noisy neighbors and OOM cascading.
- Run the AI self-check. Every generated Dockerfile/Compose gets verified against the checklist above before returning.