containerization-best-practices
Containerization & Docker Best Practices
Production-grade Docker and containerization strategies for building efficient, secure, and maintainable containers.
Dockerfile Best Practices
Layer Ordering
Order instructions from least to most frequently changing. System deps first, then app deps, then source code. Each instruction creates a layer. Docker caches layers top-down and invalidates everything below a changed layer.
FROM node:20-alpine
RUN apk add --no-cache tini curl
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]
Multi-Stage Builds
Separate build-time dependencies from runtime. Only copy artifacts you need.
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:20-alpine AS runtime
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER app
CMD ["node", "dist/index.js"]
For statically linked binaries (Go, Rust), use scratch or gcr.io/distroless/static-debian12 as the final stage for minimal images (~2MB).
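A minimal sketch of that pattern for a Go service (module layout, binary name, and paths are illustrative):

```dockerfile
# Build stage: full Go toolchain
FROM golang:1.22-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a statically linked binary that needs no libc
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /app ./cmd/server

# Final stage: no shell, no package manager, just the binary.
# The :nonroot tag runs as a non-root user by default.
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
```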
.dockerignore
Reduces build context size and prevents secrets from leaking into images.
.git
node_modules
.env
.env.*
*.md
coverage
tests
__pycache__
.venv
docker-compose*.yml
Dockerfile*
Base Image Selection
| Image | Size | Shell | Use Case |
|---|---|---|---|
| node:20 | ~350MB | Yes | Avoid in prod |
| node:20-slim | ~200MB | Yes | Good default |
| node:20-alpine | ~50MB | Yes | Best for most apps |
| distroless | ~2-20MB | No | Maximum security |
Pin Versions
Never use the `latest` tag in production. Pin to a patch version, or to a digest for full reproducibility.
FROM node:20.11.1-alpine3.19
FROM node:20.11.1-alpine3.19@sha256:abcdef1234...
Security Hardening
Non-Root User
# Alpine
RUN addgroup -S app && adduser -S -G app -s /sbin/nologin app
# Debian
RUN groupadd -r app && useradd -r -g app -s /usr/sbin/nologin -M app
COPY . .
USER app
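Files copied before the USER instruction are owned by root; the app user can usually read them but not write to them, which is often the safer default. When the app does need write access to its own files, copy with ownership set explicitly. A sketch:

```dockerfile
# --chown makes the app user the owner of the copied files;
# omit it when read-only access is sufficient
COPY --chown=app:app . .
USER app
```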
Read-Only Filesystem
docker run --read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=64m \
-v appdata:/app/data \
myapp:latest
No Secrets in Images
Secrets in Dockerfile instructions persist in layer history.
# WRONG
ENV API_KEY=sk-secret-key
COPY .env /app/.env
# RIGHT - BuildKit secrets (mounted only for this RUN, never stored in a layer)
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci
# RIGHT - runtime injection
# docker run -e DB_PASS="$(vault read -field=password secret/db)" myapp
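BuildKit mounts a secret for the duration of a single RUN instruction without writing it to any layer. A fuller sketch of the pattern (the secret id `npmrc` and file paths are illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# The .npmrc with the registry token exists only while this RUN executes
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci --omit=dev
```

The secret is supplied at build time: `docker build --secret id=npmrc,src=$HOME/.npmrc .`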
Image Scanning
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest
grype myapp:latest --fail-on high
docker scout cves myapp:latest
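In CI, a scan can gate the build so vulnerable images never reach the registry. A sketch as a GitHub Actions step, assuming Aqua's published trivy-action:

```yaml
- name: Scan image for vulnerabilities
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: myapp:${{ github.sha }}
    severity: HIGH,CRITICAL
    exit-code: "1"   # non-zero exit fails the job on findings
```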
Drop Capabilities
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp:latest
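The same hardening expressed in Compose, combined with the read-only filesystem from above (a sketch):

```yaml
services:
  app:
    image: myapp:latest
    cap_drop: [ALL]
    cap_add: [NET_BIND_SERVICE]
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=64m
```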
Layer Caching Optimization
Dependency Install Before Code Copy
Copy manifests first, install, then copy source. This is the single most impactful caching rule.
# Python
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Go
COPY go.mod go.sum ./
RUN go mod download
COPY . .
BuildKit Cache Mounts
Persist package manager caches across builds.
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y --no-install-recommends curl
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev
BuildKit is the default builder since Docker 23.0; on older versions, enable it with export DOCKER_BUILDKIT=1
Health Checks
# HTTP check
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
# TCP check (no curl/wget needed)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD nc -z localhost 3000 || exit 1
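Shell-form health checks cannot run in images that have no shell (distroless) or lack wget and nc. One option is the exec form invoking the app runtime itself; `healthcheck.js` here is a hypothetical script that exits 0 when /health returns 200 and 1 otherwise:

```dockerfile
# Exec form: no shell required inside the image
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD ["node", "healthcheck.js"]
```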
Compose Health Checks with Dependencies
services:
db:
image: postgres:16-alpine
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5
app:
build: .
depends_on:
db:
condition: service_healthy
Container Networking
# Bridge - isolated network, containers resolve by name
docker network create app-net
docker run --network app-net --name api myapp
docker run --network app-net --name worker myworker
# worker reaches api at http://api:3000
# Host - shares host network stack, no port mapping needed
docker run --network host myapp
# Overlay - multi-host (Swarm)
docker network create --driver overlay --attachable cluster-net
Containers on user-defined bridge networks get DNS resolution by container name. The default bridge network does not provide this.
Volume Management
# Named volumes - Docker-managed, persistent data
docker volume create pgdata
docker run -v pgdata:/var/lib/postgresql/data postgres:16-alpine
# Bind mounts - host directory, good for dev
docker run -v "$(pwd)/src":/app/src:ro myapp
# tmpfs - in-memory, good for secrets/scratch
docker run --tmpfs /tmp:rw,noexec,nosuid,size=128m myapp
# Backup a volume
docker run --rm -v pgdata:/data -v "$(pwd)":/backup \
alpine tar czf /backup/pgdata-backup.tar.gz -C /data .
Logging Best Practices
Applications should write to stdout/stderr, never to files inside the container.
RUN ln -sf /dev/stdout /var/log/app.log \
&& ln -sf /dev/stderr /var/log/app-error.log
# JSON file driver with rotation (default)
docker run --log-driver json-file \
--log-opt max-size=10m --log-opt max-file=3 myapp
# Syslog driver
docker run --log-driver syslog \
--log-opt syslog-address=udp://loghost:514 myapp
Set defaults in daemon.json: { "log-driver": "json-file", "log-opts": { "max-size": "10m", "max-file": "5" } }
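Written out as a file, assuming the standard location (restart the Docker daemon after editing):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  }
}
```

Place this at /etc/docker/daemon.json so every new container inherits log rotation without per-run flags.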
Resource Limits
docker run -m 512m --cpus=1.0 --pids-limit=256 myapp
# Memory with swap
docker run -m 512m --memory-swap 1g myapp
# CPU shares (relative weight, default 1024)
docker run --cpu-shares=512 myapp
Compose Resource Limits
services:
app:
deploy:
resources:
limits:
cpus: "1.0"
memory: 512M
reservations:
cpus: "0.25"
memory: 128M
ulimits:
nofile:
soft: 65536
hard: 65536
Debugging Containers
docker exec -it <container> sh # shell into running container
docker logs -f --timestamps --tail 100 <ctr> # follow logs
docker inspect <container> # full config/state/network
docker inspect --format='{{.State.Health.Status}}' <ctr>
docker stats <container> # live resource usage
docker history --no-trunc myapp:latest # image layer sizes
docker cp <ctr>:/app/error.log ./error.log # copy files out
docker run -it --entrypoint sh myapp:latest # debug crashed container
docker run --rm --network container:<ctr> nicolaka/netshoot # network debug
Production Patterns
Graceful Shutdown and Signal Handling
Containers receive SIGTERM on stop. The app must handle it or Docker sends SIGKILL after the grace period (default 10s). Always use exec form so the app is PID 1.
CMD ["node", "server.js"]
# NOT: CMD node server.js (wraps in /bin/sh, swallows signals)
process.on('SIGTERM', () => {
  // Stop accepting new connections, then release resources
  server.close(() => {
    db.disconnect().then(() => process.exit(0));
  });
  // Force-exit if cleanup hangs; unref() so this timer alone
  // does not keep the process alive once cleanup finishes
  setTimeout(() => process.exit(1), 10000).unref();
});
Init System with Tini
Use tini as PID 1 to reap zombie processes and forward signals.
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]
Or: docker run --init myapp:latest
Stop Grace Period
services:
app:
stop_grace_period: 30s
Image Size Reduction
- Multi-stage builds to exclude compilers and build tools
- Alpine or distroless base images
- Combine RUN commands to reduce layers
- Remove caches in the same layer they are created
- Use `--no-install-recommends` (apt) or `--no-cache` (apk)
RUN apt-get update \
&& apt-get install -y --no-install-recommends curl ca-certificates \
&& rm -rf /var/lib/apt/lists/*
docker history myapp:latest # layer sizes
dive myapp:latest # interactive analysis
docker images --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}"
Container Orchestration Considerations
Labels and Metadata
LABEL org.opencontainers.image.source="https://github.com/org/repo"
LABEL org.opencontainers.image.version="1.2.3"
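Version and revision usually come from CI rather than being hardcoded; a sketch using build args (the variable names are illustrative):

```dockerfile
ARG GIT_SHA
ARG BUILD_DATE
LABEL org.opencontainers.image.revision="${GIT_SHA}" \
      org.opencontainers.image.created="${BUILD_DATE}"
```

Supplied at build time: `docker build --build-arg GIT_SHA=$(git rev-parse HEAD) --build-arg BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ) .`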
Docker Compose Production Stack
services:
app:
build:
context: .
target: runtime
restart: unless-stopped
environment:
- NODE_ENV=production
healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
interval: 30s
timeout: 5s
retries: 3
networks:
- backend
redis:
image: redis:7-alpine
volumes:
- redisdata:/data
networks:
- backend
volumes:
redisdata:
networks:
backend:
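Development-only settings can live in a separate override file that `docker compose up` merges automatically, keeping the base file production-clean. A sketch (the `builder` target and paths are illustrative):

```yaml
# docker-compose.override.yml - merged automatically in local development
services:
  app:
    build:
      target: builder        # a build stage that still has dev dependencies
    volumes:
      - ./src:/app/src       # live-reload source from the host
    environment:
      - NODE_ENV=development
```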
Image Registry
docker tag myapp:latest registry.example.com/myapp:1.2.3
docker tag myapp:latest registry.example.com/myapp:${GIT_SHA:0:8}
cosign sign --key cosign.key registry.example.com/myapp:1.2.3
docker image prune -a --filter "until=168h"
References
- Docker Best Practices: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
- Distroless Images: https://github.com/GoogleContainerTools/distroless
- Trivy Scanner: https://github.com/aquasecurity/trivy
- Tini Init: https://github.com/krallin/tini