skills/armanzeroeight/fastagent-plugins/kubernetes-best-practices

kubernetes-best-practices

SKILL.md

Kubernetes Best Practices

This skill provides guidance for writing production-ready Kubernetes manifests and managing cloud-native applications.

Resource Management

Memory: Set requests and limits to the same value to ensure QoS class and prevent OOM kills.

CPU: Set requests only, omit limits to allow performance bursting and avoid throttling.

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    # No CPU limit

Image Versioning

Always pin specific versions, never use :latest tag unless explicitly requested:

# Good
image: nginx:1.25.3

# Bad
image: nginx:latest

For immutability, consider pinning to specific digests.

Configuration Management

Secrets: Sensitive data (passwords, tokens, certificates) ConfigMaps: Non-sensitive configuration (feature flags, URLs, settings)

env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: app-secrets
        key: database-url
  - name: LOG_LEVEL
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: log-level

Best practices:

  • Never hardcode secrets in manifests
  • Use external secret management (Sealed Secrets, External Secrets Operator)
  • Rotate secrets regularly
  • Limit access with RBAC

Workload Selection

Choose the appropriate workload type:

  • Deployment: Stateless applications (web servers, APIs, microservices)
  • StatefulSet: Stateful applications (databases, message queues)
  • DaemonSet: Node-level services (log collectors, monitoring agents)
  • Job/CronJob: Batch processing and scheduled tasks

Security Context

Always implement security best practices:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false

Security checklist:

  • Run as non-root user
  • Drop all capabilities by default
  • Use read-only root filesystem
  • Disable privilege escalation
  • Implement network policies
  • Scan images for vulnerabilities

Health Checks

Implement all three probe types:

Liveness: Restart container if unhealthy Readiness: Remove from service endpoints if not ready Startup: Allow slow-starting containers time to initialize

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

startupProbe:
  httpGet:
    path: /startup
    port: 8080
  periodSeconds: 10
  failureThreshold: 30

High Availability

Replica counts: Set minimum 2 for production workloads

Pod Disruption Budgets: Maintain availability during voluntary disruptions

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app

Additional HA considerations:

  • Use anti-affinity rules for pod distribution across nodes
  • Configure graceful shutdown periods
  • Implement horizontal pod autoscaling
  • Set appropriate resource requests for scheduling

Namespace Organization

Use namespaces for environment isolation and apply resource quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    persistentvolumeclaims: "10"

Benefits: Logical separation, resource limits, RBAC boundaries, cost tracking

Labels and Annotations

Use consistent, recommended labels:

metadata:
  labels:
    app.kubernetes.io/name: myapp
    app.kubernetes.io/instance: myapp-prod
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/component: backend
    app.kubernetes.io/part-of: ecommerce
    app.kubernetes.io/managed-by: helm

Service Types

  • ClusterIP: Internal cluster communication (default)
  • NodePort: External access via node ports (dev/test)
  • LoadBalancer: Cloud provider load balancer (production)
  • ExternalName: DNS CNAME record (external services)

Storage

Choose appropriate storage class and access mode:

Access Modes:

  • ReadWriteOnce (RWO): Single node read-write
  • ReadOnlyMany (ROX): Multiple nodes read-only
  • ReadWriteMany (RWX): Multiple nodes read-write
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi

Validation and Testing

Always validate before applying to production:

  1. Client-side validation: kubectl apply --dry-run=client -f manifest.yaml
  2. Server-side validation: kubectl apply --dry-run=server -f manifest.yaml
  3. Test in staging: Deploy to non-production environment first
  4. Monitor metrics: Watch resource usage and application health
  5. Gradual rollout: Use rolling updates with health checks

Application Checklist

When creating or reviewing Kubernetes manifests:

  • Resource requests and limits configured
  • Specific image version pinned (not :latest)
  • Secrets and ConfigMaps used for configuration
  • Security context implemented (non-root, dropped capabilities)
  • Health checks configured (liveness, readiness, startup)
  • Pod Disruption Budget defined for HA workloads
  • Consistent labels applied
  • Appropriate workload type selected
  • Namespace and resource quotas configured
  • Validated with dry-run before applying
Weekly Installs
7
GitHub Stars
26
First Seen
Feb 4, 2026
Installed on
claude-code7
opencode6
gemini-cli6
github-copilot6
codex6
cursor5