Kubernetes Fundamentals

Use When

Use when starting with Kubernetes — core objects, kubectl workflow, manifests, probes, ingress, cluster setup (EKS/GKE/kind), and deciding when K8s is the right tool vs systemd, Docker Compose, ECS/Fargate, or PaaS.
The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.

Do Not Use When

The task is unrelated to kubernetes-fundamentals or would be better handled by a more specific companion skill.
The request only needs a trivial answer and none of this skill's constraints or references materially help.

Required Inputs

Gather relevant project context, constraints, and the concrete problem to solve; load references only as needed.
Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.

Workflow

Read this SKILL.md first, then load only the referenced deep-dive files that are necessary for the task.
Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.

Quality Standards

Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
Prefer deterministic, reviewable steps over vague advice or tool-specific magic.

Anti-Patterns

Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
Loading every reference file by default instead of using progressive disclosure.

Outputs

A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
References used, companion skills, or follow-up actions when they materially improve execution.

Evidence Produced

Category	Artifact	Format	Example
Correctness	Cluster setup decision record	Markdown doc per `skill-composition-standards/references/adr-template.md` covering EKS / GKE / kind pick and ingress controller	`docs/k8s/cluster-setup-adr.md`

References

Use the references/ directory for deep detail after reading the core workflow below.

Core concepts and the mental model before anything else. The trap with K8s is adopting it too early; the second trap is adopting it without the mental model.

When this skill applies

Evaluating whether to adopt K8s.
Starting a new cluster.
Writing your first manifests.
Onboarding engineers onto K8s.
Building a proof-of-concept deployment.

When K8s is the right tool

1 service, 1 VM, stable load                     -> systemd on Debian (use linux-security-hardening)
1-5 services, 1-2 hosts, simple scaling          -> Docker Compose (use cloud-architecture)
5-20 services, need scaling / self-heal / rollouts -> Kubernetes (managed: EKS / GKE / AKS)
Full managed PaaS acceptable, no K8s team        -> ECS Fargate / Cloud Run / Heroku / Railway / Render
Strict compliance, on-prem                       -> Self-hosted K8s

Rule: don't adopt K8s before you have enough services that orchestration + rollout + self-healing pays for the operational cost. If one person can deploy all your services with a shell script in 10 minutes, you don't need K8s yet.

See references/when-k8s-is-right.md.

Core objects — the mental model

Pod — one or more containers sharing network + storage. Never create Pods directly; use a controller.

Deployment — declarative Pod management. You say "I want 3 replicas of this image with these env vars"; K8s keeps that true.

Service — stable virtual IP + DNS name that load-balances across matching Pods. Pods come and go; Services stay.

Ingress — HTTP(S) routing rules from outside the cluster to Services. Requires an Ingress controller (nginx, Traefik).

ConfigMap / Secret — configuration decoupled from images. Mount as env vars or files.

Namespace — logical isolation. One per environment per tenant (or per team, per app).

PersistentVolume / PersistentVolumeClaim — storage lifecycle separated from Pods.

Job / CronJob — run-to-completion workloads.

StatefulSet — Pods with stable identity (databases, brokers).

DaemonSet — one Pod per node (agents, collectors).

See references/core-objects.md.

Minimal manifests (learn these by heart)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: myregistry/api:v1.2.3   # never :latest
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef: { name: api-secrets, key: database-url }
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits:   { cpu: 500m, memory: 256Mi }
          readinessProbe:
            httpGet: { path: /ready, port: 3000 }
            initialDelaySeconds: 3
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /live, port: 3000 }
            initialDelaySeconds: 15
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata: { name: api, namespace: production }
spec:
  type: ClusterIP
  selector: { app: api }
  ports: [{ port: 80, targetPort: 3000 }]

Four rules this manifest enforces:

Never :latest — always a pinned tag.
Always resources.requests and resources.limits — otherwise scheduling is unpredictable and OOM risk is high.
Always readiness probe — Service won't route to Pods not ready.
Liveness probe only for genuine stuck/crashed detection — misconfigured liveness causes restart loops.

Workload type — pick correctly

Stateless app, identical replicas, rolling update         -> Deployment
Stable identity, ordered start, per-Pod PV                -> StatefulSet
One Pod per node (logs, metrics, CNI, CSI)                -> DaemonSet
Run to completion (migration, batch)                      -> Job
Run on a schedule                                         -> CronJob
Dev/debug pods you'll throw away                          -> kubectl run (never in prod manifests)

Anti-pattern: Deployment for a database. You will lose data on the first reschedule unless you have a PVC + careful affinity, and even then identity churn breaks replication.

Debug triage — first 60 seconds

1. kubectl get pod <name> -o wide                       -> status, restarts, node
2. kubectl describe pod <name>                          -> read Events at the bottom FIRST
3. kubectl logs <name> -c <ctr> --previous              -> the run that crashed, not the new one
4. kubectl get events --sort-by=.lastTimestamp -n <ns>  -> recent cluster activity
5. Exit code in describe: 137 = OOMKilled, 143 = SIGTERM, 1 = app error

For symptom-by-symptom playbooks (CrashLoopBackOff, ImagePullBackOff, Pending, OOMKilled, Evicted, no endpoints, DNS, stuck rollouts, NotReady nodes) see references/debugging-recipes.md.

kubectl workflow

# Always use namespace context
kubectl config set-context --current --namespace=production

# Apply is declarative; use it
kubectl apply -f manifests/

# Debugging triad
kubectl get pods              # what exists
kubectl describe pod <name>   # why it's not healthy (events!)
kubectl logs <pod> -c <container> --tail=200 -f
kubectl exec -it <pod> -- sh  # into the container

# Rollouts
kubectl rollout status deployment/api
kubectl rollout undo deployment/api
kubectl rollout history deployment/api

# Resource usage
kubectl top pods
kubectl top nodes

Prefer kubectl apply -k (Kustomize) or Helm for real work — raw manifests are fine for learning. See references/kubectl-workflow.md.

Labels, selectors, annotations

Labels — identifying metadata used for selection. Services find Pods via label selectors. Use the recommended labels: app.kubernetes.io/name, app.kubernetes.io/instance, app.kubernetes.io/version, app.kubernetes.io/component, app.kubernetes.io/managed-by.

Annotations — non-identifying metadata (build number, commit SHA, owner email, change-cause).

Never rely on a Pod name; rely on labels.

Probes — get them right

Readiness probe: is this Pod ready to receive traffic? Failing = Service stops routing. Uses: DB connection, cache warm-up, dependency check.
Liveness probe: is this Pod stuck? Failing = kubelet restarts container. Uses: deadlock detection only. If you can't distinguish "stuck" from "slow under load," don't set a liveness probe.
Startup probe: for slow-starting apps. Disables liveness until startup succeeds.

Anti-pattern: making liveness call DB. DB blip → all Pods restart → cascade.

See references/probes-and-lifecycles.md.

Ingress controllers

nginx-ingress — most widely used. Reliable, feature-rich.
Traefik — great UX, dashboard, automatic Let's Encrypt.
Cloud LB-backed: AWS LB Controller (ALB), GCP GCE Ingress — tighter cloud integration.
Gateway API — newer, standard-track replacement for Ingress. Use if your controller supports it.

Pattern: one Ingress per app/tenant with TLS via cert-manager. See references/ingress-controllers.md.

Cluster options

Option	Use when
EKS (AWS)	You're on AWS; pay for control plane; add karpenter for cheap node scaling
GKE (GCP)	You're on GCP; best managed K8s experience; Autopilot for zero-node-ops
AKS (Azure)	Azure-first organisation
DigitalOcean / Linode LKE	Small team, cost-sensitive
kind / minikube / k3d	Local development and CI
k3s / k0s	Edge / on-prem, low-resource
kubeadm	Only if you must self-host; expect ops cost

See references/cluster-setup-eks-gke.md and references/local-kind-minikube.md.

Anti-patterns

:latest image tags.
No resource requests/limits.
Liveness probe hitting a dependency.
Secrets committed in Git in Secret manifests (use sealed-secrets, SOPS, or external-secrets).
hostPath volumes.
Privileged containers.
Missing PodDisruptionBudget on stateful workloads.
One giant namespace for everything.
kubectl edit for permanent changes — bypasses Git source of truth.
Long-running jobs as Deployments — use Job/CronJob.

References

references/when-k8s-is-right.md
references/core-objects.md
references/kubectl-workflow.md
references/probes-and-lifecycles.md
references/ingress-controllers.md
references/cluster-setup-eks-gke.md
references/local-kind-minikube.md
references/anti-patterns.md
references/debugging-recipes.md

kubernetes-fundamentals

Kubernetes Fundamentals

Use When

Do Not Use When

Required Inputs

Workflow

Quality Standards

Anti-Patterns

Outputs

Evidence Produced

References

When this skill applies

When K8s is the right tool

Core objects — the mental model

Minimal manifests (learn these by heart)

Workload type — pick correctly

Debug triage — first 60 seconds

kubectl workflow

Labels, selectors, annotations

Probes — get them right

Ingress controllers

Cluster options

Anti-patterns

Read next

References

More from peterbamuhigire/skills-web-dev

google-play-store-review

multi-tenant-saas-architecture

jetpack-compose-ui

gis-mapping

saas-accounting-system

manual-guide