Kubernetes
Pod Failure Troubleshooting
| Status | Common Causes | Debug Steps |
|---|---|---|
| CrashLoopBackOff | App crash, bad entrypoint, missing deps | kubectl logs <pod> --previous |
| ImagePullBackOff | Wrong image/tag, no auth, registry down | Check image name, kubectl get events |
| Pending | No resources, node selector mismatch, PVC pending | kubectl describe pod, check node capacity |
| OOMKilled | Memory limit exceeded | Increase limits.memory or fix leak |
| Evicted | Node disk/memory pressure | Check node conditions, clean up |
| CreateContainerError | Bad securityContext, missing configmap/secret | kubectl describe pod for specific error |
Resource Configuration Gotchas
Requests vs Limits
- Requests: Scheduling guarantee. Pod won't schedule if node lacks capacity.
- Limits: Hard ceiling. Container killed (OOM) or throttled (CPU) if exceeded.
- No limits = unbounded (can consume entire node)
- requests > limits is invalid (the API server rejects the pod spec)
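To make the distinction concrete, here is a minimal sketch of a pod that sets both requests and limits. The pod name, image, and all values are illustrative assumptions, not recommendations.

```yaml
# Hypothetical pod; name, image, and sizes are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: resources-demo
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    resources:
      requests:
        cpu: 250m        # scheduler only places the pod on a node with this much unreserved CPU
        memory: 256Mi    # same for memory
      limits:
        cpu: 500m        # usage above this is throttled
        memory: 512Mi    # usage above this gets the container OOM-killed
```

Each request here is less than or equal to its limit, which is what validation requires.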
Probe Timing

    livenessProbe:
      initialDelaySeconds: 10  # Wait before first check
      periodSeconds: 5         # Check interval
      timeoutSeconds: 1        # Max wait for response
      failureThreshold: 3      # Failures before action

- Liveness failure → container restart
- Readiness failure → removed from service endpoints
- Startup probe: liveness and readiness probes are disabled until it succeeds (use for slow-starting apps)
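The three probe types can be combined on one container. The sketch below assumes a hypothetical app that serves /healthz and /ready on port 8080; those paths, the port, and the thresholds are placeholders chosen for illustration.

```yaml
# Hypothetical pod; endpoints, port, and timings are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: probes-demo
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    startupProbe:
      httpGet: {path: /healthz, port: 8080}
      periodSeconds: 5
      failureThreshold: 30     # allows up to 30 * 5s = 150s for startup
    livenessProbe:
      httpGet: {path: /healthz, port: 8080}
      periodSeconds: 5
      timeoutSeconds: 1
      failureThreshold: 3      # 3 consecutive failures restart the container
    readinessProbe:
      httpGet: {path: /ready, port: 8080}
      periodSeconds: 5         # failing pods are removed from service endpoints
```

Because the startup probe holds off the other two, the liveness probe here does not need a large initialDelaySeconds to cover slow starts.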
Security Context Inheritance
Pod-level securityContext applies to all containers but container-level overrides it:

    spec:
      securityContext:
        runAsNonRoot: true      # Pod default
      containers:
      - securityContext:
          runAsUser: 1000       # Container override

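The override is easiest to see when the same field is set at both levels. In this sketch (pod name, container names, images, and UIDs are all hypothetical), one container inherits the pod-level UID and the other replaces it.

```yaml
# Hypothetical pod; names, images, and UIDs are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: securitycontext-demo
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000            # default UID for every container in the pod
  containers:
  - name: app
    image: registry.example.com/app:1.0       # runs as UID 1000 (inherited)
  - name: sidecar
    image: registry.example.com/sidecar:1.0
    securityContext:
      runAsUser: 2000          # replaces the pod-level UID for this container only
```

Fields the container does not set, such as runAsNonRoot here, still come from the pod level.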
RBAC Patterns
Minimal Role for Pod Logs

    rules:
    - apiGroups: [""]
      resources: ["pods", "pods/log"]
      verbs: ["get", "list"]

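For context, the rules above only take effect inside a Role that is bound to a subject. A minimal sketch, assuming a hypothetical namespace `demo` and service account `log-viewer`:

```yaml
# Hypothetical names throughout (demo, pod-log-reader, log-viewer).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-log-reader
  namespace: demo
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-log-reader-binding
  namespace: demo
subjects:
- kind: ServiceAccount
  name: log-viewer
  namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-log-reader
```

A Role is namespaced, so this grants log access only within `demo`.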
Common API Groups
- "" (empty): Core resources (pods, services, configmaps)
- apps: Deployments, StatefulSets, DaemonSets
- networking.k8s.io: Ingress, NetworkPolicy
- rbac.authorization.k8s.io: Roles, bindings
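Each rule in a Role names the API group its resources belong to, so resources from different groups need separate rules. The following read-only Role is a sketch with a hypothetical name and namespace:

```yaml
# Hypothetical Role; name and namespace are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workload-viewer
  namespace: demo
rules:
- apiGroups: [""]                      # core group
  resources: ["services", "configmaps"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses", "networkpolicies"]
  verbs: ["get", "list", "watch"]
```

Listing a resource under the wrong group does not fail validation; the rule simply never matches.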
NetworkPolicy Gotchas
- No NetworkPolicy = all traffic allowed
- Any NetworkPolicy selecting a pod = default deny for that direction
- Empty podSelector: {} selects all pods in the namespace
- namespaceSelector: {} selects all namespaces
- Selectors in separate list items (each starting with -) are ORed; selectors nested in the same list item are ANDed

    ingress:
    - from:
      - podSelector: {matchLabels: {app: frontend}}      # AND
        namespaceSelector: {matchLabels: {env: prod}}
    - from:                                              # OR (separate rule)
      - podSelector: {matchLabels: {app: monitoring}}

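The "default deny" behaviour and the empty podSelector can be seen together in a namespace-wide deny policy. This is a minimal sketch; the policy name and the namespace `demo` are assumptions.

```yaml
# Hypothetical policy; name and namespace are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: demo
spec:
  podSelector: {}     # empty selector: every pod in the namespace
  policyTypes:
  - Ingress           # no ingress rules are listed, so all inbound traffic is denied
```

Egress is left untouched because it is not listed under policyTypes. Additional policies selecting the same pods add to what is allowed.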