k8s
ACCESSING CLUSTERS
CRITICAL: Always prefix kubectl/flux commands with inline KUBECONFIG assignment. Do NOT use export or && - the variable must be set in the same command:
# CORRECT - inline assignment
KUBECONFIG=~/.kube/<cluster>.yaml kubectl get pods
# WRONG - export with && breaks in some shell contexts
export KUBECONFIG=~/.kube/<cluster>.yaml && kubectl get pods
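The scoping difference can be checked without touching a cluster. A minimal sketch, assuming a POSIX shell; the path /tmp/example.yaml is a hypothetical placeholder, not a real kubeconfig. It shows that an inline assignment is visible only to the command it prefixes and leaves the calling shell unchanged.

```shell
# Start from a clean state so the demonstration is unambiguous.
unset KUBECONFIG

# Inline assignment: the variable exists only in the child command's environment.
inline_value=$(KUBECONFIG=/tmp/example.yaml sh -c 'printf "%s" "$KUBECONFIG"')
echo "child saw:   $inline_value"

# The calling shell is unaffected, so a later command cannot silently
# inherit a kubeconfig that was meant for a different cluster.
echo "parent sees: ${KUBECONFIG:-<unset>}"
```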
Cluster Context
CRITICAL: Always confirm cluster before running commands.
| Cluster | Purpose | Kubeconfig |
|---|---|---|
| dev | Manual testing | ~/.kube/dev.yaml |
| integration | Automated testing | ~/.kube/integration.yaml |
| live | Production | ~/.kube/live.yaml |
KUBECONFIG=~/.kube/<cluster>.yaml kubectl <command>
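One way to make the confirmation step mechanical is to resolve the kubeconfig from an explicit cluster name and refuse anything that is not one of the three known clusters. This is only a sketch: the function names kubeconfig_for and kc are hypothetical and not part of the existing tooling.

```shell
# kubeconfig_for <cluster>: print the kubeconfig path for a known cluster,
# or fail loudly for any name that is not dev, integration, or live.
kubeconfig_for() {
  case "${1:-}" in
    dev|integration|live)
      printf '%s\n' "$HOME/.kube/$1.yaml"
      ;;
    *)
      echo "unknown cluster: '${1:-}' (expected dev, integration, or live)" >&2
      return 1
      ;;
  esac
}

# kc <cluster> <kubectl args...>: run kubectl with an inline KUBECONFIG.
# Returns before kubectl is invoked if the cluster name is not recognised,
# so a typo can never fall through to a default kubeconfig.
kc() {
  _kc_cfg=$(kubeconfig_for "${1:-}") || return 1
  shift
  KUBECONFIG="$_kc_cfg" kubectl "$@"
}

# Usage:
#   kc dev get pods -n monitoring
```

The wrapper keeps the assignment inline for the single kubectl invocation, so nothing is exported into the calling shell.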
Accessing Internal Services
Platform services are exposed through the internal ingress gateway over HTTPS. DNS URLs are useful for browser-based access (Grafana, Hubble UI, Longhorn UI).
OAuth2 Proxy caveat: Prometheus, Alertmanager, and some other services are behind OAuth2 Proxy. DNS URLs redirect to an OAuth login page and cannot be used for API queries via curl. Use kubectl exec or port-forward instead for programmatic access.
| Service | Live URL | Auth | API Access |
|---|---|---|---|
| Prometheus | https://prometheus.internal.tomnowak.work | OAuth2 Proxy | kubectl exec or port-forward |
| Alertmanager | https://alertmanager.internal.tomnowak.work | OAuth2 Proxy | kubectl exec or port-forward |
| Grafana | https://grafana.internal.tomnowak.work | Built-in auth | Browser only |
| Hubble UI | https://hubble.internal.tomnowak.work | None | Browser |
| Longhorn UI | https://longhorn.internal.tomnowak.work | None | Browser |
| Garage Admin | https://garage.internal.tomnowak.work | None | Browser |
Domain pattern: <service>.internal.<cluster-suffix>.tomnowak.work (the live cluster has no suffix)
- live: internal.tomnowak.work
- integration: internal.integration.tomnowak.work
- dev: internal.dev.tomnowak.work
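The pattern can be expressed as a small helper. A minimal sketch: internal_url is a hypothetical name, and the code only assumes what the pattern states, namely that a service reachable on live uses the same hostname scheme on integration and dev.

```shell
# internal_url <service> <cluster>: print the HTTPS URL of a platform service.
# The live cluster has no suffix; integration and dev insert their own name.
internal_url() {
  case "${2:-}" in
    live)            _url_suffix="" ;;
    integration|dev) _url_suffix=".$2" ;;
    *)
      echo "unknown cluster: '${2:-}' (expected dev, integration, or live)" >&2
      return 1
      ;;
  esac
  printf 'https://%s.internal%s.tomnowak.work\n' "$1" "$_url_suffix"
}

internal_url grafana live         # https://grafana.internal.tomnowak.work
internal_url grafana integration  # https://grafana.internal.integration.tomnowak.work
```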
Querying Prometheus/Alertmanager API
# Option 1: kubectl exec (quick, no setup)
KUBECONFIG=~/.kube/<cluster>.yaml kubectl exec -n monitoring prometheus-kube-prometheus-stack-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=up' | jq '.data.result'
KUBECONFIG=~/.kube/<cluster>.yaml kubectl exec -n monitoring prometheus-kube-prometheus-stack-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/alerts' | jq '.data.alerts[] | select(.state == "firing")'
KUBECONFIG=~/.kube/<cluster>.yaml kubectl exec -n monitoring alertmanager-kube-prometheus-stack-0 -c alertmanager -- \
wget -qO- 'http://localhost:9093/api/v2/alerts' | jq .
# Option 2: Port-forward (for scripts and repeated queries)
KUBECONFIG=~/.kube/<cluster>.yaml kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 2  # give the port-forward a moment to start listening before querying
curl -s "http://localhost:9090/api/v1/query?query=up" | jq '.data.result'
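Queries more involved than up usually contain braces, quotes, and spaces, all of which must be URL-encoded. A minimal sketch, assuming the Option 2 port-forward is already running: promql is a hypothetical function name and PROM_URL a hypothetical override variable. The encoding is left to curl's -G and --data-urlencode options.

```shell
# promql <expr>: run an instant query against the Prometheus HTTP API.
# curl -G appends the --data-urlencode payload to the URL as a query string
# and sends a GET, so the PromQL expression needs no manual escaping.
promql() {
  curl -sG --data-urlencode "query=$1" \
    "${PROM_URL:-http://localhost:9090}/api/v1/query"
}

# Usage (requires jq, as in the examples above):
#   promql 'up{namespace="monitoring"} == 0' | jq '.data.result'
```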
Using the helper scripts:
# Prometheus (start port-forward first; script defaults to http://localhost:9090)
KUBECONFIG=~/.kube/<cluster>.yaml kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
.claude/skills/prometheus/scripts/promql.sh alerts --firing
# Loki (no HTTPRoute, so it always requires a port-forward)
KUBECONFIG=~/.kube/<cluster>.yaml kubectl port-forward -n monitoring svc/loki-headless 3100:3100 &
export LOKI_URL=http://localhost:3100
.claude/skills/loki/scripts/logql.sh tail '{namespace="monitoring"}' --since 15m
Common kubectl Patterns
Read-only commands used during daily operations and investigations:
| Command | Purpose |
|---|---|
| kubectl get pods -n <ns> | List pods in a namespace |
| kubectl get pods -A | List pods across all namespaces |
| kubectl describe pod <pod> -n <ns> | Detailed pod info with events |
| kubectl logs <pod> -n <ns> --tail=100 | Recent logs from a pod |
| kubectl logs <pod> -n <ns> --previous | Logs from previous container instance |
| kubectl get events -n <ns> --sort-by='.lastTimestamp' | Recent events timeline |
| kubectl top pods -n <ns> | CPU/memory usage per pod |
| kubectl top nodes | CPU/memory usage per node |
| kubectl get ns <ns> --show-labels | Namespace labels (network policy profiles) |
| kubectl explain <resource> | API schema reference for a resource type |
Flux GitOps Commands
Status and Reconciliation
# Check status
KUBECONFIG=~/.kube/<cluster>.yaml flux get all
KUBECONFIG=~/.kube/<cluster>.yaml flux get kustomizations
KUBECONFIG=~/.kube/<cluster>.yaml flux get helmreleases -A
# Trigger reconciliation
KUBECONFIG=~/.kube/<cluster>.yaml flux reconcile source git flux-system
KUBECONFIG=~/.kube/<cluster>.yaml flux reconcile kustomization <name>
KUBECONFIG=~/.kube/<cluster>.yaml flux reconcile helmrelease <name> -n <namespace>
Flux Status Interpretation
| Status | Meaning | Action |
|---|---|---|
| Ready: True | Resource is reconciled and healthy | None - operating normally |
| Ready: False | Resource failed to reconcile | Check the message/reason for details |
| Stalled: True | Resource has stopped retrying after repeated failures | Suspend/resume to reset (see sre skill) |
| Suspended: True | Resource is intentionally paused | Resume when ready: flux resume <type> <name> |
| Reconciling | Resource is actively being applied | Wait for completion |
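For scripted checks, the Ready row of the table can be read straight from the resource's status conditions. A minimal sketch: is_ready is a hypothetical helper, and the flux-system namespace in the usage line is an assumption based on the reconcile commands above. It uses kubectl's jsonpath output with a filter on the condition type.

```shell
# is_ready <kubeconfig> <kind> <name> <namespace>
# Exit 0 if the Ready condition is "True", 1 if it has any other value,
# and 2 if kubectl itself failed (wrong name, no access, unreachable cluster).
is_ready() {
  _ready_status=$(KUBECONFIG="$1" kubectl get "$2" "$3" -n "$4" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}') || return 2
  [ "$_ready_status" = "True" ]
}

# Usage:
#   is_ready ~/.kube/<cluster>.yaml kustomization <name> flux-system
#   is_ready ~/.kube/<cluster>.yaml helmrelease <name> <namespace>
```

A non-zero result only says the resource is not Ready; the message and reason from flux get are still needed to decide which action in the table applies.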
Researching Unfamiliar Services
When investigating unknown services, spawn a haiku agent to research documentation:
Task tool:
- subagent_type: "general-purpose"
- model: "haiku"
- prompt: "Research [service] troubleshooting docs. Focus on:
1. Common failure modes
2. Health indicators
3. Configuration gotchas
Start with: [docs-url]"
Chart URL to Docs mapping:
| Chart Source | Documentation |
|---|---|
| charts.jetstack.io | cert-manager.io/docs |
| charts.longhorn.io | longhorn.io/docs |
| grafana.github.io | grafana.com/docs |
| prometheus-community.github.io | prometheus.io/docs |
Common Confusions
BAD: Use helm list to check Helm release status
GOOD: Use kubectl get helmrelease -A - Flux manages releases via CRDs, not Helm CLI
Keywords
kubernetes, kubectl, kubeconfig, flux, flux status, cluster access, internal URL, service URL, port-forward, helm release, gitops, reconciliation