kubespray-operations
Kubespray Operations
Overview
Kubespray provides playbooks for cluster lifecycle operations: upgrades, scaling, and reset. Understanding these operations prevents data loss and service disruption.
Core principle: Always back up etcd before destructive operations. Kubernetes upgrades must proceed one minor version at a time.
When to Use
- Upgrading Kubernetes versions (patch, minor, or minor with a Kubespray version bump)
- Adding new worker or control plane nodes
- Removing nodes from cluster (healthy or unreachable)
- Backing up and restoring etcd
- Resetting cluster to clean state
Not for: Initial deployment (use kubespray-deployment), troubleshooting failures (use kubespray-troubleshooting), certificate issues (use kubespray-certificates)
Node Management
Adding a Worker Node (scale.yml)
What scale.yml does: download binaries -> install kubelet -> upload control plane certs -> kubeadm join -> apply labels/taints -> configure CNI
IMPORTANT: scale.yml is ONLY for worker nodes. Do NOT use it for control plane nodes. Control plane nodes require cluster.yml because they need etcd membership, static pod generation, certificate creation, and kubeadm control plane join.
Step-by-step:
- Update inventory -- add the new node to [all] and [kube_node]:
[all]
# ... existing nodes ...
k8s-node5 ansible_host=192.168.10.25 ip=192.168.10.25 # new
[kube_node]
k8s-node1
k8s-node2
k8s-node3
k8s-node4
k8s-node5 # new
- Run the scale playbook with --limit targeting only the new node:
ansible-playbook scale.yml --become \
-i inventory/mycluster/inventory.ini \
--limit=k8s-node5 \
-e kube_version="1.32.9"
- Verify:
kubectl get nodes
# k8s-node5 should appear as Ready
Removing a Worker Node (remove-node.yml)
PDB considerations: If any PodDisruptionBudget has maxUnavailable: 0, draining will block indefinitely. Always audit PDBs before removing a node:
kubectl get pdb --all-namespaces
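To narrow the audit, one option is to filter for PDBs that currently allow zero disruptions. The sketch below is illustrative and not part of Kubespray: the helper name blocking_pdbs is made up, and it assumes three whitespace-separated columns on stdin (namespace, name, allowed disruptions), such as the custom-columns query in the trailing comment would produce.

```shell
# blocking_pdbs: read "NAMESPACE NAME ALLOWED" rows on stdin and print
# namespace/name for every PDB whose allowed disruptions is 0.
# (Hypothetical helper for illustration; not part of Kubespray or kubectl.)
blocking_pdbs() {
  awk '$3 == 0 { print $1 "/" $2 }'
}

# Intended usage against a live cluster:
#   kubectl get pdb --all-namespaces --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
#     | blocking_pdbs
```

Any PDB printed by this filter would block a drain of a node hosting its pods until it is adjusted or the workload gains spare replicas.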
What remove-node.yml does: confirmation prompt -> cordon and drain -> remove etcd member (if applicable) -> kubeadm reset -> delete Node object from API server
Command:
ansible-playbook remove-node.yml --become \
-i inventory/mycluster/inventory.ini \
-e node=k8s-node5 \
-e skip_confirmation=true
After removal: Update inventory.ini to remove the node entry. Kubespray does not modify your inventory file automatically.
Force-Removing Unhealthy Nodes
When a node is unreachable (hardware crash, network partition), normal remove-node.yml FAILS with UNREACHABLE because it tries to SSH into the dead node to run kubeadm reset.
Use these extra variables to force removal:
ansible-playbook remove-node.yml --become \
-i inventory/mycluster/inventory.ini \
-e node=k8s-node5 \
-e skip_confirmation=true \
-e reset_nodes=false \
-e allow_ungraceful_removal=true
What happens with these flags:
- reset_nodes=false -- skips SSH to the dead node, skips kubeadm reset
- allow_ungraceful_removal=true -- skips drain (node is unreachable anyway), removes only cluster-side metadata (Node object, etcd member if applicable)
WARNING: If the dead node comes back online, kubelet will attempt to re-register with old certificates. You must wipe the node (kubeadm reset) before rejoining it to the cluster.
Replacing a Control Plane Node
This is the most complex node operation because it involves etcd membership changes.
Step 1: Remove the old control plane node
ansible-playbook remove-node.yml --become \
-i inventory/mycluster/inventory.ini \
-e node=k8s-ctr2 \
-e skip_confirmation=true
This takes etcd from 3 members to 2 members. Minimize this window -- proceed to Step 2 and 3 promptly.
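Why the window matters can be seen from etcd's quorum arithmetic: a cluster of N members needs a majority (N/2 + 1, rounded down) to accept writes. The helper names below are illustrative only.

```shell
# quorum_size N: members that must be healthy for etcd to accept writes.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# failure_tolerance N: members that can be lost while keeping quorum.
failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 2; do
  echo "members=$n quorum=$(quorum_size "$n") tolerance=$(failure_tolerance "$n")"
done
# members=3 quorum=2 tolerance=1
# members=2 quorum=2 tolerance=0
```

With only 2 members, losing either one stops the cluster from accepting writes until the third member is back.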
Step 2: Update inventory
CRITICAL: Add the new control plane node at the END of [kube_control_plane]. Never insert it in the middle -- Kubespray uses the ordering for etcd initial cluster membership and the first node has special significance.
[kube_control_plane]
k8s-ctr1 # existing - MUST stay first
k8s-ctr3 # existing
k8s-ctr-new # new - added at END
Step 3: Run cluster.yml (NOT scale.yml!)
Control plane nodes need cluster.yml because they require:
- etcd member join
- Static pod manifest generation (kube-apiserver, kube-controller-manager, kube-scheduler)
- Full certificate generation
- kubeadm control plane join (not just worker join)
ansible-playbook cluster.yml --become \
-i inventory/mycluster/inventory.ini
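CRITICAL LIMITATION: The first node listed in [kube_control_plane] CANNOT be removed via remove-node.yml while it holds the first position. This node is the initial etcd bootstrap node and has special handling throughout Kubespray. To replace it, first reorder the inventory so that another control plane node is listed first and re-run the cluster playbook (the Kubespray node documentation describes this procedure); otherwise you must rebuild the cluster or manipulate etcd membership manually (advanced, not covered by standard playbooks).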
Verify replacement:
# etcd member list should show 3 members again
ETCDCTL_API=3 etcdctl member list \
--cacert=/etc/ssl/etcd/ssl/ca.pem \
--cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem \
--key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem \
--endpoints=https://127.0.0.1:2379
# All static pods running on new CP node
kubectl get pods -n kube-system -o wide | grep k8s-ctr-new
# nginx.conf on workers updated to include new CP endpoint
# (Kubespray uses nginx as LB on worker nodes for API server access)
Node Management Key Takeaways
| Rule | Detail |
|---|---|
| scale.yml for workers only | Control plane nodes require cluster.yml |
| New CP nodes at END of [kube_control_plane] | Never insert in the middle of the group |
| First CP node cannot be removed | It is the etcd bootstrap node with special handling |
| Unreachable nodes | Use reset_nodes=false + allow_ungraceful_removal=true |
| PDBs can block drains | Audit with kubectl get pdb --all-namespaces before removal |
| Keep inventory.ini in sync | Kubespray does not auto-update your inventory after removal |
Cluster Upgrades
Version Skew Policy
Kubernetes supports upgrading only one minor version at a time; skipping a minor version is not supported:
v1.X -> v1.X+1 -> v1.X+2 (one at a time)
v1.X -> v1.X+3 (cannot skip)
Check supported versions: https://kubernetes.io/releases/
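A small pre-flight check can catch an invalid hop before any playbook runs. This is a minimal sketch under stated assumptions: the function names are hypothetical, both versions share the same major version, and versions look like 1.32.9 or v1.32.9.

```shell
# minor_of VERSION: print the minor component of "1.32.9" or "v1.32.9".
minor_of() {
  local v="${1#v}"   # strip an optional leading "v"
  v="${v#*.}"        # drop the major component
  echo "${v%%.*}"    # keep everything before the next dot
}

# check_minor_hop CURRENT TARGET: print ok, skip, or downgrade.
# Returns non-zero unless the target is the same or the next minor version.
check_minor_hop() {
  local diff=$(( $(minor_of "$2") - $(minor_of "$1") ))
  if [ "$diff" -lt 0 ]; then
    echo "downgrade"; return 1
  elif [ "$diff" -gt 1 ]; then
    echo "skip"; return 1
  fi
  echo "ok"
}

check_minor_hop 1.32.10 1.33.7           # prints ok
check_minor_hop 1.32.10 1.34.3 || true   # prints skip
```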
Pre-Upgrade: Flannel CNI Plugin Update
If using Flannel, update the CNI plugin BEFORE upgrading Kubernetes while all nodes are still on the same version. The Flannel DaemonSet update rolls out to all nodes at once -- you cannot do it per-node.
# Check current Flannel version
kubectl get daemonset kube-flannel -n kube-flannel \
-o jsonpath='{.spec.template.spec.containers[0].image}'
Update the Flannel version in your Kubespray group_vars before proceeding with the Kubernetes upgrade.
Pre-Upgrade Checklist
# 1. Check current versions
kubectl get nodes -o wide
# 2. Verify cluster health
kubectl get nodes
kubectl get pods -A | grep -v Running | grep -v Completed
# 3. Audit PDBs that could block drains
kubectl get pdb --all-namespaces -o wide
# 4. Backup etcd (CRITICAL)
ETCDCTL_API=3 etcdctl snapshot save /backup/pre-upgrade.db \
--cacert=/etc/ssl/etcd/ssl/ca.pem \
--cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem \
--key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem \
--endpoints=https://127.0.0.1:2379
# 5. Verify backup
ETCDCTL_API=3 etcdctl snapshot status /backup/pre-upgrade.db
Upgrade Strategies
Strategy 1: Unsafe (cluster.yml) -- Dev/Test Only
Runs cluster.yml with upgrade_cluster_setup=true. Nodes are not cordoned or drained, so workloads may be disrupted.
ansible-playbook cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.32.10" \
-e upgrade_cluster_setup=true
Only use this for development clusters where downtime is acceptable.
Strategy 2: Graceful (upgrade-cluster.yml) -- Production
Rolling per-node upgrade: cordon -> drain -> upgrade -> uncordon. This is the recommended approach.
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.32.10"
Serial control options:
- serial: 1 -- upgrades one node at a time (safest, slowest)
- serial: "20%" -- upgrades 20% of nodes at a time (default)
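To get a feel for what a percentage means on a given cluster, the batch size can be estimated with integer arithmetic. This sketch assumes Ansible rounds the percentage down and never uses fewer than one host per batch; treat that rounding rule as an assumption to confirm against the Ansible documentation. The function name is illustrative.

```shell
# batch_size TOTAL_HOSTS PERCENT: estimated hosts per batch for a
# percentage-based serial value.
# Assumption: round down, with a minimum of 1 host per batch.
batch_size() {
  local size=$(( $1 * $2 / 100 ))
  [ "$size" -lt 1 ] && size=1
  echo "$size"
}

batch_size 10 20   # prints 2
batch_size 3 20    # prints 1
```

On small clusters the percentage therefore behaves like serial: 1, while on large clusters it drains several workers at once, which matters when PDBs or spare capacity are tight.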
Manual confirmation between nodes:
# Pause and wait for operator confirmation before each node
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.32.10" \
-e upgrade_node_confirm=true
Automatic pause between nodes:
# Wait 60 seconds between nodes automatically
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.32.10" \
-e upgrade_node_pause_seconds=60
Patch Upgrade (e.g., v1.32.9 -> v1.32.10)
The simplest upgrade type. Only Kubernetes binaries change. Same Kubespray version.
Step 1: Upgrade control plane and etcd first
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.32.10" \
--limit "kube_control_plane:etcd"
What happens per control plane node:
- Pre-upgrade: downloads new binaries and images
- Cordon the node
- Drain workloads
- First CP node: kubeadm upgrade apply v1.32.10
- Subsequent CP nodes: kubeadm upgrade node
- kube-proxy DaemonSet updated
- Uncordon the node
Step 2: Upgrade workers individually
# Upgrade one worker at a time
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.32.10" \
--limit "k8s-node4"
# Then the next worker
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.32.10" \
--limit "k8s-node5"
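With more than a few workers, typing each command by hand becomes error-prone. One possible approach is to generate the per-worker commands first, review them, and only then run them. The function below is a hypothetical helper that prints commands without executing anything; the inventory path is the one used throughout this document.

```shell
# print_worker_upgrades VERSION NODE...: print (do not run) one
# upgrade-cluster.yml invocation per worker, in the order given.
print_worker_upgrades() {
  local version="$1" node
  shift
  for node in "$@"; do
    printf 'ansible-playbook upgrade-cluster.yml --become -i inventory/mycluster/inventory.ini -e kube_version="%s" --limit "%s"\n' \
      "$version" "$node"
  done
}

print_worker_upgrades 1.32.10 k8s-node4 k8s-node5
```

Running the printed lines one at a time, with a health check after each, keeps the one-worker-at-a-time behavior described above.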
Post-upgrade verification:
# All nodes on new version
kubectl get nodes -o wide
# Static pod images updated on CP nodes
kubectl get pods -n kube-system -o wide | grep -E 'apiserver|controller|scheduler'
# kube-proxy image updated
kubectl get daemonset kube-proxy -n kube-system \
-o jsonpath='{.spec.template.spec.containers[0].image}'
# etcd cluster healthy
ETCDCTL_API=3 etcdctl endpoint health \
--cacert=/etc/ssl/etcd/ssl/ca.pem \
--cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem \
--key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem \
--endpoints=https://127.0.0.1:2379
# API server endpoints responding
kubectl get --raw='/readyz?verbose'
Minor Upgrade (e.g., v1.32.10 -> v1.33.7)
Same procedure as a patch upgrade, but with additional considerations:
- Version skew policy: Only one minor version jump at a time. v1.32 -> v1.33 is valid. v1.32 -> v1.34 is NOT.
- Longer duration: More container images to pull, more component checks
- Update admin kubectl: After upgrade, update the kubectl binary on your workstation to match the new cluster minor version
- Check deprecated API usage before upgrading:
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis
APIs deprecated in the current version may be removed in the next minor version. Fix any usage before upgrading.
The upgrade command is the same -- just set the target version:
# CP + etcd first
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.33.7" \
--limit "kube_control_plane:etcd"
# Then workers individually
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.33.7" \
--limit "k8s-node4"
Minor Upgrade with Kubespray Version Bump (e.g., v1.33.7 -> v1.34.3)
When the target Kubernetes version requires a newer Kubespray release, you must upgrade Kubespray itself first.
Step 1: Update Kubespray
cd /path/to/kubespray
git fetch --all --tags
git checkout v2.30.0
Check the release notes for:
- Supported Kubernetes version range
- Component version changes (etcd, containerd, CNI plugins)
- Breaking changes in variable names or defaults
Step 2: Update Python dependencies
Use a virtual environment for dependency isolation:
python3 -m venv kubespray-venv
source kubespray-venv/bin/activate
pip3 install -r requirements.txt
Step 3: Review component upgrades
A Kubespray version bump may also upgrade:
- etcd (e.g., 3.5.25 -> 3.5.26): Automatic during upgrade, per-member rolling restart, backups stored in /var/backups/
- containerd (e.g., 2.1.5 -> 2.2.1): Automatic during upgrade, binary replacement + service restart
These happen transparently as part of the upgrade playbook.
Step 4: Run the full upgrade
# CP + etcd first
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.34.3" \
--limit "kube_control_plane:etcd"
# Then workers
ansible-playbook upgrade-cluster.yml --become \
-i inventory/mycluster/inventory.ini \
-e kube_version="1.34.3" \
--limit "k8s-node4"
Step 5: Post-upgrade tasks
# Update kubectl on your workstation to match cluster version
# Update Helm if needed
# Refresh kubeconfig if certificate contents changed
cp /etc/kubernetes/admin.conf ~/.kube/config
# Verify HAProxy/nginx LB backends are healthy (if using external LB)
# Check Prometheus targets if monitoring is deployed
# (scrape endpoints may have changed with component upgrades)
# Full verification
kubectl get nodes -o wide
kubectl get pods -A | grep -v Running | grep -v Completed
kubectl get --raw='/readyz?verbose'
Upgrade Order Summary
Kubespray upgrade-cluster.yml upgrades in this sequence:
- etcd (if new version needed)
- Control plane nodes (one at a time by default)
- Worker nodes (in batches set by serial, 20% by default; use serial: 1 or --limit for one at a time)
- CNI plugin
- Addons
For multi-hop upgrades (e.g., 1.31 -> 1.34), run the full upgrade process three times:
# v1.31 -> v1.32
ansible-playbook upgrade-cluster.yml ... -e kube_version="1.32.x"
# verify, then v1.32 -> v1.33
ansible-playbook upgrade-cluster.yml ... -e kube_version="1.33.x"
# verify, then v1.33 -> v1.34
ansible-playbook upgrade-cluster.yml ... -e kube_version="1.34.x"
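The list of intermediate targets for a multi-hop upgrade can be enumerated mechanically. The sketch below uses a hypothetical plan_hops helper, assumes major version 1, and keeps the patch level as the same x placeholder used above, since the right patch release has to be chosen per hop.

```shell
# plan_hops FROM_MINOR TO_MINOR: print each upgrade target, one per line.
plan_hops() {
  local minor=$(( $1 + 1 ))
  while [ "$minor" -le "$2" ]; do
    echo "1.$minor.x"
    minor=$(( minor + 1 ))
  done
}

plan_hops 31 34
# 1.32.x
# 1.33.x
# 1.34.x
```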
etcd Backup and Restore
Creating Backups
#!/bin/bash
# etcd-backup.sh
BACKUP_DIR="/backup/etcd"
DATE=$(date +%Y%m%d-%H%M%S)
SNAPSHOT="$BACKUP_DIR/etcd-snapshot-$DATE.db"
mkdir -p "$BACKUP_DIR"
ETCDCTL_API=3 etcdctl snapshot save "$SNAPSHOT" \
--cacert=/etc/ssl/etcd/ssl/ca.pem \
--cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem \
--key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem \
--endpoints=https://127.0.0.1:2379
# Verify and cleanup old backups
if [ $? -eq 0 ]; then
echo "Backup successful: $SNAPSHOT"
find "$BACKUP_DIR" -name "*.db" -mtime +7 -delete
else
echo "Backup failed!"
exit 1
fi
Automated Backup (systemd)
# /etc/systemd/system/etcd-backup.service
[Unit]
Description=etcd backup
After=etcd.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/etcd-backup.sh
# /etc/systemd/system/etcd-backup.timer
[Unit]
Description=Daily etcd backup
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
# Reload unit files so systemd picks up the new service and timer
systemctl daemon-reload
systemctl enable etcd-backup.timer
systemctl start etcd-backup.timer
Restoring from Backup (Single Node)
# 1. Stop etcd
systemctl stop etcd
# 2. Backup current data (just in case)
mv /var/lib/etcd /var/lib/etcd.broken
# 3. Restore snapshot
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
--data-dir=/var/lib/etcd
# 4. Fix ownership
chown -R etcd:etcd /var/lib/etcd
# 5. Start etcd
systemctl start etcd
# 6. Verify
ETCDCTL_API=3 etcdctl endpoint health ...
kubectl get nodes
Restoring Multi-Node Cluster
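More complex -- restore the same snapshot on every etcd node, passing the full membership list plus that node's own name and peer URL:
# On each etcd node (shown for k8s-ctr1; change --name and
# --initial-advertise-peer-urls on the other nodes):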
ETCDCTL_API=3 etcdctl snapshot restore /backup/snapshot.db \
--data-dir=/var/lib/etcd \
--name=k8s-ctr1 \
--initial-cluster=k8s-ctr1=https://192.168.10.11:2380,k8s-ctr2=https://192.168.10.12:2380,k8s-ctr3=https://192.168.10.13:2380 \
--initial-cluster-token=etcd-cluster-restore \
--initial-advertise-peer-urls=https://192.168.10.11:2380
Use a different --initial-cluster-token than the original so the restored members cannot be confused with members of the old cluster.
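Because the --initial-cluster value must be identical on every node, it can help to build it once from a member list rather than typing it per node. This is a sketch with a made-up helper name; it assumes every member uses https and peer port 2380, as in the example above.

```shell
# build_initial_cluster NAME=IP...: join members into the value expected
# by --initial-cluster.
# Assumption: all peers use https on port 2380.
build_initial_cluster() {
  local out="" pair
  for pair in "$@"; do
    # ${pair%%=*} is the member name, ${pair#*=} is its IP address
    out="${out:+$out,}${pair%%=*}=https://${pair#*=}:2380"
  done
  echo "$out"
}

build_initial_cluster \
  k8s-ctr1=192.168.10.11 k8s-ctr2=192.168.10.12 k8s-ctr3=192.168.10.13
```

The printed string can then be passed unchanged to the restore command on each node, with only --name and --initial-advertise-peer-urls varying.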
Cluster Reset
Warning: Destroys all cluster data including etcd.
ansible-playbook -i inventory/mycluster/inventory.ini reset.yml -b
# Type "yes" when prompted
Use reset when:
- Deployment is corrupted beyond repair
- Starting fresh with new configuration
- Cleaning up test clusters
Playbook Reference
| Playbook | Purpose | When to Use |
|---|---|---|
| cluster.yml | Initial deployment or add CP nodes | New cluster, or adding control plane nodes |
| upgrade-cluster.yml | Graceful version upgrades | Production upgrades (cordon/drain/upgrade/uncordon) |
| scale.yml | Add worker nodes | Adding workers only (NOT for control plane) |
| remove-node.yml | Remove nodes | Removing any node from the cluster |
| reset.yml | Destroy cluster | Full cluster teardown |
| recover-control-plane.yml | Restore failed control plane | Control plane recovery after failure |
Common Errors (Searchable)
error: unable to upgrade connection: pod does not exist
Cause: Node drained but pods not fully terminated. Fix: Wait and retry drain.
The connection to the server was refused - did you specify the right host or port?
Cause: API server down during upgrade. Fix: Wait for control plane to recover.
etcdserver: mvcc: database space exceeded
Cause: etcd database full. Fix: Compact and defrag etcd before upgrade.
UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
Cause: Previous upgrade incomplete. Fix: Check Helm releases, clean up stuck releases.
cannot exec into a container in a completed pod
Cause: Pods terminated during drain. Fix: Normal during drain, continue.
UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh"}
Cause: Node is down or network partitioned during remove-node.yml. Fix: Use -e reset_nodes=false -e allow_ungraceful_removal=true to force removal from cluster metadata only.
cannot evict pod as it would violate the pod's disruption budget
Cause: PDB with maxUnavailable: 0 blocks drain indefinitely. Fix: Audit PDBs with kubectl get pdb --all-namespaces, temporarily adjust or delete the blocking PDB, then retry.
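For the "database space exceeded" error above, the usual etcd maintenance sequence is: read the current revision, compact to it, defragment, then disarm the alarm. The sketch below wraps that sequence in two hypothetical functions; the TLS paths and endpoint are the ones used elsewhere in this document, and etcd_reclaim_space is only defined here, not invoked.

```shell
# extract_revision: pull the current revision out of
# `etcdctl endpoint status --write-out=json` output supplied on stdin.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# etcd_reclaim_space: compact to the current revision, defragment, then
# clear the NOSPACE alarm. Run on an etcd node; take a snapshot first.
etcd_reclaim_space() {
  local flags rev
  flags="--cacert=/etc/ssl/etcd/ssl/ca.pem"
  flags="$flags --cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem"
  flags="$flags --key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem"
  flags="$flags --endpoints=https://127.0.0.1:2379"
  # $flags is intentionally unquoted so it splits into separate arguments
  rev="$(ETCDCTL_API=3 etcdctl $flags endpoint status --write-out=json | extract_revision)"
  ETCDCTL_API=3 etcdctl $flags compact "$rev"
  ETCDCTL_API=3 etcdctl $flags defrag
  ETCDCTL_API=3 etcdctl $flags alarm disarm
}
```

Defragmentation briefly blocks the member it runs on, so on a multi-member cluster it is safer to run it against one endpoint at a time.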
Common Mistakes
| Mistake | Consequence |
|---|---|
| Skipping Kubernetes versions | Upgrade fails, potential cluster corruption |
| No etcd backup before upgrade | Cannot recover if upgrade fails |
| Removing etcd nodes without quorum check | Cluster becomes unavailable |
| Using reset.yml when scale would work | Unnecessary downtime and data loss |
| Draining without --ignore-daemonsets | Drain aborts with an error on DaemonSet-managed pods |
| Using scale.yml for control plane nodes | CP node joins as worker, missing etcd and static pods |
| Inserting new CP node in middle of inventory | Breaks etcd membership ordering assumptions |
| Trying to remove first CP node via remove-node.yml | Operation not supported, cluster may break |
| Not updating inventory.ini after remove-node.yml | Next playbook run has stale node references |
| Ignoring PDB audit before node removal | Drain hangs indefinitely on maxUnavailable: 0 |
| Dead node rejoining without wipe after force removal | Old certs conflict, kubelet in bad state |