kubespray-lab-setup
Kubespray Lab Environment Setup
Overview
This skill covers building a local 6-node Vagrant/VirtualBox lab for testing Kubespray HA deployments. The environment includes an admin/load balancer node, three control plane nodes with stacked etcd, and two worker nodes. All six VMs run Rocky Linux 10.
Core principle: The admin-lb node acts as your deployment workstation, HAProxy load balancer, and NFS server. All Kubespray operations are executed from this node against the five K8s nodes over a host-only network.
When to Use
- Setting up a local Vagrant/VirtualBox lab for Kubespray testing
- Provisioning multi-node Rocky Linux clusters
- Configuring an admin node with HAProxy, NFS, and deployment tools
- Preparing K8s nodes with kernel modules and sysctl settings
- Testing HA configurations with a local load balancer
Not for: Running Kubespray playbooks (use kubespray-deployment), configuring HA variables (use kubespray-ha-configuration), troubleshooting cluster issues (use kubespray-troubleshooting)
Architecture Overview
┌──────────────────────────────────────────────────────────────────┐
│ Host Machine (macOS/Linux) │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ VirtualBox Host-Only Network 192.168.10.0/24 │ │
│ │ │ │
│ │ ┌──────────────┐ │ │
│ │ │ admin-lb │ 192.168.10.10 │ │
│ │ │ HAProxy │ API LB :6443 → CP nodes │ │
│ │ │ NFS Server │ Stats :9000 Prometheus :8405 │ │
│ │ │ Kubespray │ NFS /srv/nfs/share │ │
│ │ └──────────────┘ │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ k8s-node1 │ │ k8s-node2 │ │ k8s-node3 │ │ │
│ │ │ CP + etcd │ │ CP + etcd │ │ CP + etcd │ │ │
│ │ │ .10.11 │ │ .10.12 │ │ .10.13 │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ │ │
│ │ │ k8s-node4 │ │ k8s-node5 │ │ │
│ │ │ Worker │ │ Worker │ │ │
│ │ │ .10.14 │ │ .10.15 │ │ │
│ │ └────────────┘ └────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ Each VM: enp0s3 (NAT, internet) + enp0s8/enp0s9 (Host-Only) │
└──────────────────────────────────────────────────────────────────┘
VM Specifications
| Node | IP | Role | CPU | Memory |
|---|---|---|---|---|
| admin-lb | 192.168.10.10 | HAProxy, NFS, Kubespray deployer | 2 | 1024 MB |
| k8s-node1 | 192.168.10.11 | Control Plane + etcd | 4 | 2048 MB |
| k8s-node2 | 192.168.10.12 | Control Plane + etcd | 4 | 2048 MB |
| k8s-node3 | 192.168.10.13 | Control Plane + etcd | 4 | 2048 MB |
| k8s-node4 | 192.168.10.14 | Worker | 4 | 2048 MB |
| k8s-node5 | 192.168.10.15 | Worker | 4 | 2048 MB |
Network interfaces:
- enp0s3 - NAT adapter for internet access (VirtualBox default)
- enp0s8/enp0s9 - Host-Only adapter for cluster communication (192.168.10.0/24)
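As a quick sanity check before running the lab, the table above can be summed to see what the host must provide. The following is only a sketch; the variable names are illustrative and the figures are copied from the VM specification table.

```shell
#!/usr/bin/env bash
# Sum the lab's memory and vCPU allocation from the VM specification table.
K8S_NODES=5        # k8s-node1 .. k8s-node5
K8S_MEM_MB=2048    # memory per K8s node
K8S_CPUS=4         # vCPUs per K8s node
ADMIN_MEM_MB=1024  # admin-lb memory
ADMIN_CPUS=2       # admin-lb vCPUs

total_mem_mb=$(( K8S_NODES * K8S_MEM_MB + ADMIN_MEM_MB ))
total_cpus=$(( K8S_NODES * K8S_CPUS + ADMIN_CPUS ))

echo "Total VM memory: ${total_mem_mb} MB"
echo "Total vCPUs:     ${total_cpus}"
```

With the values above this comes to 11264 MB of guest memory and 22 vCPUs. Memory is the figure to compare against free host RAM.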
Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
BOX_IMAGE = "bento/rockylinux-10"
N = 5
Vagrant.configure("2") do |config|
  # --- k8s-node1 through k8s-node5 ---
  (1..N).each do |i|
    config.vm.define "k8s-node#{i}" do |node|
      node.vm.box = BOX_IMAGE
      node.vm.hostname = "k8s-node#{i}"
      node.vm.network "private_network", ip: "192.168.10.#{10 + i}"
      node.vm.provider "virtualbox" do |vb|
        vb.name = "k8s-node#{i}"
        vb.memory = 2048
        vb.cpus = 4
        vb.linked_clone = true
        vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
      end
      node.vm.provision "shell", path: "init_cfg.sh"
    end
  end
  # --- admin-lb node ---
  config.vm.define "admin-lb" do |admin|
    admin.vm.box = BOX_IMAGE
    admin.vm.hostname = "admin-lb"
    admin.vm.network "private_network", ip: "192.168.10.10"
    admin.vm.provider "virtualbox" do |vb|
      vb.name = "admin-lb"
      vb.memory = 1024
      vb.cpus = 2
      vb.linked_clone = true
      vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
    end
    admin.vm.provision "shell", path: "admin-lb.sh"
  end
end
Key settings:
- linked_clone = true - saves disk space by sharing the base image
- nicpromisc2 allow-all - enables promiscuous mode on the second NIC for CNI traffic (Calico/Flannel)
- K8s nodes are provisioned first so they are ready when admin-lb runs SSH key distribution
Admin-LB Bootstrap Script (admin-lb.sh)
#!/usr/bin/env bash
set -euo pipefail
echo "==============================="
echo " admin-lb bootstrap"
echo "==============================="
# ---- Timezone ----
timedatectl set-timezone Asia/Seoul
# ---- Disable Firewall & SELinux ----
systemctl stop firewalld && systemctl disable firewalld
# setenforce exits non-zero when SELinux is already disabled (e.g. on re-provision)
setenforce 0 || true
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# ---- SSH Configuration ----
sed -i 's/^#PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
echo 'root:qwe123' | chpasswd
systemctl restart sshd
# ---- Local DNS (/etc/hosts) ----
cat >> /etc/hosts <<EOF
192.168.10.10 admin-lb
192.168.10.11 k8s-node1
192.168.10.12 k8s-node2
192.168.10.13 k8s-node3
192.168.10.14 k8s-node4
192.168.10.15 k8s-node5
EOF
# ---- HAProxy ----
dnf install -y haproxy
cat > /etc/haproxy/haproxy.cfg <<'HAPCFG'
global
log 127.0.0.1 local2
maxconn 4096
daemon
defaults
mode tcp
log global
option tcplog
option dontlognull
timeout connect 5s
timeout client 30s
timeout server 30s
frontend k8s-api
bind *:6443
default_backend k8s-api-backend
backend k8s-api-backend
balance roundrobin
option tcp-check
server k8s-node1 192.168.10.11:6443 check fall 3 rise 2
server k8s-node2 192.168.10.12:6443 check fall 3 rise 2
server k8s-node3 192.168.10.13:6443 check fall 3 rise 2
frontend stats
bind *:9000
mode http
stats enable
stats uri /stats
stats refresh 10s
stats admin if LOCALHOST
frontend prometheus
bind *:8405
mode http
http-request use-service prometheus-exporter if { path /metrics }
no log
HAPCFG
setsebool -P haproxy_connect_any 1 2>/dev/null || true
systemctl enable --now haproxy
# ---- NFS Server ----
dnf install -y nfs-utils
mkdir -p /srv/nfs/share
cat >> /etc/exports <<EOF
/srv/nfs/share *(rw,async,no_root_squash)
EOF
systemctl enable --now nfs-server
exportfs -arv
# ---- kubectl ----
cat > /etc/yum.repos.d/kubernetes.repo <<'REPO'
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
REPO
dnf install -y kubectl
kubectl completion bash > /etc/bash_completion.d/kubectl
# ---- k9s ----
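# -f makes curl fail on HTTP errors instead of saving an error page as the RPM
curl -fLo /tmp/k9s.rpm https://github.com/derailed/k9s/releases/latest/download/k9s_linux_amd64.rpm
# dnf tolerates an already-installed package, so re-provisioning does not abort here
dnf install -y /tmp/k9s.rpm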
rm -f /tmp/k9s.rpm
# ---- Helm ----
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | \
DESIRED_VERSION=v3.16.2 bash
helm completion bash > /etc/bash_completion.d/helm
# ---- SSH Key Distribution ----
dnf install -y sshpass
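# Generate the key only once so re-provisioning does not stop at an overwrite prompt
[ -f /root/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa -q
for i in 1 2 3 4 5; do
  sshpass -p 'qwe123' ssh-copy-id -o StrictHostKeyChecking=no "root@192.168.10.1${i}"
done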
# ---- Kubespray ----
dnf install -y git python3-pip
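# Skip the clone when the checkout already exists (git clone fails on a non-empty directory)
[ -d /root/kubespray/.git ] || git clone -b v2.29.1 https://github.com/kubernetes-sigs/kubespray.git /root/kubespray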
pip3 install --break-system-packages -r /root/kubespray/requirements.txt
echo "==============================="
echo " admin-lb bootstrap complete"
echo "==============================="
Node Init Script (init_cfg.sh)
#!/usr/bin/env bash
set -euo pipefail
echo "==============================="
echo " K8s node init: $(hostname)"
echo "==============================="
# ---- Timezone ----
timedatectl set-timezone Asia/Seoul
# ---- Disable Swap ----
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
# ---- Disable Firewall & SELinux ----
systemctl stop firewalld && systemctl disable firewalld
# setenforce exits non-zero when SELinux is already disabled (e.g. on re-provision)
setenforce 0 || true
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# ---- Kernel Modules for Kubernetes ----
cat > /etc/modules-load.d/k8s.conf <<EOF
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
# ---- Sysctl Settings ----
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
# ---- Local DNS (/etc/hosts) ----
cat >> /etc/hosts <<EOF
192.168.10.10 admin-lb
192.168.10.11 k8s-node1
192.168.10.12 k8s-node2
192.168.10.13 k8s-node3
192.168.10.14 k8s-node4
192.168.10.15 k8s-node5
EOF
# ---- SSH Configuration ----
sed -i 's/^#PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
echo 'root:qwe123' | chpasswd
systemctl restart sshd
echo "==============================="
echo " K8s node init complete: $(hostname)"
echo "==============================="
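To confirm that the sysctl settings written by init_cfg.sh are actually live on a node, a small check along the following lines could be used. This is a sketch, not part of the original script: check_k8s_sysctl is a hypothetical helper that reads lines in the "key = value" form printed by sysctl.

```shell
#!/usr/bin/env bash
# check_k8s_sysctl (hypothetical helper): read "key = value" lines from stdin
# and report every required key that is missing or not set to 1.
check_k8s_sysctl() {
  local input key status=0
  input=$(cat)
  for key in net.bridge.bridge-nf-call-iptables \
             net.bridge.bridge-nf-call-ip6tables \
             net.ipv4.ip_forward; do
    # Split each line on "=" with optional surrounding whitespace
    if ! printf '%s\n' "$input" | awk -v k="$key" -F'[[:space:]]*=[[:space:]]*' \
         '$1 == k && $2 == "1" { found = 1 } END { exit !found }'; then
      echo "NOT SET: ${key}"
      status=1
    fi
  done
  return "$status"
}

# Usage on a K8s node:
#   sysctl net.bridge.bridge-nf-call-iptables \
#          net.bridge.bridge-nf-call-ip6tables \
#          net.ipv4.ip_forward | check_k8s_sysctl
```

The net.bridge.* keys only exist once br_netfilter is loaded, so a failure here can also point at the kernel module step.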
Deploying the Environment
1. Start All VMs
vagrant up
This takes 15-30 minutes. K8s nodes provision first (init_cfg.sh), then admin-lb provisions last (admin-lb.sh including SSH key distribution).
2. Verify VM Status
vagrant status
# Expected: all 6 VMs running
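For scripted checks, the same status can be read from Vagrant's machine-readable output, where each line has the form timestamp,target,type,data. The sketch below counts machines whose state is running; count_running is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_running (hypothetical helper): read `vagrant status --machine-readable`
# output on stdin and print how many machines report state "running".
count_running() {
  awk -F',' '$3 == "state" && $4 == "running" { n++ } END { print n + 0 }'
}

# Usage from the directory containing the Vagrantfile:
#   [ "$(vagrant status --machine-readable | count_running)" -eq 6 ] \
#     && echo "all 6 VMs running"
```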
3. Connect to Admin Node
vagrant ssh admin-lb
4. Test Connectivity to All Nodes
for i in 1 2 3 4 5; do
  ssh root@192.168.10.1${i} hostname
done
# Should print k8s-node1 through k8s-node5
5. Verify HAProxy
# Check service status
systemctl status haproxy
# Stats page (from host browser)
# http://192.168.10.10:9000/stats
# Prometheus metrics
curl http://192.168.10.10:8405/metrics
# API frontend will show DOWN until K8s is deployed (expected)
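The stats page can also be fetched as CSV by appending ;csv to the stats URI, which makes the per-server state easy to script. The following sketch prints the state of each server in a backend; backend_status is a hypothetical helper, and it relies on pxname, svname and status being columns 1, 2 and 18 of HAProxy's CSV output.

```shell
#!/usr/bin/env bash
# backend_status (hypothetical helper): read HAProxy stats CSV on stdin and
# print "server status" for every server line of the given backend.
backend_status() {
  local backend="$1"
  awk -F',' -v be="$backend" \
    '$1 == be && $2 != "BACKEND" && $2 != "FRONTEND" { print $2, $18 }'
}

# Usage against the lab's stats listener:
#   curl -s 'http://192.168.10.10:9000/stats;csv' | backend_status k8s-api-backend
```

Before Kubernetes is deployed, all three servers in k8s-api-backend are expected to report DOWN.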
6. Verify NFS
showmount -e localhost
# Expected: /srv/nfs/share *
Rocky Linux 10 Considerations
Python 3.12 and PEP 668
Rocky Linux 10 ships Python 3.12, which enforces PEP 668 (externally managed environments). Direct pip install fails:
error: externally-managed-environment
Fix: Use the --break-system-packages flag:
pip3 install --break-system-packages -r requirements.txt
This is already included in the admin-lb.sh script. Alternatively, use a virtual environment:
python3 -m venv /root/kubespray-venv
source /root/kubespray-venv/bin/activate
pip install -r /root/kubespray/requirements.txt
Common Errors (Searchable)
VBoxManage: error: Could not find a controller named 'SATA Controller'
Cause: VirtualBox storage controller mismatch with box image. Fix: Destroy the VM (vagrant destroy <name>) and re-create. If persistent, update VirtualBox.
The IP address configured for the host-only network is not within the allowed ranges
Cause: VirtualBox 6.1.28 and later restrict host-only networks on Linux and macOS hosts to 192.168.56.0/21 by default, so 192.168.10.0/24 is rejected. Fix: Create or edit /etc/vbox/networks.conf:
* 192.168.10.0/24
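One way to apply this fix without duplicating the line on repeated runs is sketched below. ensure_vbox_range is a hypothetical helper; on a real host it must run as root to write under /etc/vbox.

```shell
#!/usr/bin/env bash
# ensure_vbox_range (hypothetical helper): append "* <cidr>" to a VirtualBox
# networks.conf file unless an identical line is already present.
ensure_vbox_range() {
  local conf="$1" cidr="$2" line
  line="* ${cidr}"
  mkdir -p "$(dirname "$conf")"
  touch "$conf"
  # -F treats the leading "*" literally, -x requires a whole-line match
  grep -Fxq -- "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
}

# Usage (as root on the host):
#   ensure_vbox_range /etc/vbox/networks.conf 192.168.10.0/24
```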
Timed out while waiting for the machine to boot
Cause: Insufficient host resources or VirtualBox issue. Fix: Reduce VM count or memory, check VBoxManage list runningvms.
sshpass: ssh-copy-id failed (exit code 6)
Cause: Target node SSH not ready when admin-lb provisions. Fix: Re-run provisioning: vagrant provision admin-lb. The Vagrantfile defines K8s nodes first to minimize this.
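If the race recurs, admin-lb.sh could wait for each node's SSH port before calling ssh-copy-id. The sketch below is one possible approach, not part of the original script: wait_for_port is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [ATTEMPTS] [DELAY_SECONDS]  (hypothetical helper)
# Return 0 as soon as a TCP connection to HOST:PORT succeeds, or 1 after
# ATTEMPTS failed tries. Relies on bash's /dev/tcp redirection and on
# coreutils `timeout`.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" delay="${4:-5}" n=1
  while [ "$n" -le "$attempts" ]; do
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    # Sleep only between attempts, not after the last one
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Usage inside admin-lb.sh, before the ssh-copy-id loop:
#   for i in 1 2 3 4 5; do
#     wait_for_port "192.168.10.1${i}" 22 || { echo "k8s-node${i} unreachable" >&2; exit 1; }
#   done
```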
error: externally-managed-environment
Cause: Python 3.12 PEP 668 on Rocky Linux 10. Fix: Add --break-system-packages to pip install.
E: Failed to connect to HAProxy stats socket
Cause: HAProxy not running or misconfigured. Fix: Check systemctl status haproxy and validate /etc/haproxy/haproxy.cfg syntax with haproxy -c -f /etc/haproxy/haproxy.cfg.
mount.nfs: access denied by server
Cause: NFS exports not applied. Fix: Run exportfs -arv and verify with showmount -e localhost.
Common Mistakes
| Mistake | Consequence |
|---|---|
| Forgetting --nicpromisc2 allow-all | CNI (Calico/Flannel) traffic dropped between VMs |
| Not setting ip= in Kubespray inventory | Kubespray detects 10.0.2.15 (NAT) instead of 192.168.10.x |
| Starting admin-lb before K8s nodes | SSH key distribution fails because nodes are not ready |
| Missing /etc/vbox/networks.conf | VirtualBox rejects 192.168.10.0/24 host-only range |
| Skipping swapoff -a | kubelet refuses to start with swap enabled |
| Not disabling SELinux | Container runtime and CNI operations fail |
| Using pip install without --break-system-packages | PEP 668 blocks installation on Rocky Linux 10 |
| Forgetting linked_clone = true | Each VM copies the full base image (2-4 GB each instead of ~200 MB) |
| Wrong Kubespray branch vs K8s version | Version mismatch causes deployment failure |
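To illustrate the ip= row above, the sketch below prints an INI inventory in which every node is pinned to its host-only address. It is an assumption-laden example, not the inventory used by the original author: print_inventory is a hypothetical helper, and the group layout simply mirrors the roles in the VM specification table using Kubespray's kube_control_plane, etcd and kube_node groups.

```shell
#!/usr/bin/env bash
# print_inventory (hypothetical helper): emit a Kubespray-style INI inventory
# for the lab. ip= pins each node to its host-only address instead of the
# NAT address 10.0.2.15.
print_inventory() {
  local i
  echo "[all]"
  for i in 1 2 3 4 5; do
    echo "k8s-node${i} ansible_host=192.168.10.1${i} ip=192.168.10.1${i}"
  done
  printf '\n[kube_control_plane]\nk8s-node1\nk8s-node2\nk8s-node3\n'
  printf '\n[etcd]\nk8s-node1\nk8s-node2\nk8s-node3\n'
  printf '\n[kube_node]\nk8s-node4\nk8s-node5\n'
}

# Usage (the target path is an assumption; adjust to your inventory directory):
#   print_inventory > /root/kubespray/inventory/mycluster/inventory.ini
```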