kubespray-lab-setup

SKILL.md

Kubespray Lab Environment Setup

Overview

This skill covers building a local 6-node Vagrant/VirtualBox lab for testing Kubespray HA deployments. The environment includes an admin/load balancer node, three control plane nodes with stacked etcd, and two worker nodes running Rocky Linux 10.

Core principle: The admin-lb node acts as your deployment workstation, HAProxy load balancer, and NFS server. All Kubespray operations are executed from this node against the five K8s nodes over a host-only network.

When to Use

  • Setting up a local Vagrant/VirtualBox lab for Kubespray testing
  • Provisioning multi-node Rocky Linux clusters
  • Configuring an admin node with HAProxy, NFS, and deployment tools
  • Preparing K8s nodes with kernel modules and sysctl settings
  • Testing HA configurations with a local load balancer

Not for: Running Kubespray playbooks (use kubespray-deployment), configuring HA variables (use kubespray-ha-configuration), troubleshooting cluster issues (use kubespray-troubleshooting)

Architecture Overview

┌──────────────────────────────────────────────────────────────────┐
│                    Host Machine (macOS/Linux)                     │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │           VirtualBox Host-Only Network 192.168.10.0/24   │    │
│  │                                                          │    │
│  │  ┌──────────────┐                                        │    │
│  │  │  admin-lb     │  192.168.10.10                        │    │
│  │  │  HAProxy      │  API LB :6443 → CP nodes             │    │
│  │  │  NFS Server   │  Stats :9000  Prometheus :8405        │    │
│  │  │  Kubespray    │  NFS /srv/nfs/share                   │    │
│  │  └──────────────┘                                        │    │
│  │                                                          │    │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐         │    │
│  │  │ k8s-node1  │  │ k8s-node2  │  │ k8s-node3  │         │    │
│  │  │ CP + etcd  │  │ CP + etcd  │  │ CP + etcd  │         │    │
│  │  │ .10.11     │  │ .10.12     │  │ .10.13     │         │    │
│  │  └────────────┘  └────────────┘  └────────────┘         │    │
│  │                                                          │    │
│  │  ┌────────────┐  ┌────────────┐                          │    │
│  │  │ k8s-node4  │  │ k8s-node5  │                          │    │
│  │  │ Worker     │  │ Worker     │                          │    │
│  │  │ .10.14     │  │ .10.15     │                          │    │
│  │  └────────────┘  └────────────┘                          │    │
│  │                                                          │    │
│  └──────────────────────────────────────────────────────────┘    │
│                                                                  │
│  Each VM: enp0s3 (NAT, internet) + enp0s8/enp0s9 (Host-Only)   │
└──────────────────────────────────────────────────────────────────┘

VM Specifications

Node IP Role CPU Memory
admin-lb 192.168.10.10 HAProxy, NFS, Kubespray deployer 2 1024 MB
k8s-node1 192.168.10.11 Control Plane + etcd 4 2048 MB
k8s-node2 192.168.10.12 Control Plane + etcd 4 2048 MB
k8s-node3 192.168.10.13 Control Plane + etcd 4 2048 MB
k8s-node4 192.168.10.14 Worker 4 2048 MB
k8s-node5 192.168.10.15 Worker 4 2048 MB

Network interfaces:

  • enp0s3 - NAT adapter for internet access (VirtualBox default)
  • enp0s8 / enp0s9 - Host-Only adapter for cluster communication (192.168.10.0/24)

Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :

BOX_IMAGE = "bento/rockylinux-10"
N = 5

Vagrant.configure("2") do |config|

  # --- k8s-node1 through k8s-node5 ---
  (1..N).each do |i|
    config.vm.define "k8s-node#{i}" do |node|
      node.vm.box = BOX_IMAGE
      node.vm.hostname = "k8s-node#{i}"
      node.vm.network "private_network", ip: "192.168.10.#{10 + i}"

      node.vm.provider "virtualbox" do |vb|
        vb.name = "k8s-node#{i}"
        vb.memory = 2048
        vb.cpus = 4
        vb.linked_clone = true
        vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
      end

      node.vm.provision "shell", path: "init_cfg.sh"
    end
  end

  # --- admin-lb node ---
  config.vm.define "admin-lb" do |admin|
    admin.vm.box = BOX_IMAGE
    admin.vm.hostname = "admin-lb"
    admin.vm.network "private_network", ip: "192.168.10.10"

    admin.vm.provider "virtualbox" do |vb|
      vb.name = "admin-lb"
      vb.memory = 1024
      vb.cpus = 2
      vb.linked_clone = true
      vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
    end

    admin.vm.provision "shell", path: "admin-lb.sh"
  end
end

Key settings:

  • linked_clone = true - saves disk space by sharing base image
  • nicpromisc2 allow-all - enables promiscuous mode on the second NIC for CNI traffic (Calico/Flannel)
  • K8s nodes are provisioned first so they are ready when admin-lb runs SSH key distribution

Admin-LB Bootstrap Script (admin-lb.sh)

#!/usr/bin/env bash
set -euo pipefail

echo "==============================="
echo " admin-lb bootstrap"
echo "==============================="

# ---- Timezone ----
timedatectl set-timezone Asia/Seoul

# ---- Disable Firewall & SELinux ----
systemctl stop firewalld && systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# ---- SSH Configuration ----
sed -i 's/^#PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
echo 'root:qwe123' | chpasswd
systemctl restart sshd

# ---- Local DNS (/etc/hosts) ----
cat >> /etc/hosts <<EOF
192.168.10.10 admin-lb
192.168.10.11 k8s-node1
192.168.10.12 k8s-node2
192.168.10.13 k8s-node3
192.168.10.14 k8s-node4
192.168.10.15 k8s-node5
EOF

# ---- HAProxy ----
dnf install -y haproxy

cat > /etc/haproxy/haproxy.cfg <<'HAPCFG'
global
    log         127.0.0.1 local2
    maxconn     4096
    daemon

defaults
    mode    tcp
    log     global
    option  tcplog
    option  dontlognull
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend k8s-api
    bind *:6443
    default_backend k8s-api-backend

backend k8s-api-backend
    balance roundrobin
    option tcp-check
    server k8s-node1 192.168.10.11:6443 check fall 3 rise 2
    server k8s-node2 192.168.10.12:6443 check fall 3 rise 2
    server k8s-node3 192.168.10.13:6443 check fall 3 rise 2

frontend stats
    bind *:9000
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend prometheus
    bind *:8405
    mode http
    http-request use-service prometheus-exporter if { path /metrics }
    no log
HAPCFG

setsebool -P haproxy_connect_any 1 2>/dev/null || true
systemctl enable --now haproxy

# ---- NFS Server ----
dnf install -y nfs-utils
mkdir -p /srv/nfs/share
cat >> /etc/exports <<EOF
/srv/nfs/share *(rw,async,no_root_squash)
EOF
systemctl enable --now nfs-server
exportfs -arv

# ---- kubectl ----
cat > /etc/yum.repos.d/kubernetes.repo <<'REPO'
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
REPO
dnf install -y kubectl
kubectl completion bash > /etc/bash_completion.d/kubectl

# ---- k9s ----
curl -Lo /tmp/k9s.rpm https://github.com/derailed/k9s/releases/latest/download/k9s_linux_amd64.rpm
rpm -ivh /tmp/k9s.rpm
rm -f /tmp/k9s.rpm

# ---- Helm ----
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | \
  DESIRED_VERSION=v3.16.2 bash
helm completion bash > /etc/bash_completion.d/helm

# ---- SSH Key Distribution ----
dnf install -y sshpass
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa -q
for i in 1 2 3 4 5; do
  sshpass -p 'qwe123' ssh-copy-id -o StrictHostKeyChecking=no root@192.168.10.1${i}
done

# ---- Kubespray ----
dnf install -y git python3-pip
git clone -b v2.29.1 https://github.com/kubernetes-sigs/kubespray.git /root/kubespray
pip3 install --break-system-packages -r /root/kubespray/requirements.txt

echo "==============================="
echo " admin-lb bootstrap complete"
echo "==============================="

Node Init Script (init_cfg.sh)

#!/usr/bin/env bash
set -euo pipefail

echo "==============================="
echo " K8s node init: $(hostname)"
echo "==============================="

# ---- Timezone ----
timedatectl set-timezone Asia/Seoul

# ---- Disable Swap ----
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab

# ---- Disable Firewall & SELinux ----
systemctl stop firewalld && systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# ---- Kernel Modules for Kubernetes ----
cat > /etc/modules-load.d/k8s.conf <<EOF
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# ---- Sysctl Settings ----
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

# ---- Local DNS (/etc/hosts) ----
cat >> /etc/hosts <<EOF
192.168.10.10 admin-lb
192.168.10.11 k8s-node1
192.168.10.12 k8s-node2
192.168.10.13 k8s-node3
192.168.10.14 k8s-node4
192.168.10.15 k8s-node5
EOF

# ---- SSH Configuration ----
sed -i 's/^#PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
echo 'root:qwe123' | chpasswd
systemctl restart sshd

echo "==============================="
echo " K8s node init complete: $(hostname)"
echo "==============================="

Deploying the Environment

1. Start All VMs

vagrant up

This takes 15-30 minutes. K8s nodes provision first (init_cfg.sh), then admin-lb provisions last (admin-lb.sh including SSH key distribution).

2. Verify VM Status

vagrant status
# Expected: all 6 VMs running

3. Connect to Admin Node

vagrant ssh admin-lb

4. Test Connectivity to All Nodes

for i in 1 2 3 4 5; do
  ssh root@192.168.10.1${i} hostname
done
# Should print k8s-node1 through k8s-node5

5. Verify HAProxy

# Check service status
systemctl status haproxy

# Stats page (from host browser)
# http://192.168.10.10:9000/stats

# Prometheus metrics
curl http://192.168.10.10:8405/metrics

# API frontend will show DOWN until K8s is deployed (expected)

6. Verify NFS

showmount -e localhost
# Expected: /srv/nfs/share *

Rocky Linux 10 Considerations

Python 3.12 and PEP 668

Rocky Linux 10 ships Python 3.12, which enforces PEP 668 (externally managed environments). Direct pip install fails:

error: externally-managed-environment

Fix: Use the --break-system-packages flag:

pip3 install --break-system-packages -r requirements.txt

This is already included in the admin-lb.sh script. Alternatively, use a virtual environment:

python3 -m venv /root/kubespray-venv
source /root/kubespray-venv/bin/activate
pip install -r /root/kubespray/requirements.txt

Common Errors (Searchable)

VBoxManage: error: Could not find a controller named 'SATA Controller'

Cause: VirtualBox storage controller mismatch with box image. Fix: Destroy the VM (vagrant destroy <name>) and re-create. If persistent, update VirtualBox.

The IP address configured for the host-only network is not within the allowed ranges

Cause: VirtualBox 6.1.28+ restricts host-only network ranges. Fix: Create or edit /etc/vbox/networks.conf:

* 192.168.10.0/24
Timed out while waiting for the machine to boot

Cause: Insufficient host resources or VirtualBox issue. Fix: Reduce VM count or memory, check VBoxManage list runningvms.

sshpass: ssh-copy-id failed (exit code 6)

Cause: Target node SSH not ready when admin-lb provisions. Fix: Re-run provisioning: vagrant provision admin-lb. The Vagrantfile defines K8s nodes first to minimize this.

error: externally-managed-environment

Cause: Python 3.12 PEP 668 on Rocky Linux 10. Fix: Add --break-system-packages to pip install.

E: Failed to connect to HAProxy stats socket

Cause: HAProxy not running or misconfigured. Fix: Check systemctl status haproxy and validate /etc/haproxy/haproxy.cfg syntax with haproxy -c -f /etc/haproxy/haproxy.cfg.

mount.nfs: access denied by server

Cause: NFS exports not applied. Fix: Run exportfs -arv and verify with showmount -e localhost.

Common Mistakes

Mistake Consequence
Forgetting --nicpromisc2 allow-all CNI (Calico/Flannel) traffic dropped between VMs
Not setting ip= in Kubespray inventory Kubespray detects 10.0.2.15 (NAT) instead of 192.168.10.x
Starting admin-lb before K8s nodes SSH key distribution fails because nodes are not ready
Missing /etc/vbox/networks.conf VirtualBox rejects 192.168.10.0/24 host-only range
Skipping swapoff -a kubelet refuses to start with swap enabled
Not disabling SELinux Container runtime and CNI operations fail
Using pip install without --break-system-packages PEP 668 blocks installation on Rocky Linux 10
Forgetting linked_clone = true Each VM copies the full base image (2-4 GB each instead of ~200 MB)
Wrong Kubespray branch vs K8s version Version mismatch causes deployment failure
Weekly Installs
2
GitHub Stars
3
First Seen
Feb 28, 2026
Installed on
opencode2
gemini-cli2
antigravity2
claude-code2
windsurf2
github-copilot2