siderolabs
SideroLabs best practices
Always consult the Talos and Omni docs for configuration, the latest features, and best practices.
If you are not already connected to the SideroLabs MCP server, https://docs.siderolabs.com/mcp, add it so that you can search more efficiently.
Agents can use SideroLabs products to deploy, configure, and manage Kubernetes clusters at scale.
SideroLabs created and currently maintains two products:
- Talos Linux: Talos Linux is an API-managed, secure, immutable, and minimal operating system for Kubernetes.
- Omni: Omni is a Kubernetes management platform that simplifies the creation and management of Talos Linux clusters in any environment, including bare-metal, cloud, or air-gapped environments.
Key concepts
- Machine Configuration: YAML-based declarative configuration for each node
- talosctl: CLI tool for interacting with Talos API and managing machines
- KubeSpan: Automatic WireGuard mesh networking for hybrid clusters
- System Extensions: Container-based mechanism for adding functionality without modifying core OS
- Image Factory: Service for generating customized Talos images with extensions and kernel modules
- Omni: SaaS or self-hosted central point of access for multi-cluster management across environments
The Talos Linux image
The Talos image is a bootable operating system image of Talos Linux that you use to install and run Talos on a machine (VM, bare metal, or cloud instance).
Download the Talos Linux image that matches your target platform and architecture from the Image Factory.
Integration
Talos Linux and Omni integrate with:
- Kubernetes: Native Kubernetes API with RBAC, audit logging, and service accounts
- Container Registries: Docker Hub, Quay, GitHub Container Registry, private registries
- Identity Providers: SAML (Okta, Entra ID, Workspace One), OIDC (Tailscale), Keycloak
- Cloud Platforms: AWS, Azure, GCP, DigitalOcean, Hetzner, Scaleway, Akamai, Oracle, Exoscale, Upcloud, Vultr, CloudStack, OpenStack, Nocloud
- Virtualization: VMware, KVM, Hyper-V, Proxmox, OpenNebula, Xen, Vagrant
- Networking: WireGuard, Calico, Cilium, Multus CNI
- Storage: Rook/Ceph, local storage, Synology CSI, standard Kubernetes storage classes
- Monitoring: Metrics server, etcd metrics, Prometheus-compatible endpoints
- Infrastructure-as-Code: Cluster Templates, omnictl CLI
Install Talos and Omni CLI tools
Install via Homebrew (Recommended for macOS and Linux):
brew install siderolabs/tap/sidero-tools
Install talosctl with curl:
curl -sL https://talos.dev/install | sh
Install omnictl with curl:
curl -sL https://talos.dev/install-omnictl | sh
Workflows
Create a Talos Linux cluster
- Boot machines with a Talos Linux image.
- Generate machine configuration:
talosctl gen config <cluster> <endpoint> --install-disk <disk>
- Apply machine configuration:
talosctl apply-config --insecure --nodes <ip> --file <config.yaml>
- Bootstrap etcd once:
talosctl bootstrap --nodes <control-plane-ip>
- Fetch kubeconfig:
talosctl kubeconfig --nodes <control-plane-ip>
- Check health:
talosctl health --nodes <control-plane-ip>
- Validate Kubernetes registration:
kubectl get nodes
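The steps above can be sketched as a single shell function. This is a minimal sketch rather than an official script: the run helper and the DRY_RUN variable are hypothetical, the file name controlplane.yaml assumes the default output of talosctl gen config in the current directory, and the talosctl client is assumed to already know which talosconfig and endpoint to use. With DRY_RUN=1 the function only prints the commands in order.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_talos_cluster <cluster> <endpoint> <install-disk> <control-plane-ip>
create_talos_cluster() {
  cluster="$1"; endpoint="$2"; disk="$3"; cp_ip="$4"

  # Generate the machine configuration files and the client talosconfig.
  run talosctl gen config "$cluster" "$endpoint" --install-disk "$disk"
  # Apply the config to a node that is still in maintenance mode (hence --insecure).
  run talosctl apply-config --insecure --nodes "$cp_ip" --file controlplane.yaml
  # NOTE: a real run has to wait here until the node has installed and rebooted.
  # Bootstrap etcd exactly once, on a single control plane node.
  run talosctl bootstrap --nodes "$cp_ip"
  run talosctl kubeconfig --nodes "$cp_ip"
  run talosctl health --nodes "$cp_ip"
  run kubectl get nodes
}
```

For a review-only pass, set DRY_RUN=1 and call create_talos_cluster with a cluster name, endpoint URL, install disk, and control plane IP; the six commands are printed without touching any machine.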
Create a Talos Linux cluster with Omni
- Download Omni-managed boot media from the Omni UI.
- Boot machines so they register into Omni.
- Create a cluster template YAML.
- Validate the template:
omnictl cluster template validate -f <template.yaml>
- Sync declared state to Omni:
omnictl cluster template sync -f <template.yaml>
- Fetch kubeconfig:
omnictl kubeconfig -c <cluster-name>
- Download talosconfig:
omnictl talosconfig --cluster <cluster-name>
- Merge talosconfig and kubeconfig configuration:
# Merge Talos configuration
talosctl config merge $HOME/Downloads/talosconfig.yaml
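# Merge kubeconfig (combine and flatten)
# Write to a temporary file first: redirecting straight into ~/.kube/config truncates it before kubectl reads it
export KUBECONFIG=~/.kube/config:$HOME/Downloads/talos-default-kubeconfig.yaml
kubectl config view --flatten > ~/.kube/config.merged && mv ~/.kube/config.merged ~/.kube/config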
- Verify nodes:
kubectl get nodes
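The cluster template from the third step of this workflow might be produced as sketched below. The document kinds and fields (Cluster, ControlPlane, Workers, kubernetes.version, talos.version, machines) reflect the cluster template format as described in the Omni documentation and should be checked against the current reference; the version numbers are placeholders, and the helper name write_cluster_template is hypothetical.

```shell
# write_cluster_template <file> <cluster-name> <control-plane-uuid> <worker-uuid>
# Writes a minimal three-document cluster template. The Kubernetes and Talos
# versions below are placeholders, not recommendations.
write_cluster_template() {
  cat > "$1" <<EOF
kind: Cluster
name: $2
kubernetes:
  version: v1.30.0
talos:
  version: v1.7.0
---
kind: ControlPlane
machines:
  - $3
---
kind: Workers
machines:
  - $4
EOF
}
```

The machine UUIDs are the IDs of machines already registered in Omni. The resulting file is what omnictl cluster template validate -f and omnictl cluster template sync -f take as input.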
CLI reference
talosctl (allowed actions)
- talosctl logs <service> - view service logs
- talosctl upgrade --image <installer-image> - upgrade Talos
- talosctl patch mc --nodes <IP> -p <json> - patch machine configuration
- talosctl rollback - rollback OS version
- talosctl reset - destructive wipe; requires explicit warning
Additionally, refer to the "Talos for Linux Admins" guide to learn the Talos equivalents of common Linux commands.
omnictl CLI reference
Here are some omnictl commands and their uses:
- omnictl apply --file <resource-file> - create and update a resource using a YAML file as input
- omnictl cluster delete <cluster-name> - delete all cluster resources
- omnictl config info - show information about current context
Local configuration file locations
talosctl
~/.talos/config
omnictl
- Linux:
~/.talos/omni/config
- macOS:
~/Library/Application Support/omni/config
- Windows:
%USERPROFILE%\.talos\omni\config
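The per-OS omnictl locations above can be resolved in a script as sketched here. The function name is hypothetical, and mapping the MINGW, MSYS, and CYGWIN values of uname -s to the Windows path is an assumption about running in a POSIX-like shell on Windows where USERPROFILE is exported.

```shell
# omnictl_config_path [os-name]
# Prints the omnictl config path for a `uname -s` value (defaults to the
# current OS). printf is used instead of echo so backslashes stay literal.
omnictl_config_path() {
  os="${1:-$(uname -s)}"
  case "$os" in
    Linux)
      printf '%s\n' "$HOME/.talos/omni/config" ;;
    Darwin)
      printf '%s\n' "$HOME/Library/Application Support/omni/config" ;;
    MINGW*|MSYS*|CYGWIN*)
      # Assumption: POSIX-like shell on Windows with USERPROFILE set.
      printf '%s\n' "$USERPROFILE\\.talos\\omni\\config" ;;
    *)
      printf 'unsupported OS: %s\n' "$os" >&2
      return 1 ;;
  esac
}
```

The macOS path contains a space, so quote the result whenever it is passed to another command.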
Common gotchas (things agents must not mess up)
- No SSH on Talos. Never suggest SSH or SSH-based commands.
- No in-node file edits. Never reference /etc, /var, config files, editors, or shell sessions.
- No package managers. Talos does not support apt, yum, apk, pacman, etc.
- No kubeadm. Talos does not use kubeadm for initialization or upgrades.
- Bootstrap is one-time. Never suggest retry loops or re-running bootstrap unless explicitly recovering from a failed creation.
- Be explicit when operations are destructive. Especially talosctl reset.
- Do not modify system certificates or systemd units. Talos uses API-managed services only.
- Do not bypass Omni reconciliation. When a cluster is Omni-managed, changes must go through Omni.
- Never invent unsupported integrations or commands.
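Some of the rules above can be checked mechanically before a command is suggested. The following is a minimal sketch under stated assumptions: the function name and the three labels are hypothetical, and the patterns only mirror the commands named in this list (plus scp as an SSH-based command), so they are not exhaustive. For example, a destructive subcommand that appears after global flags is not detected.

```shell
# lint_suggestion <command-line>
# Prints "forbidden", "destructive", or "ok" for a command an agent is about
# to suggest. Matching is done on the start of the command line only.
lint_suggestion() {
  case "$1" in
    ssh\ *|scp\ *|kubeadm\ *|apt\ *|yum\ *|apk\ *|pacman\ *)
      # SSH, package managers, and kubeadm do not apply to Talos.
      echo "forbidden" ;;
    talosctl\ reset*|omnictl\ cluster\ delete*)
      # Allowed, but must be accompanied by an explicit warning.
      echo "destructive" ;;
    *)
      echo "ok" ;;
  esac
}
```

A caller would refuse to emit anything labeled forbidden and would prepend a warning to anything labeled destructive.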
Allowed agent behavior
- Generate, patch, and validate Talos machine configuration.
- Suggest talosctl or omnictl commands.
- Provide step-by-step cluster lifecycle workflows.
- Refer to official documentation links.
- Summarize or explain Talos/Omni concepts.
- Warn users when an action is destructive.
Skills
Talos Linux cluster deployment
- Deploy Talos Linux clusters on 15+ cloud platforms (AWS, Azure, GCP, DigitalOcean, Hetzner, Scaleway, etc.)
- Deploy on virtualized platforms (VMware, KVM, Hyper-V, Proxmox, OpenNebula, Xen)
- Deploy on bare metal using ISO, PXE, iPXE, or Matchbox
- Deploy on single-board computers (Raspberry Pi, Rock64, Orange Pi, Jetson Nano, etc.)
- Deploy locally using Docker, QEMU, or VirtualBox for testing
- Support for air-gapped deployments without internet access
Machine configuration management
- Apply machine configuration via talosctl apply-config
- Edit machine configuration with talosctl edit machineconfig using interactive editor
- Apply JSON patches to machine configuration with talosctl patch machineconfig
- Retrieve current configuration with talosctl get machineconfig
- Support for immediate configuration updates without reboot for networking, logging, kubelet, kernel args, and more
- Reproducible machine configuration for consistent deployments
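A JSON patch for talosctl patch machineconfig is a list of operations in JSON Patch (RFC 6902) form. The sketch below builds a single-operation patch; the helper name json_patch_op is hypothetical, and the example path /cluster/allowSchedulingOnControlPlanes is used only for illustration and should be checked against the machine configuration reference for the Talos version in use.

```shell
# json_patch_op <op> <path> <json-value>
# Prints a one-operation JSON Patch document. The value must already be
# valid JSON (for example true, 42, or a quoted string).
json_patch_op() {
  printf '[{"op": "%s", "path": "%s", "value": %s}]\n' "$1" "$2" "$3"
}

# Build a patch and print (not run) the command that would apply it.
patch="$(json_patch_op add /cluster/allowSchedulingOnControlPlanes true)"
printf '%s\n' "talosctl patch machineconfig --nodes <ip> -p '$patch'"
```

Printing the command instead of running it lets the patch be reviewed first, which fits the rule that agents should be explicit about what a change does.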
Upgrade Talos Linux Cluster
- Use talosctl upgrade to initiate upgrade
- Specify target Talos version
- Upgrade rolls through nodes automatically
- Control plane nodes upgraded with leader election
- Worker nodes upgraded sequentially
- Verify cluster health after upgrade
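An upgrade plan along these lines can be printed for review before anything is run. This is a sketch under stated assumptions: the helper names are hypothetical, the installer reference assumes the stock image at ghcr.io/siderolabs/installer (a custom Image Factory installer would use a different reference), and the one-node-at-a-time ordering with a health check after each node is a conservative choice of this sketch.

```shell
# installer_image <talos-version>
# Prints the stock installer image reference for a Talos version.
installer_image() {
  printf 'ghcr.io/siderolabs/installer:%s\n' "$1"
}

# plan_upgrade <talos-version> <control-plane-ip> <node-ip>...
# Prints, in order, one upgrade command per node followed by a health check
# issued against the given control plane node. Nothing is executed.
plan_upgrade() {
  image="$(installer_image "$1")"; cp_ip="$2"; shift 2
  for node in "$@"; do
    printf 'talosctl upgrade --nodes %s --image %s\n' "$node" "$image"
    printf 'talosctl health --nodes %s\n' "$cp_ip"
  done
}
```

Calling plan_upgrade with a version, a control plane IP, and a list of node IPs yields two lines per node that can be reviewed and then run one at a time.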
Backup and Restore Etcd
- Create an etcd snapshot with talosctl etcd snapshot <path>
- Store backup securely off-cluster
- In case of disaster, restore from backup
- Use talosctl bootstrap --recover-from=<snapshot> to recover cluster state
- Verify cluster functionality after restoration
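The backup and recovery steps can be sketched as follows. The sketch assumes the snapshot is taken with talosctl etcd snapshot <path> and that recovery goes through talosctl bootstrap --recover-from=<path>; confirm both against talosctl etcd --help and talosctl bootstrap --help for the installed version. The helper names and the file naming scheme are hypothetical.

```shell
# snapshot_path <dir> <cluster-name> [timestamp]
# Prints a timestamped snapshot file path. The timestamp defaults to the
# current UTC time; the naming scheme is only an illustration.
snapshot_path() {
  ts="${3:-$(date -u +%Y%m%dT%H%M%SZ)}"
  printf '%s/etcd-%s-%s.snapshot\n' "$1" "$2" "$ts"
}

# etcd_commands <control-plane-ip> <snapshot-file>
# Prints (does not run) the backup command and the matching recovery command.
etcd_commands() {
  printf 'talosctl etcd snapshot %s --nodes %s\n' "$2" "$1"
  printf 'talosctl bootstrap --recover-from=%s --nodes %s\n' "$2" "$1"
}
```

The recovery command is only for a control plane whose etcd has been lost; on a healthy cluster, bootstrap must not be run again.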
Networking Configuration
- Configure static IP addresses, DHCP, or dynamic network settings
- Set up network interfaces with bonds, bridges, and VLANs
- Configure WireGuard VPN for secure inter-node communication
- Enable KubeSpan for hybrid clusters spanning edge, datacenter, and cloud
- Virtual IP (VIP) configuration for high availability
- Host DNS configuration and egress domain filtering
- Predictable interface naming and device selectors
- Support for multihoming and corporate proxies
Cluster Scaling and Workload Management
- Scale clusters up by adding new machines to control plane or worker roles
- Scale clusters down by removing machines
- Deploy workloads using standard Kubernetes manifests
- Interactive dashboard for cluster visualization and management
- Support for running workloads on control plane nodes
- Cluster autoscaling with Karpenter or Kubernetes Cluster Autoscaler
Security and Access Control
- Role-based access control (RBAC) for Talos API
- Certificate authority rotation and management
- Machine configuration OAuth for secure access
- SAML and OIDC authentication integration
- Disk encryption with Omni as Key Management Server
- SELinux support for enhanced security
- Image verification and secure boot support
- Break-glass emergency access for disaster recovery
Storage and Disk Management
- Configure disk layouts (system, user, resource partitions)
- Disk encryption with LUKS
- Swap configuration
- Support for existing volumes and raw volumes
- Disk management with layout templates and resource allocation
Container Runtime and Image Management
- Containerd configuration and management
- Image cache and pull-through cache for faster deployments
- Registry mirror configuration with authentication and TLS
- Static pod deployment
- Image factory for custom Talos images with system extensions
- Support for custom kernel modules and GPU drivers
Hardware and GPU Support
- NVIDIA GPU support (proprietary and open-source drivers)
- NVIDIA Fabric Manager for multi-GPU systems
- AMD GPU support
- Custom kernel argument configuration
- PCI device driver rebinding
- Hardware-specific platform configuration
System Extensions and Customization
- Build custom system extensions as container images
- Install system extensions during cluster creation or runtime
- Kernel module compilation and installation
- Custom kernel argument configuration
- Overlay system for additional customizations
- OCI base specification support for extension development
Cluster Operations and Maintenance
- Etcd backup and restore for disaster recovery
- Etcd maintenance and defragmentation
- Watchdog timer configuration for automatic recovery
- Cgroups analysis for resource monitoring
- Talos upgrade management with rolling updates
- Machine reset and factory reset capabilities
- Support bundle generation for troubleshooting
Omni Cluster Management
- Create and manage clusters from registered machines
- Cluster templates for declarative infrastructure-as-code
- Machine registration from bare metal (ISO, PXE), cloud (AWS, Azure, GCP, Hetzner), or manual provisioning
- Infrastructure providers for bare metal, cloud, and virtualization platforms
- Cluster autoscaling with dynamic machine provisioning
- Etcd backup and restore management
- Audit logging for compliance and security
- Talos configuration overrides and patches
- NTP server configuration
- Support bundle generation
Authentication and Authorization
- SAML integration with Okta, Unifi Identity Enterprise, Workspace One, Entra ID, Oracle Cloud
- OIDC login with Tailscale
- Access Control Lists (ACLs) for fine-grained permissions
- Role-based access control (Admin, User, None roles)
- Automatic user provisioning on first login
- Keycloak integration for self-hosted deployments
High Availability and Disaster Recovery
- 3-node control plane for HA clusters
- Etcd consensus-based fault tolerance
- Automatic etcd backups with configurable intervals
- Disaster recovery procedures for cluster restoration
- KubeSpan for hybrid cluster resilience
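The reason a 3-node control plane is the usual minimum for HA follows from etcd's majority quorum: a cluster of n members needs floor(n/2) + 1 members available, so it tolerates n minus that many failures. A small sketch of the arithmetic, with hypothetical function names:

```shell
# etcd_quorum <members>
# Smallest majority of an etcd cluster with that many members.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# etcd_fault_tolerance <members>
# Number of members that can fail while a majority is still available.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

Three members tolerate one failure, and so do four, which is why odd control plane sizes are preferred; five members tolerate two.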
Configure Network for Hybrid Cluster with KubeSpan
- Enable KubeSpan in machine configuration
- Configure WireGuard settings (private key, listen port)
- Add peer configurations with public keys and endpoints
- Talos automatically discovers peers via discovery service
- Full mesh WireGuard network established across all nodes
- Cluster spans edge, datacenter, and cloud seamlessly
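Enabling KubeSpan is typically a small machine configuration patch. The sketch below writes one to a file; it assumes the machine.network.kubespan.enabled and cluster.discovery.enabled fields of the machine configuration, which should be verified against the configuration reference for the Talos version in use. The helper name is hypothetical. With these two settings, key exchange and peer discovery are expected to be handled by Talos rather than configured by hand.

```shell
# write_kubespan_patch <file>
# Writes a strategic-merge style YAML patch that turns on KubeSpan and the
# cluster discovery it relies on.
write_kubespan_patch() {
  cat > "$1" <<'EOF'
machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true
EOF
}
```

The file can then be applied per node with talosctl patch machineconfig --nodes <ip> --patch @<file>, where the @ prefix tells talosctl to read the patch from a file.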
Build Custom Talos Image with System Extensions
- Define system extensions as container images
- Create schematic with extension references
- Use Image Factory to generate custom image
- Download ISO, kernel, or disk image
- Boot machines with custom image
- Extensions automatically installed during boot
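The schematic and download steps can be sketched as below. The sketch assumes the public Image Factory at factory.talos.dev, a schematic layout of customization.systemExtensions.officialExtensions, and an image URL of the form /image/<schematic-id>/<version>/metal-<arch>.iso; check these against the Image Factory documentation. The helper names are hypothetical, and the extension names passed in are examples only.

```shell
# write_schematic <file> <extension>...
# Writes a schematic that lists the given official system extensions.
write_schematic() {
  file="$1"; shift
  {
    echo "customization:"
    echo "  systemExtensions:"
    echo "    officialExtensions:"
    for ext in "$@"; do
      echo "      - $ext"
    done
  } > "$file"
}

# factory_iso_url <schematic-id> <talos-version> <arch>
# Prints the ISO download URL for a schematic ID returned by the factory.
factory_iso_url() {
  printf 'https://factory.talos.dev/image/%s/%s/metal-%s.iso\n' "$1" "$2" "$3"
}
```

The schematic ID is obtained by uploading the file, for example with curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics, which returns a JSON document containing the id.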
Context
Talos Linux Philosophy: Talos is designed with a single purpose - running Kubernetes. It removes unnecessary complexity by:
- Using API-driven configuration instead of SSH/files
- Maintaining immutable root filesystem
- Minimizing installed packages
- Defaulting to secure settings
- Supporting declarative, reproducible deployments
Deployment Models:
- Standalone Talos clusters managed via talosctl
- Omni SaaS for managed multi-cluster deployments
- Self-hosted Omni for air-gapped or on-premises environments
- Hybrid deployments spanning multiple infrastructure types