gke-cluster-lifecycle
GKE Cluster Lifecycle and Upgrades
This skill provides guidance on managing the lifecycle and upgrades of Google Kubernetes Engine (GKE) clusters.
Overview
Managing cluster upgrades is crucial for security and access to new features. GKE provides automated upgrades, but they must be configured to minimize disruption.
Workflows
1. Select Release Channels
Release channels allow you to choose the balance between stability and feature availability.
- Rapid: Newest features, less tested.
- Regular (Default): Good balance.
- Stable: Most tested, best for critical production workloads.
Command to set release channel:
gcloud container clusters update <cluster-name> \
--release-channel=stable \
--region <region>
2. Configure Surge Upgrades
Surge upgrades allow you to specify how many nodes can be created above the target size during an upgrade, minimizing disruption.
Example configuration:
gcloud container node-pools update <pool-name> \
--cluster=<cluster-name> \
--max-surge-upgrade=2 \
--max-unavailable-upgrade=0 \
--region <region>
Setting max-unavailable-upgrade=0 ensures that no nodes are taken offline before new ones are ready.
3. Implement Blue/Green Node Pool Upgrades
For high-risk upgrades, you can create a new node pool (Green) with the new version, test it, and then migrate workloads from the old node pool (Blue).
Steps:
- Create a new node pool with the new version and appropriate taints (using --node-taints).
- Cordon and drain the old node pool gradually.
- Delete the old node pool once empty.
Best Practices
- Use Release Channels: Always enroll production clusters in a release channel (preferably
StableorRegular). - Configure Surge Upgrades: Use
max-surge-upgradeto ensure availability during upgrades. - Use Maintenance Windows: Configure maintenance windows to ensure upgrades only happen during off-peak hours (see gke-reliability).
- Test in Non-Prod: Always test upgrades in a staging environment before applying them to production.
More from googlecloudplatform/gke-mcp
gke-backup-dr
Workflows for configuring Backup for GKE and disaster recovery.
2gke-reliability
Workflows for ensuring high availability and reliability of GKE workloads.
2gke-storage
Guidance on managing storage in Google Kubernetes Engine (GKE) clusters.
2gke-app-onboarding
Workflows for containerizing and deploying applications to GKE for the first time.
2gke-workload-security
Workflows for auditing and hardening the security of GKE workloads.
2gke-cost-optimization
Guidance on optimizing costs for Google Kubernetes Engine (GKE) clusters.
2