GKE Workload Scaling

This skill provides workflows and best practices for scaling applications on Google Kubernetes Engine (GKE). It covers manual scaling, Horizontal Pod Autoscaling (HPA), and Vertical Pod Autoscaling (VPA).

Workflows

1. Manual Scaling

Quickly scale a deployment to a fixed number of replicas. Useful for immediate manual intervention or testing.

Command:

kubectl scale deployment <deployment-name> --replicas=<number> -n <namespace>

2. Horizontal Pod Autoscaling (HPA)

Automatically scale the number of pods based on observed CPU utilization, memory utilization, or custom metrics.

Prerequisites:

Metrics Server must be running (enabled by default on GKE).
Containers clearly define resource requests/limits.

Quick Command:

kubectl autoscale deployment <deployment-name> --cpu-percent=50 --min=1 --max=10

Manifest Approach (Recommended): Use a YAML manifest for version-controlled configuration. See assets/hpa-example.yaml for a template.

kubectl apply -f assets/hpa-example.yaml

Custom Metrics & External Metrics: For GKE, the modern and recommended approach for scaling based on Cloud Monitoring metrics (e.g., Pub/Sub queue length) is to use the External metric type, which is natively supported by the GKE control plane without requiring the Custom Metrics Adapter. For application-specific metrics exposed via Prometheus, you can use Google Cloud Managed Service for Prometheus or the Prometheus Adapter.

3. Vertical Pod Autoscaling (VPA)

Automatically adjust the CPU and memory reservations for your pods to match actual usage. This is critical for right-sizing workloads.

Prerequisites:

VPA must be enabled on the cluster.
- Autopilot: Enabled by default.
- Standard: Must be enabled manually.

Enable VPA on Standard Cluster:

gcloud container clusters update <cluster-name> --enable-vertical-pod-autoscaling --zone <zone>

Update Modes:

Off: Calculates recommendations but does not apply them. Good for "dry run" analysis.
Initial: Assigns resources only at pod creation time.
Auto: Updates running pods by restarting them if recommendations differ significantly from requests.
InPlaceOrRecreate: Attempts to update Pod resources without recreating the Pod. If in-place update is not possible, it reverts to Auto mode (requires GKE 1.34+).

Example: See assets/vpa-example.yaml for a configuration template.

4. Cluster Autoscaler

While not a workload-level scaler, the Cluster Autoscaler is essential for ensuring your cluster has enough nodes to run the scaled pods.

Enable on a Node Pool:

gcloud container clusters update <cluster-name> \
    --enable-autoscaling \
    --node-pool <node-pool-name> \
    --min-nodes <min> \
    --max-nodes <max> \
    --zone <zone>

Best Practices

Define Resource Requests: HPA and VPA rely on accurate resource requests. Always define them in your container specs.
Avoid Metric Conflicts: Do not configure HPA and VPA to use the same metric (e.g., both CPU). This causes thrashing.
- Typical Pattern: HPA on CPU, VPA on Memory.
Pod Disruption Budgets (PDBs): Define PDBs to ensure application availability during scaling events or node upgrades.
HPA Lag: HPA has a stabilization window (default 5 mins) to prevent rapid fluctuation.
VPA "Auto" Mode Risks: In "Auto" mode, VPA restarts pods to change resources. Ensure your application handles restarts gracefully (e.g., handles SIGTERM).
- Note: By default, VPA requires at least 2 replicas to perform evictions. In GKE 1.22+, you can override this by setting minReplicas in PodUpdatePolicy.

gke-workload-scaling

GKE Workload Scaling

Workflows

1. Manual Scaling

2. Horizontal Pod Autoscaling (HPA)

3. Vertical Pod Autoscaling (VPA)

4. Cluster Autoscaler

Best Practices

More from googlecloudplatform/gke-mcp

gke-backup-dr

gke-reliability

gke-storage

gke-app-onboarding

gke-workload-security

gke-cost-optimization