GPU Kubernetes Operations

Run resilient and cost-efficient GPU clusters for production AI workloads.

Key Capabilities

  • NVIDIA device plugin and GPU operator lifecycle (see the pod sketch after this list)
  • MIG partitioning for multi-workload efficiency
  • GPU-aware autoscaling (KEDA/cluster autoscaler)
  • Node health checks and proactive remediation
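
To make the device-plugin capability concrete, here is a minimal sketch of a pod requesting one GPU through the plugin's `nvidia.com/gpu` resource. The pod name, image tag, and runtime class are assumptions, not requirements:

```yaml
# Minimal smoke-test pod requesting one whole GPU via the NVIDIA device plugin.
# nvidia.com/gpu is the resource name the device plugin advertises;
# pod name and CUDA image tag are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  runtimeClassName: nvidia          # only if the cluster defines a dedicated runtime class
  containers:
    - name: cuda-check
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1         # whole-GPU request; MIG slices use different resource names
```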

Cluster Baseline

  • Dedicated GPU node pools with taints and tolerations (sketch after this list)
  • Runtime class and driver/toolkit compatibility checks
  • Local SSD or high-throughput network storage for model weights
  • DCGM metrics exported to Prometheus
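
A minimal sketch of the taint-plus-toleration pairing that keeps non-GPU pods off the dedicated pool, assuming a hypothetical `gpu-workload=true:NoSchedule` taint:

```yaml
# Taint applied to each GPU node (often set in the node pool config instead):
#   kubectl taint nodes <gpu-node-name> gpu-workload=true:NoSchedule
#
# Matching toleration under the pod spec of GPU workloads; pods without it
# cannot schedule onto the tainted GPU nodes. Key and value are illustrative.
tolerations:
  - key: gpu-workload
    operator: Equal
    value: "true"
    effect: NoSchedule
```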

Scheduling Patterns

  • Use node affinity by GPU type (A10/L4/A100/H100); see the sketch after this list.
  • Separate latency-critical inference from batch training.
  • Pin model replicas with anti-affinity for availability.
  • Reserve headroom for failover and rolling updates.
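
A sketch combining the node-affinity and anti-affinity patterns from this list, assuming GPU Feature Discovery is deployed and labeling nodes with `nvidia.com/gpu.product`; the `app` label and product value are illustrative:

```yaml
# Goes under spec: of the pod template. Pins replicas to one GPU type and
# prefers spreading them across nodes for availability.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nvidia.com/gpu.product   # set by GPU Feature Discovery
              operator: In
              values: ["NVIDIA-L4"]         # illustrative product value
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: llm-inference            # illustrative app label
```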

Autoscaling Strategy

  • Scale on queue depth + GPU utilization, not CPU alone (sketch after this list).
  • Keep warm spare replicas to mitigate cold starts for large models.
  • Cap burst scaling to avoid quota exhaustion.
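
A sketch of the KEDA side of this strategy, assuming a Prometheus-exposed queue metric (`inference_queue_depth`, hypothetical) alongside dcgm-exporter's `DCGM_FI_DEV_GPU_UTIL`; the deployment name, thresholds, and replica caps are illustrative:

```yaml
# KEDA ScaledObject scaling an inference Deployment on queue depth and
# GPU utilization together, with a floor for warm spares and a burst cap.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-inference-scaler
spec:
  scaleTargetRef:
    name: llm-inference       # illustrative Deployment name
  minReplicaCount: 2          # warm spares absorb large-model cold starts
  maxReplicaCount: 12         # keep bursts below the GPU quota
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(inference_queue_depth)   # hypothetical app-level metric
        threshold: "50"
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: avg(DCGM_FI_DEV_GPU_UTIL)    # dcgm-exporter utilization gauge
        threshold: "80"
```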

Reliability Checks

  • ECC error and Xid monitoring (alert sketch after this list)
  • GPU memory pressure alerts
  • Driver mismatch detection during upgrades
  • Pod preemption impact analysis
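
A sketch of alerting on the first two checks, assuming the Prometheus Operator's `PrometheusRule` CRD and dcgm-exporter metrics (`DCGM_FI_DEV_XID_ERRORS`, `DCGM_FI_DEV_ECC_DBE_VOL_TOTAL`); alert names and thresholds are illustrative:

```yaml
# Alert on Xid errors and uncorrectable (double-bit) ECC errors.
# Label names (Hostname, gpu) follow dcgm-exporter defaults.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-health
spec:
  groups:
    - name: gpu-health
      rules:
        - alert: GpuXidError
          expr: DCGM_FI_DEV_XID_ERRORS > 0
          for: 1m
          labels:
            severity: page
          annotations:
            summary: "Xid error on {{ $labels.Hostname }} (GPU {{ $labels.gpu }})"
        - alert: GpuDoubleBitEccErrors
          expr: increase(DCGM_FI_DEV_ECC_DBE_VOL_TOTAL[10m]) > 0
          labels:
            severity: page
          annotations:
            summary: "Uncorrectable ECC errors on {{ $labels.Hostname }}"
```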

Cost Optimization

  • Prefer MIG slices for smaller inference services; see the sketch after this list.
  • Schedule batch jobs in off-peak windows.
  • Route low-priority traffic to cheaper model tiers.
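
A sketch of a small inference Deployment pinned to a MIG slice rather than a whole GPU, assuming the GPU operator exposes per-profile resources (e.g. `nvidia.com/mig-1g.5gb` on A100 under the mixed MIG strategy); all names and the image are illustrative:

```yaml
# Three small replicas, each consuming one 1g.5gb MIG slice instead of a
# full GPU, so several services can share a single physical A100.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: small-model
spec:
  replicas: 3
  selector:
    matchLabels: { app: small-model }
  template:
    metadata:
      labels: { app: small-model }
    spec:
      containers:
        - name: server
          image: registry.example.com/small-model:latest   # illustrative image
          resources:
            limits:
              nvidia.com/mig-1g.5gb: 1   # MIG profile resource name; varies by GPU
```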
