
Together GPU Clusters

Overview

Use Together AI GPU clusters when the user needs infrastructure control instead of a managed inference product.

Typical fits:

  • distributed training
  • multi-node inference
  • HPC or Slurm workloads
  • custom Kubernetes jobs
  • attached shared storage and cluster lifecycle management

When This Skill Wins

  • Provision a cluster and manage it over time
  • Choose between on-demand and reserved capacity
  • Choose Kubernetes or Slurm as the orchestration layer
  • Manage shared volumes and credentials
  • Scale up, scale down, or troubleshoot node health

Hand Off To Another Skill

  • Use together-dedicated-endpoints for managed single-model hosting
  • Use together-dedicated-containers for containerized inference without owning the full cluster
  • Use together-code-interpreter for short-lived remote Python execution
  • Use together-fine-tuning for managed training jobs instead of raw cluster operations

Quick Routing

Workflow

  1. Decide whether the workload really needs cluster-level control.
  2. Choose on-demand vs reserved billing based on run duration and baseline utilization.
  3. Choose Kubernetes vs Slurm based on orchestration requirements and team tooling.
  4. Select region, GPU type, driver version, and shared storage plan.
  5. Provision first, then layer in access credentials, workload deployment, scaling, and health checks (see the provisioning sketch after this list).
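
A minimal end-to-end sketch of the workflow above, assuming the Together v2 SDK exposes GPU cluster operations under a client.gpu_clusters resource. list_regions() is named in the rules below, but the create and credential methods shown here are placeholder names, so start from the bundled scripts for the canonical request shapes.

```python
# Provisioning flow sketch (hypothetical resource and field names; confirm
# against the bundled scripts). Requires the v2 SDK: together>=2.0.0.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# Check stock first: 409 "Out of stock" errors are common for popular GPU types.
regions = client.gpu_clusters.list_regions()

cluster = None
for region in regions:
    try:
        cluster = client.gpu_clusters.create(
            name="train-cluster",
            region=region.name,   # assumed attribute on the region object
            gpu_type="H100",      # placeholder GPU type
            num_nodes=2,          # placeholder node count
        )
        break
    except Exception as err:
        # Most likely an out-of-stock 409; fall through to the next region.
        print(f"Skipping {region.name}: {err}")

# Credentials retrieval is part of provisioning: fetch them right away so
# workloads can be deployed without a second round trip to the user.
if cluster is not None:
    credentials = client.gpu_clusters.get_credentials(cluster.id)
```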

High-Signal Rules

  • Python scripts require the Together v2 SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
  • Prefer managed products unless the user explicitly needs raw infrastructure control.
  • Treat storage lifecycle separately from cluster lifecycle; volumes can outlive clusters.
  • When creating a cluster with new shared storage, prefer inline shared_volume over creating a volume separately and attaching via volume_id. Separately created volumes may land in a different datacenter partition than the cluster, causing a "does not exist in the datacenter" error even when the volume shows as available.
  • GPU stock-outs (409 "Out of stock") are common. Always call list_regions() first and be prepared to try multiple regions.
  • The API requires cuda_version and nvidia_driver_version as separate fields in addition to the combined driver_version string. Pass them via extra_body in the Python SDK (see the request sketch after this list).
  • Credentials retrieval is part of provisioning. Do not stop at cluster creation if the user needs to run workloads immediately.
  • Slurm and Kubernetes operational patterns differ materially; read the cluster-management reference before improvising.
  • For repeated cluster operations, start from the scripts instead of rebuilding request shapes.
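
A request-shape sketch tying together the volume and driver rules above: inline shared_volume rather than a separately created volume attached by volume_id, and cuda_version / nvidia_driver_version passed through extra_body alongside the combined driver_version. Apart from the field names quoted in the rules, the resource name, GPU type, and version strings below are illustrative assumptions.

```python
# Cluster creation payload sketch -- values are placeholders, and the
# client.gpu_clusters.create() resource path is assumed rather than confirmed.
from together import Together

client = Together()

cluster = client.gpu_clusters.create(
    name="train-cluster",
    region="us-east-1",                    # placeholder; pick from list_regions()
    gpu_type="H100",                       # placeholder GPU type
    num_nodes=4,                           # placeholder node count
    driver_version="550.90.07-cuda12.4",   # placeholder combined driver string
    # Inline shared_volume keeps the volume in the same datacenter partition
    # as the cluster, avoiding "does not exist in the datacenter" errors that
    # can occur with separately created volumes attached via volume_id.
    shared_volume={
        "name": "shared-data",             # placeholder volume name
        "size_gb": 1024,                   # placeholder size
    },
    # The API also wants the CUDA and driver versions as separate fields;
    # pass them through extra_body since the SDK method may not expose them.
    extra_body={
        "cuda_version": "12.4",
        "nvidia_driver_version": "550.90.07",
    },
)
```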

Resource Map

Official Docs

Related skills
