together-gpu-clusters
Together GPU Clusters
Overview
Use Together AI GPU clusters when the user needs infrastructure control instead of a managed inference product.
Typical fits:
- distributed training
- multi-node inference
- HPC or Slurm workloads
- custom Kubernetes jobs
- attached shared storage and cluster lifecycle management
When This Skill Wins
- Provision a cluster and manage it over time
- Choose between on-demand and reserved capacity
- Choose Kubernetes or Slurm as the orchestration layer
- Manage shared volumes and credentials
- Scale up, scale down, or troubleshoot node health
Hand Off To Another Skill
- Use `together-dedicated-endpoints` for managed single-model hosting
- Use `together-dedicated-containers` for containerized inference without owning the full cluster
- Use `together-sandboxes` for short-lived remote Python execution
- Use `together-fine-tuning` for managed training jobs instead of raw cluster operations
Quick Routing
- Cluster creation, scaling, credentials, deletion
  - Start with scripts/manage_cluster.py or scripts/manage_cluster.ts
  - Read references/api-reference.md
- Shared storage lifecycle
  - Use scripts/manage_storage.py and references/api-reference.md
- Kubernetes vs Slurm operations
  - Read references/cluster-management.md
- Troubleshooting node health, PVCs, or scheduling
  - Read references/cluster-management.md
- tcloud CLI workflows
  - Read references/tcloud-cli.md
Workflow
- Decide whether the workload really needs cluster-level control.
- Choose on-demand vs reserved billing based on run duration and baseline utilization.
- Choose Kubernetes vs Slurm based on orchestration requirements and team tooling.
- Select region, GPU type, driver version, and shared storage plan.
- Provision first, then layer in access credentials, workload deployment, scaling, and health checks.
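Expressed as code, the provisioning steps above look roughly like the sketch below. This is a minimal sketch under stated assumptions, not the canonical implementation: the `gpu_clusters` resource name, the `get_credentials` call, and every parameter other than `shared_volume`, `driver_version`, and the `extra_body` fields are illustrative guesses; scripts/manage_cluster.py and references/api-reference.md hold the exact request shapes.

```python
# Minimal provisioning sketch (assumed method/field names -- see scripts/manage_cluster.py).
from together import Together  # requires together>=2.0.0

client = Together()  # reads TOGETHER_API_KEY from the environment

# 1. Check capacity first; GPU stock-outs are common, so pick from live region data.
regions = client.gpu_clusters.list_regions()  # `gpu_clusters` resource name is an assumption
print(regions)                                # choose a region that actually has stock

# 2. Create the cluster. Shared storage is declared inline via `shared_volume` so the
#    volume lands in the same datacenter partition as the nodes.
cluster = client.gpu_clusters.create(
    name="train-cluster",                     # illustrative values throughout
    region="us-central-1",                    # pick from the list_regions() output
    gpu_type="h100",
    num_nodes=2,
    orchestrator="kubernetes",                # or "slurm"
    shared_volume={"name": "train-data", "size_tib": 10},
    driver_version="cuda12.4-550",
    extra_body={                              # the API also wants these as separate fields
        "cuda_version": "12.4",
        "nvidia_driver_version": "550",
    },
)

# 3. Provisioning is not done until credentials are in hand; fetch them so workloads
#    (kubectl / Slurm access) can start immediately.
credentials = client.gpu_clusters.get_credentials(cluster.id)  # assumed call
```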
High-Signal Rules
- Python scripts require the Together v2 SDK (`together>=2.0.0`). If the user is on an older version, they must upgrade first: `uv pip install --upgrade "together>=2.0.0"`.
- Prefer managed products unless the user explicitly needs raw infrastructure control.
- Treat storage lifecycle separately from cluster lifecycle; volumes can outlive clusters.
- When creating a cluster with new shared storage, prefer inline `shared_volume` over creating a volume separately and attaching via `volume_id`. Separately created volumes may land in a different datacenter partition than the cluster, causing a "does not exist in the datacenter" error even when the volume shows as available.
- GPU stock-outs (409 "Out of stock") are common. Always call `list_regions()` first and be prepared to try multiple regions.
- The API requires `cuda_version` and `nvidia_driver_version` as separate fields in addition to the combined `driver_version` string. Pass them via `extra_body` in the Python SDK.
- Credentials retrieval is part of provisioning. Do not stop at cluster creation if the user needs to run workloads immediately.
- Slurm and Kubernetes operational patterns differ materially; read the cluster-management reference before improvising.
- For repeated cluster operations, start from the scripts instead of rebuilding request shapes.
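Because 409 stock-outs are routine, cluster creation is usually wrapped in a region-fallback loop. The sketch below reuses the same hypothetical `gpu_clusters.create()` call as the Workflow sketch and a generic exception check; the SDK's actual error class and status attribute depend on the installed version.

```python
def create_with_region_fallback(client, candidate_regions, **cluster_kwargs):
    """Try each candidate region in order, skipping those that are sold out (HTTP 409)."""
    for region in candidate_regions:
        try:
            # `gpu_clusters.create` is the same assumed call as in the Workflow sketch.
            return client.gpu_clusters.create(region=region, **cluster_kwargs)
        except Exception as err:
            # The SDK's concrete error type varies; look for the 409 "Out of stock" signal.
            if getattr(err, "status_code", None) == 409 or "Out of stock" in str(err):
                continue  # no capacity here -- try the next region
            raise  # anything else is a real error
    raise RuntimeError("All candidate regions are out of stock; retry later or change GPU type")
```

Feed it regions taken from `list_regions()` rather than a hard-coded list, so the fallback order reflects current availability.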
Resource Map
- Cluster API reference: references/api-reference.md
- Operational guide: references/cluster-management.md
- Operational troubleshooting: references/cluster-management.md
- CLI guide: references/tcloud-cli.md
- Python cluster management: scripts/manage_cluster.py
- TypeScript cluster management: scripts/manage_cluster.ts
- Python storage management: scripts/manage_storage.py
Official Docs