Together Dedicated Containers
Overview
Use Dedicated Container Inference when the user needs a custom runtime, not just managed model hosting.
Core building blocks:
- Jig CLI for build and deployment
- Sprocket SDK for request handling inside the container
- Queue API for async jobs
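To orient on how these pieces fit together, here is a sketch of the shape a worker might take. The import path, `Sprocket` entry point, decorator, and handler signature below are assumptions, not the confirmed SDK surface; the real contract lives in references/sprocket-sdk.md and scripts/sprocket_hello_world.py.

```python
# Hypothetical minimal worker sketch. Every Sprocket name here is an
# assumption -- consult references/sprocket-sdk.md for the actual API.
from sprocket import Sprocket  # assumed import path

app = Sprocket()  # assumed application entry point


@app.handler  # assumed: registers a function for incoming requests
def handle(request: dict) -> dict:
    # Custom runtime logic goes here: load a model once at startup,
    # run a specialized image/video/multimodal pipeline, etc.
    name = request.get("name", "world")
    return {"greeting": f"hello, {name}"}
```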
When This Skill Wins
- Deploy a custom inference worker
- Bundle custom dependencies or runtime logic into a container
- Use queue-based async processing with progress tracking
- Run a specialized image, video, or multimodal pipeline
Hand Off To Another Skill
- Use `together-dedicated-endpoints` for standard model hosting without custom containers
- Use `together-gpu-clusters` for full cluster ownership and orchestration control
- Use `together-chat-completions`, `together-images`, or `together-video` when a serverless product already covers the task
Quick Routing
- Minimal worker template: start with scripts/sprocket_hello_world.py and read references/sprocket-sdk.md
- Build, deploy, logs, queue, and secrets: see references/jig-cli.md
- Queue submission and polling: start with scripts/queue_client.py or scripts/queue_client.ts
Workflow
- Confirm that the user truly needs a custom container runtime.
- Implement the worker with Sprocket's request lifecycle.
- Configure `pyproject.toml` for image, runtime, autoscaling, and mounts.
- Deploy with Jig.
- Submit jobs through the queue API and poll until completion.
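As a sketch of the final step, the client below submits a job and polls for a terminal state. The URL paths, payload shape, and status values are illustrative assumptions (scripts/queue_client.py is the authoritative client); `TOGETHER_API_KEY` is the standard Together API key environment variable, and the deployment name is parameterized rather than hardcoded.

```python
# Queue submission and polling sketch, assuming a REST-style queue API.
# Endpoint paths and response fields are assumptions, not confirmed.
import os
import time

import requests

BASE_URL = "https://api.together.xyz"
DEPLOYMENT = os.environ.get("DEPLOYMENT_NAME", "my-worker")  # parameterized

headers = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

# Submit an async job (path and payload shape are assumptions).
job = requests.post(
    f"{BASE_URL}/v1/queues/{DEPLOYMENT}/jobs",
    headers=headers,
    json={"input": {"name": "world"}},
    timeout=30,
).json()

# Poll until the job reaches a terminal state.
while True:
    status = requests.get(
        f"{BASE_URL}/v1/queues/{DEPLOYMENT}/jobs/{job['id']}",
        headers=headers,
        timeout=30,
    ).json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)  # back off between polls

print(status)
```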
High-Signal Rules
- Python scripts require the Together v2 SDK (`together>=2.0.0`). If the user is on an older version, they must upgrade first: `uv pip install --upgrade "together>=2.0.0"`.
- Prefer dedicated endpoints over containers unless the runtime or pipeline is genuinely custom.
- Treat the worker contract and `pyproject.toml` as the source of truth for deployment behavior.
- Parameterize deployment name, queue inputs, and resource sizing instead of hardcoding them.
- Queue-based jobs are asynchronous by default; account for polling and result retrieval in client code.
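A minimal guard for the v2 SDK rule might look like the snippet below; it uses only the standard library, and `together` is the PyPI distribution name.

```python
# Fail fast if a pre-v2 Together SDK is installed.
from importlib.metadata import version

installed = version("together")
if int(installed.split(".")[0]) < 2:
    raise RuntimeError(
        f"together=={installed} found; upgrade first: "
        'uv pip install --upgrade "together>=2.0.0"'
    )
```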
Resource Map
- Jig CLI: references/jig-cli.md
- Sprocket SDK: references/sprocket-sdk.md
- Python queue client: scripts/queue_client.py
- TypeScript queue client: scripts/queue_client.ts
- Worker template: scripts/sprocket_hello_world.py