together-audio
Together Audio
Overview
Use Together AI audio APIs for:
- text-to-speech generation
- streaming or realtime voice output
- speech-to-text transcription
- translation, diarization, and timestamps
- live captioning and realtime transcription
When This Skill Wins
- Generate spoken audio from text
- Transcribe uploaded audio files or URLs
- Add realtime voice or captioning to an app
- Extract speaker segments or word timings
Hand Off To Another Skill
- Use together-chat-completions for text-only generation
- Use together-video or together-images for visual generation workflows
- Use together-dedicated-endpoints only when the audio model itself must be hosted on dedicated infrastructure
Quick Routing
- REST TTS or streaming TTS
  - Read references/tts-models.md
  - Start with scripts/tts_generate.py or scripts/tts_generate.ts
- Realtime TTS over WebSocket
  - Read references/tts-models.md
  - Start with scripts/tts_websocket.py
- File transcription, translation, diarization, or timestamps
  - Read references/stt-models.md
  - Start with scripts/stt_transcribe.py or scripts/stt_transcribe.ts
- Realtime STT
  - Read references/stt-models.md
  - Start with scripts/stt_realtime.py
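The routing list above can be read as a small lookup table. The sketch below transcribes it into a dictionary; the reference and script paths come from the list, while the task keys (`tts_rest`, `tts_realtime`, `stt_file`, `stt_realtime`) and the `route` helper are hypothetical names chosen only for this illustration.

```python
# Routing table transcribed from the Quick Routing list.
# The keys are hypothetical labels, not names used by the skill itself.
ROUTES = {
    "tts_rest": {
        "reference": "references/tts-models.md",
        "scripts": ["scripts/tts_generate.py", "scripts/tts_generate.ts"],
    },
    "tts_realtime": {
        "reference": "references/tts-models.md",
        "scripts": ["scripts/tts_websocket.py"],
    },
    "stt_file": {
        "reference": "references/stt-models.md",
        "scripts": ["scripts/stt_transcribe.py", "scripts/stt_transcribe.ts"],
    },
    "stt_realtime": {
        "reference": "references/stt-models.md",
        "scripts": ["scripts/stt_realtime.py"],
    },
}


def route(task: str) -> dict:
    """Return the reference file and starter scripts for a task label."""
    try:
        return ROUTES[task]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"unknown audio task {task!r}; expected one of: {known}") from None
```

For example, `route("stt_file")["reference"]` yields the STT reference file to read before opening either transcription script.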
Workflow
- Confirm whether the task is TTS or STT.
- Choose REST, streaming, or realtime transport based on latency and interaction needs.
- Pick the model and response format from the relevant reference file.
- Start from the matching script instead of rebuilding the request contract from memory.
- For Python STT uploads, open audio files in binary mode and pass the file handle rather than a bare path string.
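The last workflow step, passing an open binary file handle rather than a path string, might look like the sketch below. It is not a replacement for scripts/stt_transcribe.py. The helper name `transcribe_file` is hypothetical, the `file=` and `model=` keyword names are assumptions about the v2 SDK call, and the model identifier is left to the caller so that it can be taken from references/stt-models.md.

```python
from pathlib import Path
from typing import Any


def transcribe_file(client: Any, audio_path: str, model: str) -> Any:
    """Upload a local audio file for transcription via an open binary handle.

    `client` is expected to expose client.audio.transcriptions.create();
    the keyword names used below are assumptions, so check them against
    scripts/stt_transcribe.py before relying on this.
    """
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"audio file not found: {path}")

    # Open in binary mode and pass the handle itself, not the path string.
    with path.open("rb") as audio_file:
        return client.audio.transcriptions.create(file=audio_file, model=model)
```

With the real SDK, `client` would be created with `from together import Together` followed by `Together()`. Keeping the client as a parameter also lets the helper be exercised with a stub object in tests.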
High-Signal Rules
- Python scripts require the Together v2 SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
- Use client.audio.speech.create() for TTS.
- REST TTS returns a BinaryAPIResponse; call response.write_to_file(path) to save it. Do NOT use stream_to_file (it does not exist on this object).
- Streaming TTS (stream=True) returns a Stream of AudioSpeechStreamChunk objects. Iterate over the chunks, check chunk.type, and decode the audio data with base64.b64decode(chunk.delta). There is no file-writing helper on the stream object.
- Use client.audio.transcriptions.create() for transcription and client.audio.translations.create() for translation.
- Realtime APIs require audio-format discipline; confirm PCM expectations before streaming bytes.
- Diarization and word timestamps change response shape; code for the richer verbose output explicitly.
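The two TTS saving rules above can be sketched as a pair of helpers. Both function names are hypothetical. The chunk type that marks audio data is not stated in this document, so the sketch takes it as a required `audio_type` argument instead of guessing a value; confirm it against references/tts-models.md or scripts/tts_generate.py.

```python
import base64
from typing import Any, Iterable


def save_rest_tts(response: Any, path: str) -> None:
    """Save a non-streaming TTS response (a BinaryAPIResponse) to disk."""
    # write_to_file is the helper named in the rules above;
    # stream_to_file does not exist on this object.
    response.write_to_file(path)


def save_streamed_tts(chunks: Iterable[Any], path: str, audio_type: str) -> int:
    """Write decoded audio from a stream=True response; return bytes written.

    `audio_type` is the chunk.type value that carries audio. It is an
    assumption supplied by the caller, not a value taken from the SDK.
    """
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if chunk.type != audio_type:
                continue  # skip non-audio events such as completion markers
            delta = getattr(chunk, "delta", None)
            if not delta:
                continue
            data = base64.b64decode(delta)  # chunk.delta is base64 text
            out.write(data)
            total += len(data)
    return total
```

Whether the concatenated bytes form a directly playable file depends on the response format requested, so treat the output as raw audio data until the format is confirmed.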
Resource Map
- TTS reference: references/tts-models.md
- STT reference: references/stt-models.md
- Python TTS workflow: scripts/tts_generate.py
- TypeScript TTS workflow: scripts/tts_generate.ts
- Python realtime TTS workflow: scripts/tts_websocket.py
- Python STT workflow: scripts/stt_transcribe.py
- TypeScript STT workflow: scripts/stt_transcribe.ts
- Python realtime STT workflow: scripts/stt_realtime.py