# Together Batch Inference
## Overview
Process thousands of requests asynchronously at up to 50% cost discount. Ideal for workloads that don't need real-time responses:
- Evaluations and data analysis
- Large-scale classification
- Synthetic data generation
- Content generation and summarization
- Dataset transformations
## Installation

```shell
# Python (recommended)
uv init          # optional, if starting a new project
uv add together

# or with pip
pip install together

# TypeScript / JavaScript
npm install together-ai
```

Set your API key:

```shell
export TOGETHER_API_KEY=<your-api-key>
```
## Workflow

- Prepare a `.jsonl` batch file with requests
- Upload the file with `purpose="batch-api"`
- Create a batch job
- Poll for completion
- Download results
## Quick Start

### 1. Prepare Batch File

Each line is a JSON object with a unique `custom_id` and a `body` containing the request payload.

```jsonl
{"custom_id": "req-1", "body": {"model": "deepseek-ai/DeepSeek-V3", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 200}}
{"custom_id": "req-2", "body": {"model": "deepseek-ai/DeepSeek-V3", "messages": [{"role": "user", "content": "Explain quantum computing"}], "max_tokens": 200}}
```
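For more than a handful of requests, the batch file is easier to generate programmatically. A minimal sketch of the format above (the prompt list and output path are placeholders; swap in your own data source):

```python
import json

# Placeholder prompts; in practice, pull these from your own dataset.
prompts = ["Hello!", "Explain quantum computing"]

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"req-{i}",  # must be unique within the file
            "body": {
                "model": "deepseek-ai/DeepSeek-V3",
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 200,
            },
        }
        f.write(json.dumps(request) + "\n")
```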
### 2. Upload and Create Batch

```python
from together import Together

client = Together()

# Upload
file_resp = client.files.upload(file="batch_input.jsonl", purpose="batch-api", check=False)

# Create batch
batch = client.batches.create(input_file_id=file_resp.id, endpoint="/v1/chat/completions")
print(batch.job.id)
```

```typescript
import Together from "together-ai";

const client = new Together();

// Upload (use the file ID returned by the Files API)
const fileId = "file-abc123";

const batch = await client.batches.create({
  endpoint: "/v1/chat/completions",
  input_file_id: fileId,
});
console.log(batch);
```

```shell
# Upload the batch file
curl -X POST "https://api.together.xyz/v1/files" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -F "purpose=batch-api" \
  -F "file=@batch_input.jsonl"

# Create the batch (use the file id from the upload response)
curl -X POST "https://api.together.xyz/v1/batches" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_file_id": "file-abc123", "endpoint": "/v1/chat/completions"}'
```
### 3. Check Status

```python
status = client.batches.retrieve(batch.job.id)
print(status.status)  # VALIDATING → IN_PROGRESS → COMPLETED
```

```typescript
import Together from "together-ai";

const client = new Together();
const batchId = batch.job?.id;

let batchInfo = await client.batches.retrieve(batchId);
console.log(batchInfo.status);
```

```shell
curl -X GET "https://api.together.xyz/v1/batches/batch-abc123" \
  -H "Authorization: Bearer $TOGETHER_API_KEY"
```
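Rather than checking by hand, completion can be awaited with a small polling loop. This sketch is SDK-agnostic: `fetch_status` is any zero-argument callable returning the current status string, e.g. `lambda: client.batches.retrieve(batch_id).status` (the helper name and defaults are illustrative, not part of the SDK):

```python
import time

# Statuses after which the batch will not change again.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "EXPIRED"}

def wait_for_batch(fetch_status, interval=30, timeout=24 * 3600):
    """Poll fetch_status() until the batch reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)  # 30-60s intervals are plenty for batch jobs
    raise TimeoutError("batch did not finish within the timeout")
```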
### 4. Download Results

```python
if status.status == "COMPLETED":
    with client.files.with_streaming_response.content(id=status.output_file_id) as response:
        with open("batch_output.jsonl", "wb") as f:
            for chunk in response.iter_bytes():
                f.write(chunk)
```

```typescript
import Together from "together-ai";

const client = new Together();
const batchInfo = await client.batches.retrieve(batchId);

if (batchInfo.status === "COMPLETED" && batchInfo.output_file_id) {
  const resp = await client.files.content(batchInfo.output_file_id);
  const result = await resp.text();
  console.log(result);
}
```
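Because output order may differ from input order, results are typically matched back to requests by `custom_id`. A minimal sketch (the helper name is illustrative; only the `custom_id` field is assumed on each output record):

```python
import json

def index_results(path="batch_output.jsonl"):
    """Map custom_id -> output record from a downloaded results file."""
    results = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            results[record["custom_id"]] = record
    return results

# Usage: indexed = index_results(); indexed["req-1"] is the result for req-1.
```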
### 5. Cancel / List

```python
client.batches.cancel(batch_id)   # Cancel a batch
batches = client.batches.list()   # List all batches
```

```typescript
import Together from "together-ai";

const client = new Together();

// List all batches
const allBatches = await client.batches.list();
for (const batch of allBatches) {
  console.log(batch);
}
```

```shell
# Cancel a batch
curl -X POST "https://api.together.xyz/v1/batches/batch-abc123/cancel" \
  -H "Authorization: Bearer $TOGETHER_API_KEY"

# List all batches
curl -X GET "https://api.together.xyz/v1/batches" \
  -H "Authorization: Bearer $TOGETHER_API_KEY"
```
## Status Flow

| Status | Description |
|---|---|
| `VALIDATING` | Input file is being validated |
| `IN_PROGRESS` | Batch is processing |
| `COMPLETED` | Done; download results |
| `FAILED` | Processing failed |
| `CANCELLED` | Batch was cancelled |
| `EXPIRED` | Job expired before completion |

Output order may differ from input order; use `custom_id` to match results to requests.
## Models with 50% Discount
| Model ID | Discount |
|---|---|
| deepseek-ai/DeepSeek-R1-0528-tput | 50% |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 50% |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | 50% |
| meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 50% |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 50% |
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 50% |
| meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 50% |
| meta-llama/Llama-3-70b-chat-hf | 50% |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 50% |
| Qwen/Qwen2.5-72B-Instruct-Turbo | 50% |
| Qwen/Qwen2.5-7B-Instruct-Turbo | 50% |
| Qwen/Qwen3-235B-A22B-fp8-tput | 50% |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | 50% |
| Qwen/Qwen2.5-VL-72B-Instruct | 50% |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 50% |
| mistralai/Mistral-7B-Instruct-v0.1 | 50% |
| zai-org/GLM-4.5-Air-FP8 | 50% |
| openai/whisper-large-v3 | 50% |
All serverless models are available for batch inference; models not listed above have no discount.
## Rate Limits
- Max enqueued tokens: 30B per model
- Per-batch limit: 50,000 requests
- File size: 100MB max
- Separate pool: Doesn't consume standard rate limits
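Workloads larger than the 50,000-request per-batch limit have to be split into multiple batch jobs. A minimal chunking sketch (the helper name is illustrative; it does not account for the file-size or enqueued-token limits, which you should check separately):

```python
def chunk_requests(requests, max_per_batch=50_000):
    """Yield slices of a request list, each within the per-batch request limit."""
    for start in range(0, len(requests), max_per_batch):
        yield requests[start:start + max_per_batch]

# Usage: write each chunk to its own .jsonl file and submit it as its own batch.
```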
## Error Handling

Check `error_file_id` for per-request failures:

```jsonl
{"custom_id": "req-1", "error": {"message": "Invalid model specified", "code": "invalid_model"}}
```
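Once the error file is downloaded (via the Files API, like the output file), failures can be grouped by error code for triage. A sketch assuming only the `custom_id` and `error.code` fields shown above (the helper name and default path are illustrative):

```python
import json

def summarize_errors(path="batch_errors.jsonl"):
    """Group failed custom_ids by error code for quick triage."""
    by_code = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            code = record.get("error", {}).get("code", "unknown")
            by_code.setdefault(code, []).append(record["custom_id"])
    return by_code
```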
## Best Practices

- Aim for 1,000-10,000 requests per batch
- Validate JSONL before submission
- Use unique `custom_id` values
- Poll status every 30-60 seconds
- Most batches complete within 24 hours (allow 72 hours for large/complex models)
- Batch files can be reused for multiple jobs
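The "validate JSONL before submission" and "unique `custom_id`" practices can be combined into a pre-flight check. A minimal sketch that only checks the fields this document uses (the helper name is illustrative; it does not validate the request payload itself):

```python
import json

def validate_batch_file(path):
    """Return a list of (line_number, problem) pairs; empty means the file looks valid."""
    problems = []
    seen_ids = set()
    with open(path) as f:
        for n, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((n, f"invalid JSON: {e}"))
                continue
            cid = record.get("custom_id")
            if not cid:
                problems.append((n, "missing custom_id"))
            elif cid in seen_ids:
                problems.append((n, f"duplicate custom_id: {cid}"))
            else:
                seen_ids.add(cid)
            if "body" not in record:
                problems.append((n, "missing body"))
    return problems
```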
## Resources

- Full API reference: see `references/api-reference.md`
- Runnable script (Python): see `scripts/batch_workflow.py`, a complete upload → create → poll → download pipeline (v2 SDK)
- Runnable script (TypeScript): see `scripts/batch_workflow.ts`, a minimal OpenAPI `x-codeSamples` extraction for list/create/retrieve/cancel (TypeScript SDK)
- Official docs: Batch Inference
- API reference: Batch API