# tinker-api: Tinker API Training Skill

Expert guidance for using the Tinker API - a training platform for fine-tuning large language models with supervised learning, reinforcement learning, and preference optimization.
## How It Works

- **Detect Tinker API usage** - when the user mentions Tinker training, RL environments, sampling, or references Tinker API operations
- **Identify the workflow** - determine whether it is supervised learning, RL training, sampling, rendering, or infrastructure setup
- **Provide comprehensive guidance** - use the scripts and examples to implement complete training recipes
- **Emphasize best practices** - async patterns, overlapping requests, proper data preparation, checkpoint management
## Quick Reference: Common Workflows

| Workflow | Script | Key APIs |
|---|---|---|
| Setup & Test Connection | `scripts/setup-check.py` | `ServiceClient`, `get_server_capabilities` |
| Supervised Fine-tuning | `scripts/supervised-training.py` | `forward_backward`, `cross_entropy`, rendering |
| RL Training Loop | `scripts/rl-training.py` | `sample`, policy gradients, advantage estimation |
| Sampling & Inference | `scripts/sampling-demo.py` | `SamplingClient`, `sample`, `compute_logprobs` |
| Vision Model Training | `scripts/vision-training.py` | `ImageChunk`, `Qwen3VLRenderer` |
| Save/Load Checkpoints | `scripts/checkpoint-management.py` | `save_state`, `load_state`, `save_weights_for_sampler` |
## Core Concepts

### 1. Client Types

**ServiceClient** - entry point for all operations:

```python
import tinker

service_client = tinker.ServiceClient()
```

**TrainingClient** - for training operations (forward/backward, optim step):

```python
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-30B-A3B",
    rank=32,  # LoRA rank
)
```

**SamplingClient** - for inference and generation:

```python
sampling_client = service_client.create_sampling_client(
    base_model="Qwen/Qwen3-30B-A3B"
)

# Or from saved weights:
sampling_client = training_client.save_weights_and_get_sampling_client(name="checkpoint-001")
```
### 2. Training Data Structure

All training uses `Datum` objects:

```python
from tinker import types

datum = types.Datum(
    model_input=types.ModelInput.from_ints(input_tokens),
    loss_fn_inputs={
        "target_tokens": target_tokens,   # For cross_entropy
        "weights": weights,               # Loss weights per token
        "advantages": advantages,         # For RL
        "logprobs": sampling_logprobs,    # For importance sampling
    },
)
```
### 3. Loss Functions

**Supervised learning:**

- `cross_entropy` - standard NLL loss for supervised fine-tuning

**Reinforcement learning:**

- `importance_sampling` - corrects for sampling/learner policy mismatch
- `ppo` - Proximal Policy Optimization with clipping
- `cispo` - Clipped Importance Sampling Policy Optimization
- `dro` - Direct Reward Optimization (offline RL)

**Custom losses:**

- `forward_backward_custom` - define arbitrary differentiable loss functions
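To build intuition for what the clipped RL objectives above compute, here is a minimal numpy sketch of a PPO-style clipped surrogate loss. This is purely illustrative - it is not the Tinker API, and the exact form Tinker uses may differ; the threshold names mirror the `clip_low_threshold`/`clip_high_threshold` config shown later, while the math is the standard PPO clipping assumption.

```python
import numpy as np

def ppo_clip_loss(new_logprobs, old_logprobs, advantages,
                  clip_low=0.9, clip_high=1.1):
    """PPO-style clipped surrogate loss (illustrative, not the Tinker API)."""
    # Importance ratio between the learner and the sampling policy
    ratio = np.exp(np.asarray(new_logprobs) - np.asarray(old_logprobs))
    unclipped = ratio * np.asarray(advantages)
    clipped = np.clip(ratio, clip_low, clip_high) * np.asarray(advantages)
    # Take the pessimistic (minimum) objective; negate to get a loss to minimize
    return -np.mean(np.minimum(unclipped, clipped))

# A ratio of ~1.5 with a positive advantage gets clipped down to 1.1
loss = ppo_clip_loss(new_logprobs=[-0.5], old_logprobs=[-0.9054], advantages=[1.0])
print(loss)  # ~ -1.1
```

The clipping is what prevents a single batch from pushing the policy far from the one that generated the samples.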
### 4. Async Patterns (Critical for Performance!)

Always overlap requests to avoid missing clock cycles (~10 seconds each):

```python
# GOOD - Submit both operations before waiting
fwd_bwd_future = await client.forward_backward_async(batch, "cross_entropy")
optim_future = await client.optim_step_async(adam_params)

# Now wait for results
fwd_bwd_result = await fwd_bwd_future
optim_result = await optim_future

# BAD - Sequential waiting misses clock cycles
fwd_bwd_result = await (await client.forward_backward_async(batch, "cross_entropy"))
optim_result = await (await client.optim_step_async(adam_params))  # May miss cycle!
```
### 5. Rendering (Messages to Tokens)

Use renderers to convert conversations to tokens:

```python
from tinker_cookbook import renderers, tokenizer_utils

tokenizer = tokenizer_utils.get_tokenizer('Qwen/Qwen3-30B-A3B')
renderer = renderers.get_renderer('qwen3', tokenizer)

# For generation (inference/RL)
prompt = renderer.build_generation_prompt(messages)
stop_sequences = renderer.get_stop_sequences()

# For supervised learning
model_input, weights = renderer.build_supervised_example(messages)
```

Available renderers:

- `qwen3` - Qwen3 models with thinking enabled (default)
- `qwen3_disable_thinking` - Qwen3 without thinking tokens
- `llama3` - Llama 3 models
- `deepseekv3` - DeepSeek V3 models
### 6. Vision Models

For vision-language models (Qwen3-VL):

```python
from tinker_cookbook.renderers import Message, ImagePart, TextPart

messages = [
    Message(role='user', content=[
        ImagePart(type='image', image='https://example.com/image.png'),
        TextPart(type='text', text='What is in this image?'),
    ])
]

# Use the Qwen3-VL renderer
from tinker_cookbook.image_processing_utils import get_image_processor

image_processor = get_image_processor("Qwen/Qwen3-VL-235B-A22B-Instruct")
renderer = renderers.Qwen3VLInstructRenderer(tokenizer, image_processor)
```
## Training Recipes

### Recipe 1: Supervised Fine-tuning

**Use case:** train the model on instruction-following data with known correct responses

**Script:** `scripts/supervised-training.py`

**Key steps:**

- Prepare conversation data with a renderer
- Forward-backward with `cross_entropy` loss
- Optimizer step with Adam
- Save checkpoints periodically
- Sample to evaluate progress

**Critical details:**

- Use `renderer.build_supervised_example()` to get proper loss weights
- Only assistant turns get weight=1; context gets weight=0
- Monitor loss per token: `-np.dot(logprobs, weights) / weights.sum()`
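The loss-per-token formula above can be computed directly from the `logprobs` returned by `forward_backward` and the renderer's weights. A minimal numpy sketch with made-up illustrative values:

```python
import numpy as np

# Hypothetical per-token log-probabilities and renderer weights:
# context tokens get weight 0, assistant tokens get weight 1.
logprobs = np.array([-2.0, -1.5, -0.5, -0.25])
weights  = np.array([ 0.0,  0.0,  1.0,  1.0])

# Mean negative log-likelihood over the weighted (assistant) tokens only
loss_per_token = -np.dot(logprobs, weights) / weights.sum()
print(loss_per_token)  # 0.375
```

Because context tokens carry weight 0, they contribute nothing to either the numerator or the denominator.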
### Recipe 2: Reinforcement Learning

**Use case:** train with rewards/preferences, handle multi-turn interactions

**Script:** `scripts/rl-training.py`

**Key steps:**

- Sample multiple completions per query
- Compute rewards (external evaluator, rule-based, or human feedback)
- Estimate advantages (per-group centering recommended)
- Forward-backward with a policy gradient loss (`ppo`, `importance_sampling`)
- Optional: incorporate a KL penalty into the rewards

**Critical details:**

- Save `sampling_logprobs` during generation for importance sampling
- Use group-based advantage estimation (GRPO-style)
- For PPO: clipped ratios prevent large policy updates
- Monitor KL divergence to the reference policy
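One common way to apply the optional KL penalty is to shape each sample's reward before advantage estimation. The sketch below uses the simple per-token logprob-difference KL estimate and a `beta` coefficient; both are standard RLHF practice and assumptions here, not a documented Tinker API:

```python
import numpy as np

def kl_shaped_reward(reward, policy_logprobs, ref_logprobs, beta=0.1):
    """Subtract a per-sequence KL estimate from the raw reward (illustrative).

    Uses the simple estimator E[log pi(token) - log pi_ref(token)]
    over the sampled tokens.
    """
    kl_estimate = np.mean(np.asarray(policy_logprobs) - np.asarray(ref_logprobs))
    return reward - beta * kl_estimate

# Example: the policy assigns higher logprobs than the reference on every token,
# so the reward is reduced by beta * KL.
shaped = kl_shaped_reward(1.0, policy_logprobs=[-0.5, -0.5],
                          ref_logprobs=[-1.0, -1.0], beta=0.1)
print(shaped)  # 0.95
```

Shaped rewards then flow into the same group-based advantage centering shown in Pattern 2.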
### Recipe 3: Vision Model Training

**Use case:** fine-tune vision-language models on multimodal data

**Script:** `scripts/vision-training.py`

**Key steps:**

- Use Qwen3-VL models (30B or 235B)
- Prepare messages with `ImagePart` and `TextPart`
- Use `Qwen3VLRenderer` or `Qwen3VLInstructRenderer`
- Train with supervised or RL approaches
- Special vision tokens are handled automatically

**Critical details:**

- Images must specify a format (png, jpg, etc.)
- Vision models require an `image_processor`
- The special tokens `<|vision_start|>` and `<|vision_end|>` are handled by the renderer
### Recipe 4: Checkpoint Management

**Use case:** save progress, resume training, create sampling clients

**Script:** `scripts/checkpoint-management.py`

**Key operations:**

```python
# Save for sampling only (faster, less storage)
sampling_path = training_client.save_weights_for_sampler(name="step-100").result().path

# Save full state (weights + optimizer) for resuming
resume_path = training_client.save_state(name="checkpoint-100").result().path

# Resume training
training_client.load_state(resume_path)

# Create a sampling client from a checkpoint
sampling_client = service_client.create_sampling_client(model_path=sampling_path)
```
## Common Patterns & Best Practices

### Pattern 1: Training Loop Structure

```python
import asyncio

import tinker
from tinker import types

num_steps = 1000          # Example values - tune for your run
checkpoint_interval = 50

async def training_loop():
    service_client = tinker.ServiceClient()
    training_client = await service_client.create_lora_training_client_async(
        base_model="Qwen/Qwen3-30B-A3B",
        rank=32,
    )

    for step in range(num_steps):
        # Prepare batch
        batch = prepare_training_batch()  # Your data preparation

        # Overlap forward_backward and optim_step
        fwd_bwd_future = await training_client.forward_backward_async(
            batch, "cross_entropy"
        )
        optim_future = await training_client.optim_step_async(
            types.AdamParams(learning_rate=1e-4)
        )

        # Wait for results
        fwd_bwd_result = await fwd_bwd_future
        optim_result = await optim_future

        # Log metrics
        logprobs = [output['logprobs'] for output in fwd_bwd_result.loss_fn_outputs]
        # ... compute and log loss

        # Periodic checkpointing
        if step % checkpoint_interval == 0:
            await training_client.save_state_async(name=f"step-{step}")

asyncio.run(training_loop())
```
### Pattern 2: RL with Group-Based Advantage Estimation

```python
import numpy as np

# Sample multiple completions per query
queries = ["Query 1", "Query 2", ...]
samples_per_query = 8

all_sequences = []
for query in queries:
    prompt = renderer.build_generation_prompt([{"role": "user", "content": query}])
    result = await sampling_client.sample_async(
        prompt=prompt,
        num_samples=samples_per_query,
        sampling_params=types.SamplingParams(
            max_tokens=100,
            temperature=0.8,
            stop=renderer.get_stop_sequences(),
        ),
    )
    all_sequences.extend(result.sequences)

# Compute rewards
rewards = compute_rewards(all_sequences)  # Your reward function

# Per-group advantage centering (GRPO-style)
advantages = []
for i in range(len(queries)):
    group_rewards = rewards[i * samples_per_query:(i + 1) * samples_per_query]
    group_mean = np.mean(group_rewards)
    group_std = np.std(group_rewards) + 1e-8
    group_advantages = [(r - group_mean) / group_std for r in group_rewards]
    advantages.extend(group_advantages)

# Prepare training data
training_data = [
    types.Datum(
        model_input=types.ModelInput.from_ints(seq.tokens[:-1]),
        loss_fn_inputs={
            "target_tokens": seq.tokens[1:],
            "logprobs": seq.logprobs,  # From sampling
            "advantages": advantages[i],
        },
    )
    for i, seq in enumerate(all_sequences)
]

# Train with PPO
fwd_bwd_future = await training_client.forward_backward_async(
    training_data,
    loss_fn="ppo",
    loss_fn_config={"clip_low_threshold": 0.9, "clip_high_threshold": 1.1},
)
```
### Pattern 3: Multi-Turn Conversations

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What's the population?"},
]

# Generate the next assistant response
prompt = renderer.build_generation_prompt(messages)
result = await sampling_client.sample_async(
    prompt=prompt,
    num_samples=1,
    sampling_params=types.SamplingParams(
        max_tokens=100,
        stop=renderer.get_stop_sequences(),
    ),
)

# Parse the response back into a message
sampled_message, success = renderer.parse_response(result.sequences[0].tokens)
if success:
    messages.append(sampled_message)
```
## Common Pitfalls & Solutions

### Pitfall 1: Missing Clock Cycles

**Problem:** sequential async calls waste time.

**Solution:** always overlap independent operations:

```python
# Submit both before waiting
future1 = await client.op1_async()
future2 = await client.op2_async()
result1 = await future1
result2 = await future2
```
### Pitfall 2: Incorrect Target Tokens

**Problem:** forgetting to shift tokens for autoregressive prediction.

**Solution:** input tokens = `tokens[:-1]`, target tokens = `tokens[1:]`.
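A concrete sketch of the shift, matching the `Datum` construction in Pattern 2 (the token IDs are made up):

```python
tokens = [101, 42, 7, 9, 102]  # full sequence: prompt + completion

input_tokens  = tokens[:-1]    # what the model sees:    [101, 42, 7, 9]
target_tokens = tokens[1:]     # what it must predict:   [42, 7, 9, 102]

# Position i of the input predicts position i of the targets,
# so both sequences must have the same length.
assert len(input_tokens) == len(target_tokens)
```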
### Pitfall 3: Loss Weights Misconfiguration

**Problem:** training on prompt tokens or missing completion tokens.

**Solution:** use the renderer's `build_supervised_example()`, which sets the weights correctly.
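To see what correctly configured weights look like, here is an illustrative sketch of the layout `build_supervised_example()` produces conceptually - zeros over the context, ones over the assistant completion (the token values are made up):

```python
prompt_tokens     = [1, 15, 27, 8]   # system + user context
completion_tokens = [99, 42, 3]      # assistant response

tokens  = prompt_tokens + completion_tokens
weights = [0.0] * len(prompt_tokens) + [1.0] * len(completion_tokens)

# Loss is only computed where weight == 1 (the assistant turn)
print(weights)  # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

If the weights were all 1, the model would also be trained to reproduce the prompt; if they were all 0, nothing would be learned.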
### Pitfall 4: Not Saving Sampling Logprobs

**Problem:** without them, you cannot apply the importance sampling correction in RL.

**Solution:** always keep the `logprobs` from the returned sequences during sampling.
### Pitfall 5: Renderer Compatibility Issues

**Problem:** training with a non-HF-compatible renderer breaks the OpenAI endpoint.

**Solution:** use the default renderers (`qwen3`, `llama3`, etc.) for deployment compatibility.
## Environment Variables

Set your API key:

```shell
export TINKER_API_KEY=<your-key>
```
## Supported Models

**Text models:**

- `meta-llama/Llama-3.1-70B`
- `meta-llama/Llama-3.1-8B`
- `Qwen/Qwen3-30B-A3B`
- `Qwen/Qwen3-8B`
- `deepseek-ai/DeepSeek-V3`

**Vision-language models:**

- `Qwen/Qwen3-VL-30B-A3B-Instruct`
- `Qwen/Qwen3-VL-235B-A22B-Instruct`
## Usage Instructions for AI Agents

When a user requests help with the Tinker API:

1. **Identify the task type:**
   - Setup/connection testing → use `scripts/setup-check.py`
   - Supervised fine-tuning → use `scripts/supervised-training.py`
   - RL training → use `scripts/rl-training.py`
   - Sampling/inference → use `scripts/sampling-demo.py`
   - Vision tasks → use `scripts/vision-training.py`
   - Checkpoint operations → use `scripts/checkpoint-management.py`

2. **Provide the appropriate script** and explain how to customize it for the user's use case.

3. **Emphasize critical patterns:**
   - Always use async and overlap operations
   - Use renderers for message-to-token conversion
   - Save sampling logprobs for RL
   - Monitor metrics during training

4. **Reference documentation:**
   - Full docs: https://github.com/thinking-machines-lab/tinker-cookbook
   - Cookbook examples in the `tinker-cookbook` repo

5. **Help debug issues:**
   - Check async patterns
   - Verify tensor shapes and types
   - Confirm renderer compatibility
   - Review loss function configuration
## Additional Resources

- Tinker Cookbook: https://github.com/thinking-machines-lab/tinker-cookbook
- Tinker Docs: https://raw.githubusercontent.com/thinking-machines-lab/tinker-cookbook/refs/heads/main/docs/
- Blog: https://thinkingmachines.ai/blog/
- OpenAI-compatible API: a `/chat/completions` endpoint is available