# tinygrad

A minimal deep learning framework focused on beauty and minimalism. Every line must earn its keep.
## Quick Reference

```python
from tinygrad import Tensor, TinyJit, nn, dtypes, Device, GlobalCounters

# Tensor creation
x = Tensor([1, 2, 3])
x = Tensor.rand(2, 3)
x = Tensor.kaiming_uniform(128, 784)

# Operations are lazy until realized
y = (x + 1).relu().sum()
y.realize()  # or y.numpy()

# Training context
with Tensor.train():
  loss = model(x).sparse_categorical_crossentropy(labels).backward()
  optim.step()
```
## Architecture Pipeline

- **Tensor** (`tinygrad/tensor.py`) - User API, creates the UOp graph
- **UOp** (`tinygrad/uop/ops.py`) - Unified IR for all operations
- **Schedule** (`tinygrad/engine/schedule.py`) - Converts tensor UOps to kernel UOps
- **Codegen** (`tinygrad/codegen/`) - Converts kernel UOps to device code
- **Runtime** (`tinygrad/runtime/`) - Device-specific execution
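The lazy-until-realized behavior that drives this pipeline can be sketched in plain Python. This is a toy illustration only, not tinygrad's actual classes; the real `Tensor` builds a UOp graph that is scheduled and compiled into kernels:

```python
# Toy sketch of lazy evaluation: ops accumulate in a graph, nothing computes
# until realize() walks it. Illustrative only -- not tinygrad's Tensor/UOp.
class LazyNode:
  def __init__(self, op, srcs=(), value=None):
    self.op, self.srcs, self.value = op, srcs, value

  def __add__(self, other): return LazyNode("ADD", (self, other))
  def __mul__(self, other): return LazyNode("MUL", (self, other))

  def realize(self):
    # Post-order walk: realize sources first, then apply this node's op
    if self.op == "CONST": return self.value
    vals = [s.realize() for s in self.srcs]
    return {"ADD": sum, "MUL": lambda v: v[0] * v[1]}[self.op](vals)

a = LazyNode("CONST", value=2)
b = LazyNode("CONST", value=3)
y = a + b * a       # builds a graph, computes nothing yet
print(y.realize())  # walks the graph: 2 + 3*2 = 8
```

As in tinygrad, building the expression is cheap; all work happens when the graph is realized.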
## Training Loop Pattern

```python
from tinygrad import Tensor, TinyJit, nn
from tinygrad.nn.datasets import mnist

X_train, Y_train, X_test, Y_test = mnist()
model = Model()
optim = nn.optim.Adam(nn.state.get_parameters(model))

@TinyJit
@Tensor.train()
def train_step():
  optim.zero_grad()
  samples = Tensor.randint(512, high=X_train.shape[0])
  loss = model(X_train[samples]).sparse_categorical_crossentropy(Y_train[samples]).backward()
  return loss.realize(*optim.schedule_step())

for i in range(100):
  loss = train_step()
```
## Model Definition

Models are plain Python classes with `__call__`. No base class required.

```python
class Model:
  def __init__(self):
    self.l1 = nn.Linear(784, 128)
    self.l2 = nn.Linear(128, 10)

  def __call__(self, x):
    return self.l1(x).relu().sequential([self.l2])
```

Available `nn` modules: `Linear`, `Conv2d`, `BatchNorm`, `LayerNorm`, `RMSNorm`, `Embedding`, `GroupNorm`, `LSTMCell`

Optimizers: `SGD`, `Adam`, `AdamW`, `LARS`, `LAMB`, `Muon`
## State Dict / Weights

```python
from tinygrad.nn.state import safe_save, safe_load, get_state_dict, load_state_dict, get_parameters

# Save/load safetensors
safe_save(get_state_dict(model), "model.safetensors")
load_state_dict(model, safe_load("model.safetensors"))

# Get all trainable params
params = get_parameters(model)
```
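`get_state_dict` works by recursively walking the model's attributes and flattening tensors into a dict keyed by dotted path. A rough sketch of the idea, using a stand-in tensor class (illustrative only; the real logic in `tinygrad/nn/state.py` handles more container types):

```python
class FakeTensor:  # stand-in for tinygrad.Tensor in this sketch
  pass

def collect_state(obj, prefix=""):
  """Recursively gather FakeTensor leaves into a flat dict keyed by dotted path."""
  if isinstance(obj, FakeTensor): return {prefix: obj}
  state = {}
  if isinstance(obj, (list, tuple)):
    for i, v in enumerate(obj): state.update(collect_state(v, f"{prefix}.{i}" if prefix else str(i)))
  elif hasattr(obj, "__dict__"):
    for k, v in obj.__dict__.items(): state.update(collect_state(v, f"{prefix}.{k}" if prefix else k))
  return state

class Linear:
  def __init__(self): self.weight, self.bias = FakeTensor(), FakeTensor()

class Model:
  def __init__(self): self.l1, self.l2 = Linear(), Linear()

print(sorted(collect_state(Model()).keys()))
# ['l1.bias', 'l1.weight', 'l2.bias', 'l2.weight']
```

The dotted keys are what make `safe_save`/`load_state_dict` round-trip cleanly between model objects and flat safetensors files.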
## JIT Compilation

`TinyJit` captures and replays kernel graphs. Input shapes must be fixed.

```python
@TinyJit
def forward(x):
  return model(x).realize()

# First call captures, subsequent calls replay
out = forward(batch)
```
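The capture/replay idea can be sketched with a toy decorator. This is illustrative only: the real `TinyJit` captures compiled kernel graphs and swaps in new input buffers on replay, which is why input shapes must stay fixed:

```python
class ToyJit:
  """Toy capture/replay JIT: the first call traces ops, later calls replay the trace."""
  def __init__(self, fn):
    self.fn, self.trace = fn, None

  def __call__(self, x):
    if self.trace is None:
      self.trace = []  # capture mode: record each op as the function runs
      record = lambda op: self.trace.append(op) or op
      return self.fn(x, record)
    out = x
    for op in self.trace:  # replay mode: rerun the recorded ops on new input
      out = op(out)
    return out

@ToyJit
def double_then_inc(x, record=lambda op: op):
  out = record(lambda v: v * 2)(x)
  out = record(lambda v: v + 1)(out)
  return out

print(double_then_inc(3))  # first call: traced and executed -> 7
print(double_then_inc(5))  # second call: replayed from the trace -> 11
```

Note the replay never re-enters the Python function body; anything data-dependent in it (branches, prints) is frozen at capture time, which mirrors a real pitfall with `TinyJit`.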
## Device Management

```python
from tinygrad import Device

print(Device.DEFAULT)  # Auto-detected: METAL, CUDA, AMD, CPU, etc.

# Force device
x = Tensor.rand(10, device="CPU")
x = x.to("CUDA")
```
## Environment Variables

| Variable | Values | Description |
|---|---|---|
| `DEBUG` | 1-7 | Increasing verbosity (4=code, 7=asm) |
| `VIZ` | 1 | Graph visualization |
| `BEAM` | # | Kernel beam search width |
| `NOOPT` | 1 | Disable optimizations |
| `SPEC` | 1-2 | UOp spec verification |
## Debugging

```sh
# Visualize computation graph
VIZ=1 python -c "from tinygrad import Tensor; Tensor.ones(10).sum().realize()"

# Show generated code
DEBUG=4 python script.py

# Run tests
python -m pytest test/test_tensor.py -xvs
```
## UOp and PatternMatcher (Internals)

UOps are immutable, cached graph nodes. Use `PatternMatcher` for transformations:

```python
from tinygrad.uop.ops import UOp, Ops
from tinygrad.uop.upat import UPat, PatternMatcher, graph_rewrite

# Rewrite x + x -> x * 2; reusing the name "x" requires both sources to match
pm = PatternMatcher([
  (UPat(Ops.ADD, src=(UPat.cvar("x"), UPat.cvar("x"))), lambda x: x * 2),
])
result = graph_rewrite(uop, pm)
```

Key UOp properties: `op`, `dtype`, `src`, `arg`, `tag`

Define PatternMatchers at module level - they are slow to construct.
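The rewrite loop itself can be illustrated with a self-contained toy (not tinygrad's implementation): rules are (predicate, rewrite) pairs applied bottom-up until no rule fires, here reproducing the `x + x -> x * 2` rule:

```python
# Toy bottom-up pattern rewriter (illustrative; tinygrad's PatternMatcher
# precompiles patterns for speed -- hence the advice to build it once at module level).
class Node:
  def __init__(self, op, *src): self.op, self.src = op, src
  def __repr__(self): return self.op if not self.src else f"({self.op} {' '.join(map(repr, self.src))})"

def rewrite(node, rules):
  node = Node(node.op, *(rewrite(s, rules) for s in node.src))  # children first
  for match, fn in rules:
    if match(node): return rewrite(fn(node), rules)             # re-check the rewritten result
  return node

# Rule: x + x -> x * 2 (both sources must be structurally identical)
rules = [(lambda n: n.op == "ADD" and len(n.src) == 2 and repr(n.src[0]) == repr(n.src[1]),
          lambda n: Node("MUL", n.src[0], Node("2")))]

print(rewrite(Node("ADD", Node("x"), Node("x")), rules))  # (MUL x 2)
print(rewrite(Node("ADD", Node("x"), Node("y")), rules))  # (ADD x y) -- no match, unchanged
```

Rewriting children before parents and re-checking each rewritten result is what lets chains of rules compose until the graph reaches a fixed point.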
## Style Guide

- 2-space indentation, 150 char line limit
- Prefer readability over cleverness
- Never mix functionality changes with whitespace changes
- All functionality changes must be tested
- Run `pre-commit run --all-files` before commits
## Testing

```sh
python -m pytest test/test_tensor.py -xvs
python -m pytest test/unit/test_schedule_cache.py -x --timeout=60
SPEC=2 python -m pytest test/test_something.py  # With spec verification
```