TileKernels GPU Kernel Library
Skill by ara.so — Daily 2026 Skills collection.
TileKernels is a high-performance GPU kernel library for LLM operations (MoE routing, FP8/FP4 quantization, transpose, engram gating, Manifold HyperConnection). It is written in TileLang, a Python DSL for expressing GPU kernels with automatic optimization. The kernels target NVIDIA SM90 (Hopper) and SM100 (Blackwell) architectures and approach hardware performance limits.
Requirements
- Python 3.10+
- PyTorch 2.10+
- TileLang 0.1.9+
- NVIDIA SM90 or SM100 GPU (H100/H200/B100/B200)
- CUDA Toolkit 13.1+
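The requirements above can be checked before installing. The following is a minimal sketch, not part of TileKernels: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the distribution names `torch` and `tilelang` are assumed to match what pip reports in your environment. The CUDA Toolkit version is not checked here.

```python
import re
import sys
from importlib import metadata

# Minimums copied from the requirements list above.
MIN_PYTHON = (3, 10)
# Assumption: these are the pip distribution names for PyTorch and TileLang.
MIN_PACKAGES = {"torch": (2, 10), "tilelang": (0, 1, 9)}
# Compute capabilities for SM90 (Hopper) and SM100 (Blackwell).
SUPPORTED_SM = {(9, 0), (10, 0)}


def parse_version(text):
    """Turn '2.10.0+cu131' into (2, 10, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare version tuples after padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))
    pad = lambda v: tuple(v) + (0,) * (width - len(v))
    return pad(installed) >= pad(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not meets_minimum(tuple(sys.version_info[:3]), MIN_PYTHON):
        problems.append("Python 3.10+ is required")
    for dist, minimum in MIN_PACKAGES.items():
        try:
            installed = parse_version(metadata.version(dist))
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            wanted = ".".join(map(str, minimum))
            problems.append(f"{dist} {wanted}+ is required")
    try:
        import torch
    except ImportError:
        return problems  # already reported above as not installed
    if not torch.cuda.is_available():
        problems.append("no CUDA device is visible to PyTorch")
    elif torch.cuda.get_device_capability() not in SUPPORTED_SM:
        problems.append("GPU is not SM90 or SM100")
    return problems
```

Calling `check_environment()` and printing each returned string gives a quick pre-install report. Only the first visible GPU is inspected, which is a simplification for multi-GPU machines.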
Installation
# Development install, run from the repository root (recommended for extending/modifying kernels)
pip install -e ".[dev]"