flashkda-delta-attention
Installation
SKILL.md
FlashKDA Delta Attention Skill
Skill by ara.so — Daily 2026 Skills collection.
FlashKDA provides high-performance CUDA kernels for Kimi Delta Attention (KDA) built on CUTLASS. It targets SM90+ GPUs (H100/H20 class) and integrates as a drop-in backend for flash-linear-attention's chunk_kda operation.
Requirements
- GPU: SM90+ (H100, H20, or newer)
- CUDA 12.9+
- PyTorch 2.4+
- Python 3.8+