ray-data
Installation
SKILL.md
Ray Data - Scalable ML Data Processing
Distributed data processing library for ML and AI workloads.
When to use Ray Data
Use Ray Data when:
- Processing large datasets (>100GB) for ML training
- Need distributed data preprocessing across cluster
- Building batch inference pipelines
- Loading multi-modal data (images, audio, video)
- Scaling data processing from laptop to cluster
Key features:
- Streaming execution: Process data larger than memory
- GPU support: Accelerate transforms with GPUs
- Framework integration: PyTorch, TensorFlow, HuggingFace
- Multi-modal: Images, Parquet, CSV, JSON, audio, video