pytorch-core
Overview
Core PyTorch provides the fundamental building blocks for deep learning: tensor computation with strong GPU acceleration and a tape-based autograd system. It follows a "define-by-run" approach, in which the computation graph is recorded as the code executes, so models are ordinary Python objects that can use native Python control flow.
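Here is a minimal sketch of define-by-run autograd. It assumes only that `torch` is installed, and the `forward` function is a hypothetical example rather than part of any PyTorch API. Because the graph is recorded while the code runs, a data-dependent `if` needs no special graph construct:

```python
import torch


def forward(x: torch.Tensor) -> torch.Tensor:
    # Ordinary Python control flow: the branch taken depends on the data,
    # and autograd records only the operations that actually ran.
    if x.sum() > 0:
        return (x ** 2).sum()
    return (-x).sum()


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = forward(x)   # the graph is built as this line executes
y.backward()     # backpropagate through the recorded operations
print(x.grad)    # d/dx sum(x^2) = 2x -> tensor([2., 4., 6.])
```

If the input has a non-positive sum, the other branch runs, and the gradient reflects only that branch (here, -1 for every element).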
When to Use
Use PyTorch Core when you need granular control over model architecture, custom training loops, or specific hardware optimizations like pinned memory for data transfers.
Decision Tree
- Do you know the input dimensions of your data?
  - YES: Use standard layers (e.g., `nn.Linear`).
  - NO: Use Lazy modules (e.g., `nn.LazyLinear`) to defer initialization.
- Is your bottleneck data transfer to the GPU?
  - YES: Enable `pin_memory=True` in your `DataLoader`.
  - NO: Standard data loading suffices.
- Are you fine-tuning a model?
  - YES: Set `requires_grad=False` for frozen parameters.
  - NO: Keep `requires_grad=True` for full training.
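The fine-tuning branch can be sketched as follows. The three-layer model and the names `backbone` and `head` are hypothetical. The idea is to freeze the backbone's parameters and hand only the still-trainable parameters to the optimizer:

```python
import torch
from torch import nn

# Hypothetical model: a "backbone" layer to freeze and a "head" to fine-tune.
model = nn.Sequential(
    nn.Linear(16, 32),  # index 0: backbone layer
    nn.ReLU(),
    nn.Linear(32, 4),   # index 2: task-specific head
)
backbone, head = model[0], model[2]

# Freeze the backbone: autograd will not compute gradients for these parameters.
for p in backbone.parameters():
    p.requires_grad = False

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1
)

criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 16)
target = torch.randint(0, 4, (8,))
loss = criterion(model(x), target)
loss.backward()

print(backbone.weight.grad)          # None: frozen parameters get no gradient
print(head.weight.grad is not None)  # True
```

Filtering on `requires_grad` also means `optimizer.step()` never touches the frozen weights. The model does not have to be re-instantiated.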
Workflows
Standard Training Iteration
- Load a batch of data from the `DataLoader`.
- Zero the gradients using `optimizer.zero_grad()`.
- Perform a forward pass through the `nn.Module`.
- Compute the loss using a criterion (e.g., `nn.CrossEntropyLoss`).
- Execute a backward pass with `loss.backward()` to compute gradients.
- Update model parameters using `optimizer.step()`.
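A minimal sketch of these six steps on hypothetical random data follows. The dataset sizes, the model and the `train_one_epoch` helper are illustrative assumptions:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 256 samples, 10 features, 3 classes.
features = torch.randn(256, 10)
labels = torch.randint(0, 3, (256,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def train_one_epoch(model, loader, criterion, optimizer) -> float:
    """Run one pass over the loader and return the mean per-sample loss."""
    model.train()  # enable training-mode behavior (Dropout, BatchNorm)
    total_loss = 0.0
    for inputs, targets in loader:          # 1. load a batch
        optimizer.zero_grad()               # 2. zero the gradients
        outputs = model(inputs)             # 3. forward pass
        loss = criterion(outputs, targets)  # 4. compute the loss
        loss.backward()                     # 5. backward pass
        optimizer.step()                    # 6. update parameters
        total_loss += loss.item() * inputs.size(0)
    return total_loss / len(loader.dataset)


for epoch in range(5):
    epoch_loss = train_one_epoch(model, loader, criterion, optimizer)
    print(f"epoch {epoch}: loss {epoch_loss:.4f}")
```

Gradients accumulate across `backward()` calls by default, so they must be zeroed on every iteration.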
Model Persistence and Checkpointing
- Capture the state of the model and optimizer using `.state_dict()`.
- Save the dictionaries to a file using `torch.save()`.
- Restore the model by instantiating the class and calling `.load_state_dict()`.
- Call `.eval()` before inference so that Dropout is disabled and BatchNorm uses its running statistics.
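The checkpoint round trip could look like the sketch below. The `make_model` factory, the `"epoch"` entry and the dictionary key names are hypothetical choices, and the file is written to a temporary directory:

```python
import os
import tempfile

import torch
from torch import nn


def make_model() -> nn.Module:
    # Hypothetical architecture; the same class must be instantiated on load.
    return nn.Sequential(nn.Linear(4, 8), nn.Dropout(0.5), nn.Linear(8, 2))


model = make_model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save: store state dicts (not the module objects) plus any extra metadata.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save(
    {
        "epoch": 3,  # hypothetical training progress
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    },
    path,
)

# Restore: instantiate the class, then load the saved state into it.
checkpoint = torch.load(path, map_location="cpu")
restored = make_model()
restored.load_state_dict(checkpoint["model_state"])
restored_optimizer = torch.optim.Adam(restored.parameters(), lr=1e-3)
restored_optimizer.load_state_dict(checkpoint["optimizer_state"])

restored.eval()  # disable Dropout (BatchNorm would use running stats) for inference
with torch.no_grad():
    prediction = restored(torch.randn(1, 4))
```

Saving state dicts rather than whole pickled modules keeps the checkpoint decoupled from the exact class definition's module path. The saved optimizer state lets training resume where it stopped.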
Deferred Architecture Initialization
- Define the model using Lazy modules (e.g., `nn.LazyLinear`).
- Move the model to the desired device (and dtype) before the first forward pass.
- Run a dummy input or the first real batch through the model.
- PyTorch automatically infers and sets the weight shapes based on the input.
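A minimal sketch of this workflow follows. It assumes hypothetical 1x28x28 image inputs and a small two-layer classifier. The optimizer is created only after the dry run, a cautious ordering chosen here so that it sees fully materialized parameters:

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# in_features of each LazyLinear is left unspecified; it is inferred on first use.
model = nn.Sequential(
    nn.Flatten(),
    nn.LazyLinear(128),
    nn.ReLU(),
    nn.LazyLinear(10),
).to(device)

# Dry run with a dummy batch shaped like the real data
# (hypothetically 1x28x28 images) to materialize the parameters.
dummy = torch.zeros(2, 1, 28, 28, device=device)
model(dummy)

print(model[1].weight.shape)  # torch.Size([128, 784])
print(model[3].weight.shape)  # torch.Size([10, 128])

# Create the optimizer only after every parameter has a concrete shape.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```

Changing the input resolution only requires changing the dummy batch. The 784 (28 * 28) is never written by hand.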
Non-Obvious Insights
- Lazy Initialization: Using `LazyLinear` or `LazyConv2d` simplifies architecture definitions where input dimensions are unknown, preventing manual shape-calculation errors.
- Data Transfer Optimization: Setting `pin_memory=True` in a `DataLoader` returns batches in page-locked host memory, which speeds up CPU-to-GPU copies and allows `non_blocking=True` transfers to overlap with host-side work.
- Dynamic Gradient Control: The `requires_grad` attribute can be toggled on the fly to freeze parameters during fine-tuning or transfer learning without re-instantiating the model.
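The pinned-memory pattern can be sketched as below. The random dataset and batch size are hypothetical. Pinning is enabled only when CUDA is available, because it benefits host-to-GPU copies and has no effect on a CPU-only run:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Hypothetical dataset: 1,024 samples of 64 features with binary labels.
dataset = TensorDataset(torch.randn(1024, 64), torch.randint(0, 2, (1024,)))

# pin_memory=True makes the DataLoader return batches in page-locked host memory.
loader = DataLoader(dataset, batch_size=128, shuffle=True, pin_memory=use_cuda)

for inputs, targets in loader:
    # With a pinned source, non_blocking=True lets the host-to-device copy
    # proceed asynchronously with respect to the host.
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    # The forward/backward pass for the batch would follow here.
```

`non_blocking=True` only makes a difference when the source tensor is pinned, which is why the two settings are usually used together.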
Evidence
- "Most machine learning workflows involve working with data, creating models, optimizing model parameters, and saving the trained models." (https://pytorch.org/tutorials/beginner/basics/intro.html)
- "Lazy modules like LazyLinear allow for deferred initialization of input dimensions until the first forward pass." (https://pytorch.org/docs/stable/nn.html)
Scripts
- `scripts/pytorch-core_tool.py`: Provides a standard training loop skeleton and lazy initialization examples.
- `scripts/pytorch-core_tool.js`: Node.js wrapper for invoking PyTorch training scripts.
Dependencies
- torch
- torchvision (optional for datasets)
- numpy
References