PyTorch - Deep Learning & Tensors
PyTorch is a Python-based scientific computing package that uses the power of Graphics Processing Units (GPUs) and provides maximum flexibility and speed through its dynamic computational graph system.
When to Use
- Building and training Deep Neural Networks (CNN, RNN, Transformers).
- Researching new AI architectures with dynamic graph needs.
- Accelerating tensor math on NVIDIA (CUDA) or Mac (MPS) hardware.
- Solving Physics-Informed Neural Networks (PINNs).
- Implementing Generative models (GANs, Diffusion).
- Large-scale optimization using Autograd (automatic differentiation).
- Production-grade AI deployment (via TorchScript/ONNX).
Reference Documentation
Official docs: https://pytorch.org/docs/
Tutorials: https://pytorch.org/tutorials/
Search patterns: torch.nn, torch.optim, torch.utils.data, Autograd, Tensor.to(device)
Core Principles
The Tensor
The central data structure, similar to NumPy's ndarray, but with two key additions: it can live on a GPU and it supports automatic differentiation.
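A minimal sketch of both additions (shapes chosen arbitrarily for illustration):
import torch
a = torch.ones(3, 3)                         # like np.ones((3, 3))
if torch.cuda.is_available():
    a = a.to("cuda")                         # addition 1: GPU residency
w = torch.randn(3, 3, device=a.device, requires_grad=True)  # addition 2: autograd tracking
loss = (a @ w).sum()
loss.backward()                              # gradients accumulate in w.grad
print(w.grad.shape)                          # torch.Size([3, 3])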
Dynamic Computational Graph (Autograd)
PyTorch builds the graph "on the fly" as code executes. This allows for standard Python control flow (if/for) inside your models.
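For illustration, a toy module with data-dependent branching (the halving rule below is arbitrary, not a real technique):
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        x = self.fc(x)
        if x.mean() > 0:        # plain Python `if` on a runtime value
            x = x * 0.5
        for _ in range(2):      # plain Python loop
            x = torch.relu(x)
        return x

out = DynamicNet()(torch.randn(2, 4))  # the graph is rebuilt on every call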
Modules and Parameters
nn.Module is the base class for all neural network components. It automatically tracks nn.Parameter objects (weights/biases) for optimization.
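A quick way to inspect what gets tracked (the tiny Sequential below is just for demonstration):
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 3), nn.ReLU(), nn.Linear(3, 1))
for name, p in net.named_parameters():
    print(name, tuple(p.shape))    # e.g. "0.weight (3, 2)", "0.bias (3,)", ...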
Quick Reference
Installation
# CPU
pip install torch torchvision
# GPU (Check pytorch.org for specific CUDA versions)
pip install torch --index-url https://download.pytorch.org/whl/cu121
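A quick post-install sanity check (output depends on your build and hardware):
import torch
print(torch.__version__)            # installed version string
print(torch.cuda.is_available())    # True only for a CUDA build with a visible GPU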
Standard Imports
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
Basic Pattern - Simple Linear Regression (The "PyTorch Way")
import torch
# 1. Data (Tensors)
X = torch.tensor([[1.0], [2.0], [3.0]])  # inputs do not need requires_grad; the model's parameters track gradients
y = torch.tensor([[2.0], [4.0], [6.0]])
# 2. Simple Model
model = torch.nn.Linear(1, 1) # y = w*x + b
# 3. Loss and Optimizer
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# 4. Training Loop
for epoch in range(100):
    prediction = model(X)
    loss = criterion(prediction, y)
    optimizer.zero_grad()  # Clear previous gradients
    loss.backward()        # Compute gradients (Autograd)
    optimizer.step()       # Update weights
Critical Rules
✅ DO
- Always zero_grad() - PyTorch accumulates gradients by default. Forget this, and your model will fail to converge.
- Use .to(device) - Explicitly move your model AND your data to the same device (CPU or CUDA).
- Use DataLoader - Never feed data manually in a loop; DataLoader handles batching, shuffling, and multi-process loading.
- Set model.train() / model.eval() - This is vital for layers like Dropout and BatchNorm that behave differently during inference.
- Use torch.no_grad() for inference - This saves significant memory and compute by not building a graph.
- Specify dtypes - Be conscious of float32 (standard) vs float64 (scientific precision) vs float16 (speed/GPU).
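The device, eval, and no_grad rules above combine into a typical inference sketch (model and batch are assumed to already exist):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()                        # Dropout/BatchNorm switch to inference mode
with torch.no_grad():               # no graph is built, saving memory
    preds = model(batch.to(device))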
❌ DON'T
- Mix CPU and GPU tensors - RuntimeError: Expected all tensors to be on the same device is the most common PyTorch error.
- Use standard Python loops for math - Prefer vectorized tensor operations for performance.
- Forget .item() - When logging a scalar value from a tensor, use loss.item() to detach it from the graph.
- Overuse float64 on GPU - Many consumer GPUs have poor double-precision performance; use float32 where possible.
Anti-Patterns (NEVER)
import torch
# ❌ BAD: Mixing Python lists/arrays with Tensors in a loop
# for x in data:
# res = model(torch.tensor(x)) # Extremely slow re-allocation!
# ✅ GOOD: Batching
# data_tensor = torch.stack([torch.tensor(x) for x in data])
# res = model(data_tensor)
# ❌ BAD: Calculating loss without zeroing gradients
loss.backward()
optimizer.step()
# Next iteration... gradients will be double what they should be!
# ✅ GOOD:
optimizer.zero_grad()
loss.backward()
optimizer.step()
# ❌ BAD: Calling .numpy() directly on a GPU tensor
# with torch.no_grad():
#     pred = model(X).numpy() # TypeError if the tensor lives on CUDA
# ✅ GOOD: Explicit move to CPU first
# pred = model(X).detach().cpu().numpy()
Tensors and Device Management
Moving between CPU and GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# Create tensor on device
x = torch.randn(3, 3, device=device)
# Move model to device
model = MyModel().to(device)
# Move data to device during loop
for inputs, labels in dataloader:
    inputs, labels = inputs.to(device), labels.to(device)
    # ... forward/backward as usual
Building Models (nn.Module)
Flexible Architectures
class ScientificNet(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.layer1 = nn.Linear(input_dim, hidden_dim)
        self.layer2 = nn.Linear(hidden_dim, 1)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = self.dropout(x)
        x = torch.sigmoid(self.layer2(x))
        return x
model = ScientificNet(10, 50)
Custom Datasets (torch.utils.data)
Handling Scientific Files (e.g., HDF5 or CSV)
import pandas as pd

class MyScientificDataset(Dataset):
    def __init__(self, file_path):
        self.data = pd.read_csv(file_path)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Convert one row to tensors (features = all columns but last, label = last column)
        sample = torch.tensor(self.data.iloc[idx, :-1].values, dtype=torch.float32)
        label = torch.tensor(self.data.iloc[idx, -1], dtype=torch.float32)
        return sample, label
dataset = MyScientificDataset("experiment_results.csv")
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
Advanced Autograd
Gradients for Physics (Jacobians/Hessians)
x = torch.linspace(-5, 5, 100, requires_grad=True)
y = x**3
# First derivative dy/dx
# create_graph=True allows for higher-order derivatives
dy_dx = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
# Second derivative d2y/dx2 (the 1-D analogue of a Hessian)
d2y_dx2 = torch.autograd.grad(dy_dx.sum(), x)[0]
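Both results can be checked against the analytic derivatives 3x² and 6x (float32 round-off stays within allclose's default tolerance):
assert torch.allclose(dy_dx, 3 * x.detach() ** 2)
assert torch.allclose(d2y_dx2, 6 * x.detach())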
Practical Workflows
1. Physics-Informed Neural Network (PINN) Fragment
def pde_loss(model, x):
    """Simple ODE residual: u'(x) = u(x)."""
    x.requires_grad_(True)
    u = model(x)
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    return F.mse_loss(u_x, u)
# Training loop combines data_loss + pde_loss
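A hypothetical way to combine the two terms, reusing pde_loss above (the network size, learning rate, boundary condition, and collocation grid are all illustrative assumptions):
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x_bc = torch.tensor([[0.0]]); u_bc = torch.tensor([[1.0]])   # boundary condition u(0) = 1
x_coll = torch.linspace(0.0, 2.0, 50).reshape(-1, 1)         # collocation points
for epoch in range(2000):
    optimizer.zero_grad()
    data_loss = F.mse_loss(model(x_bc), u_bc)
    loss = data_loss + pde_loss(model, x_coll.clone())       # clone: fresh leaf tensor each step
    loss.backward()
    optimizer.step()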
2. Early Stopping for Scientific Training
best_loss = float('inf')
patience = 10
counter = 0
for epoch in range(1000):
    train_loss = train_one_epoch()
    val_loss = validate()
    if val_loss < best_loss:
        best_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')
        counter = 0
    else:
        counter += 1
        if counter >= patience:
            print("Early stopping triggered")
            break
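After the loop exits, restore the checkpoint saved at the best epoch before final evaluation:
model.load_state_dict(torch.load('best_model.pth'))
model.eval()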
3. Feature Extraction for Chemistry
def extract_embeddings(model, loader):
    model.eval()
    embeddings = []
    with torch.no_grad():
        for batch in loader:
            # Assumes the model exposes a .get_features() method
            features = model.get_features(batch.to(device))
            embeddings.append(features.cpu())
    return torch.cat(embeddings)
Performance Optimization
Using torch.compile (PyTorch 2.0+)
Significant speedups for modern models with one line:
model = MyModel()
compiled_model = torch.compile(model)
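Compilation is lazy: the first call traces and compiles, paying a one-time warm-up cost (the input shape below is an assumption):
_ = compiled_model(torch.randn(8, 10))   # first call triggers compilation
# later calls with the same shapes reuse the compiled kernels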
Mixed Precision (torch.cuda.amp)
Saves memory and speeds up training on modern GPUs (Tensor Cores).
scaler = torch.cuda.amp.GradScaler()
for inputs, labels in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        output = model(inputs)
        loss = criterion(output, labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
Common Pitfalls and Solutions
Vanishing/Exploding Gradients
# ❌ Problem: Loss becomes 'nan' or weights don't update
# ✅ Solution: Use Gradient Clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
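The call belongs between backward() and step(), once gradients exist but before the update:
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()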
Dead ReLU
If a ReLU neuron's input is always <= 0, its gradient is zero and it stops learning.
# ✅ Solution: Use LeakyReLU or ELU
self.act = nn.LeakyReLU(0.01)
Memory Leak (Tensors staying in Graph)
Accumulating the loss tensor itself keeps each iteration's computation graph alive in memory.
# ❌ BAD: total_loss += loss
# ✅ GOOD: total_loss += loss.item()
PyTorch is the engine of the AI revolution. For scientists, it provides the bridge from classical data analysis to the world of differentiable models, allowing for the discovery of patterns that were previously invisible.