# PyTorch

Deep learning framework for research and production.
## When to Use
- Deep learning research
- Custom neural network architectures
- GPU-accelerated training
- Model prototyping
## Quick Start

```python
import torch
import torch.nn as nn

# Simple feed-forward model
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
)

x = torch.randn(32, 784)  # Batch of 32 flattened 28x28 inputs
output = model(x)         # Shape: (32, 10)
```
## Core Concepts

### Tensors & Autograd
```python
import torch

# Create tensors
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.randn(3, 4, device='cuda')  # GPU tensor (requires a CUDA device)

# Operations
z = x @ torch.randn(3, 4)    # Matrix multiply: (3,) @ (3, 4) -> (4,)
z = torch.softmax(z, dim=-1)

# Autograd
loss = (z ** 2).sum()  # Scalar loss; plain z.sum() would be constant
                       # (softmax sums to 1), giving zero gradients
loss.backward()
print(x.grad)  # Gradients of loss w.r.t. x
```
### Custom Modules
```python
class TransformerBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attention = nn.MultiheadAttention(d_model, n_heads)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_model * 4),
            nn.GELU(),
            nn.Linear(d_model * 4, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention with residual connection, then post-norm
        attn_out, _ = self.attention(x, x, x)
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ff(x))
        return x
```
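A quick usage sketch (dimensions are illustrative; by default `nn.MultiheadAttention` expects sequence-first input, i.e. `(seq_len, batch, d_model)`, unless constructed with `batch_first=True`):

```python
block = TransformerBlock(d_model=512, n_heads=8)
x = torch.randn(10, 32, 512)  # (seq_len, batch, d_model)
out = block(x)                # Same shape: (10, 32, 512)
```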
## Common Patterns

### Training Loop
```python
model = MyModel().to('cuda')
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    model.train()
    for batch in dataloader:
        inputs, targets = batch
        inputs, targets = inputs.to('cuda'), targets.to('cuda')

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

# Save model weights
torch.save(model.state_dict(), 'model.pt')
```
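To restore the weights later, a minimal sketch (assumes the same `MyModel` class and an `inputs` tensor are available; this also demonstrates the `model.eval()` and `torch.no_grad()` practices listed below):

```python
model = MyModel()
model.load_state_dict(torch.load('model.pt', map_location='cpu'))
model.eval()               # Disable dropout/batch-norm updates
with torch.no_grad():      # Skip autograd bookkeeping for inference
    preds = model(inputs).argmax(dim=-1)
```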
## Best Practices

**Do:**
- Use `torch.no_grad()` for inference
- Move data to GPU efficiently
- Use mixed precision training (see the sketch below)
- Profile with `torch.profiler`
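A minimal mixed-precision sketch (assumes a CUDA device; `model`, `dataloader`, `optimizer`, and `criterion` are as in the training loop above):

```python
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in dataloader:
    inputs, targets = inputs.to('cuda'), targets.to('cuda')
    optimizer.zero_grad()
    with torch.autocast(device_type='cuda', dtype=torch.float16):
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()  # Scale loss to avoid fp16 underflow
    scaler.step(optimizer)         # Unscales grads, then optimizer.step()
    scaler.update()
```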
**Don't:**
- Forget to call `model.eval()` for inference
- Skip gradient zeroing
- Create tensors in loops
- Ignore CUDA memory management
## Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| CUDA OOM | GPU memory exhausted | Reduce batch size |
| NaN loss | Gradient explosion | Lower learning rate |
| Slow training | CPU data-loading bottleneck | Use DataLoader `num_workers` > 0 |
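Two hedged sketches for the last two rows above: parallel data loading for the CPU bottleneck, and gradient clipping as an extra guard against NaN loss (`dataset` is assumed defined; values are illustrative):

```python
from torch.utils.data import DataLoader

# Overlap data loading with GPU compute via worker processes
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    num_workers=4, pin_memory=True)

# Clip gradients between backward() and step() to tame explosions
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```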