# C++ Reinforcement Learning

## Overview
This skill covers implementing reinforcement learning algorithms in C++ using LibTorch (PyTorch C++ frontend) and modern C++17/20 features. It provides patterns for building high-performance RL systems suitable for production deployment, robotics, game AI, and real-time applications.
## When to Use
- Implementing DQN, PPO, SAC, or other RL algorithms in C++
- Building performance-critical RL training pipelines
- Creating efficient replay buffers with proper memory management
- Deploying trained models with ONNX Runtime
- Parallelizing environment rollouts across threads
- Integrating RL with existing C++ codebases (games, robotics, simulations)
## Core Libraries

### Primary: LibTorch (PyTorch C++ Frontend)
LibTorch provides the same tensor operations and autograd capabilities as PyTorch in C++.
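For example, this minimal program (plain LibTorch, nothing assumed beyond the library itself) checks that autograd behaves as in Python: the gradient of y = Σxᵢ² at x = 1 is 2.

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
    auto x = torch::ones({2, 2}, torch::requires_grad());
    auto y = (x * x).sum();   // scalar, so backward() needs no argument
    y.backward();
    std::cout << x.grad() << "\n";   // prints a 2x2 tensor of 2s
}
```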
**Installation:** Download from https://pytorch.org/get-started/locally (select C++/LibTorch).

**CMake Integration:**
```cmake
# Configure with the LibTorch path, e.g.:
#   cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
cmake_minimum_required(VERSION 3.18)
project(rl_project)

set(CMAKE_CXX_STANDARD 17)

find_package(Torch REQUIRED)

add_executable(train_agent src/main.cpp)
target_link_libraries(train_agent "${TORCH_LIBRARIES}")
```
### Secondary Libraries
- ONNX Runtime - Cross-platform inference deployment
- cpprl (mhubii/cpprl) - Reference PPO implementation
- Gymnasium C++ bindings - Environment interfaces
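Since ONNX Runtime is the suggested deployment path, here is a minimal inference sketch using its C++ API. The model file name and the `state`/`q_values` I/O names are placeholders - real names depend on how the network was exported (typically via `torch.onnx.export` from Python) - so treat this as a template rather than a drop-in.

```cpp
#include <onnxruntime_cxx_api.h>
#include <algorithm>
#include <array>
#include <vector>

// Greedy action selection from an exported Q-network.
// Construct the session once, e.g.:
//   Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "rl");
//   Ort::Session session(env, "policy.onnx", Ort::SessionOptions{});
int select_action(Ort::Session& session, std::vector<float>& state, int64_t state_dim) {
    auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    std::array<int64_t, 2> shape{1, state_dim};
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem, state.data(), state.size(), shape.data(), shape.size());
    const char* in_names[] = {"state"};      // placeholder input name
    const char* out_names[] = {"q_values"};  // placeholder output name
    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               in_names, &input, 1, out_names, 1);
    const float* q = outputs[0].GetTensorData<float>();
    size_t n = outputs[0].GetTensorTypeAndShapeInfo().GetElementCount();
    return static_cast<int>(std::max_element(q, q + n) - q);
}
```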
## Quick Start: DQN Agent
```cpp
#include <torch/torch.h>

struct DQNNet : torch::nn::Module {
    torch::nn::Linear fc1{nullptr}, fc2{nullptr}, fc3{nullptr};

    DQNNet(int64_t state_dim, int64_t action_dim) {
        fc1 = register_module("fc1", torch::nn::Linear(state_dim, 128));
        fc2 = register_module("fc2", torch::nn::Linear(128, 128));
        fc3 = register_module("fc3", torch::nn::Linear(128, action_dim));
    }

    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(fc1->forward(x));
        x = torch::relu(fc2->forward(x));
        return fc3->forward(x);
    }
};

// Training loop
auto policy_net = std::make_shared<DQNNet>(state_dim, action_dim);
auto target_net = std::make_shared<DQNNet>(state_dim, action_dim);
torch::optim::Adam optimizer(policy_net->parameters(), lr);

// Compute loss (actions must be int64 with shape [batch, 1] for gather)
auto q_values = policy_net->forward(states).gather(1, actions);
// In C++, max(dim) returns a (values, indices) tuple rather than a named struct
auto next_q = std::get<0>(target_net->forward(next_states).max(1)).detach();
auto target = rewards + gamma * next_q * (1 - dones);
auto loss = torch::mse_loss(q_values.squeeze(1), target);

// Backward pass
optimizer.zero_grad();
loss.backward();
optimizer.step();
```
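The quick start omits action selection and the target-network sync. Here is a sketch of both, reusing `DQNNet` from above; the epsilon schedule and the sync interval (commonly every few hundred optimizer steps) are left to you.

```cpp
#include <random>

// Epsilon-greedy action selection over the policy network's Q-values.
int64_t select_action(DQNNet& policy_net,
                      const torch::Tensor& state,   // shape [1, state_dim]
                      double epsilon, int64_t action_dim, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<int64_t> random_action(0, action_dim - 1);
        return random_action(rng);                   // explore
    }
    torch::NoGradGuard no_grad;                      // exploit greedily
    return policy_net.forward(state).argmax(1).item<int64_t>();
}

// Hard-sync the target network every N optimizer steps.
void update_target(DQNNet& target, const DQNNet& policy) {
    torch::NoGradGuard no_grad;
    auto src = policy.parameters();
    auto dst = target.parameters();
    for (size_t i = 0; i < src.size(); ++i) dst[i].copy_(src[i]);
}
```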
## Essential Patterns

### Replay Buffer (Ring Buffer)
```cpp
#include <torch/torch.h>
#include <algorithm>
#include <random>
#include <vector>

// A minimal transition layout (adapt the fields to your task).
struct Experience {
    torch::Tensor state, next_state;
    int64_t action;
    float reward;
    bool done;
};

class ReplayBuffer {
public:
    explicit ReplayBuffer(size_t capacity)
        : capacity_(capacity), position_(0), size_(0) {
        buffer_.reserve(capacity);
    }

    void push(Experience exp) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(std::move(exp));
        } else {
            buffer_[position_] = std::move(exp);   // overwrite the oldest slot
        }
        position_ = (position_ + 1) % capacity_;
        size_ = std::min(size_ + 1, capacity_);
    }

    std::vector<Experience> sample(size_t batch_size);

private:
    std::vector<Experience> buffer_;
    size_t capacity_, position_, size_;
    std::mt19937 rng_{std::random_device{}()};
};
```
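`sample()` is only declared above; one straightforward implementation draws uniformly with replacement:

```cpp
std::vector<Experience> ReplayBuffer::sample(size_t batch_size) {
    std::uniform_int_distribution<size_t> dist(0, size_ - 1);
    std::vector<Experience> batch;
    batch.reserve(batch_size);
    for (size_t i = 0; i < batch_size; ++i)
        batch.push_back(buffer_[dist(rng_)]);   // uniform, with replacement
    return batch;
}
```

Callers can then collate each field with `torch::stack` to build the `states`, `actions`, etc. tensors used in the loss computation above.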
### GPU Device Management
```cpp
torch::Device device = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
model->to(device);

// Create tensors directly on the device to avoid extra host-device copies
auto tensor = torch::zeros({batch_size, state_dim},
                           torch::TensorOptions().device(device).dtype(torch::kFloat32));
```
### Inference Mode
```cpp
{
    torch::NoGradGuard no_grad;   // no autograd graph is built in this scope
    auto action_values = model->forward(state);
    auto action = action_values.argmax(1);
}
```
## Common Pitfalls

- **Forgetting train/eval mode** - call `model->train()` or `model->eval()` as appropriate
- **Missing NoGradGuard** - use one during inference to save memory
- **Tensor accumulation** - call `.detach()` on tensors you store, or autograd graphs pile up
- **Thread safety** - clone models for parallel threads (see the sketch below)
- **Device mismatch** - verify all tensors in an operation live on the same device
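For the thread-safety point, a minimal sketch of per-thread model copies: each worker builds its own `DQNNet` and copies the learner's parameters into it (reusing `update_target` from the quick-start sketch), so rollout forward passes never race with gradient updates. The observation here is a placeholder.

```cpp
#include <thread>
#include <vector>

void parallel_rollouts(const DQNNet& policy, int64_t state_dim,
                       int64_t action_dim, int num_threads) {
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        auto local = std::make_shared<DQNNet>(state_dim, action_dim);
        update_target(*local, policy);   // copy current weights into this thread's net
        workers.emplace_back([local, state_dim] {
            torch::NoGradGuard no_grad;
            auto obs = torch::randn({1, state_dim});     // placeholder observation
            auto action = local->forward(obs).argmax(1);
            (void)action;   // step the environment here
        });
    }
    for (auto& w : workers) w.join();
}
```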
## Reference Files

- `references/libtorch.md` - LibTorch setup and API guide
- `references/algorithms.md` - DQN, PPO, SAC implementations
- `references/memory-management.md` - Replay buffers, smart pointers, RAII
- `references/performance.md` - Optimization, parallelization, GPU
- `references/testing.md` - Testing and debugging strategies