caffe-cifar-10
Caffe CIFAR-10 Build and Training
This skill provides procedural guidance for building the Caffe deep learning framework from source and training models on the CIFAR-10 dataset.
When to Use This Skill
- Building Caffe from source on Ubuntu/Debian systems
- Training CIFAR-10 or similar image classification models with Caffe
- Configuring Caffe for CPU-only execution
- Troubleshooting Caffe build and dependency issues
Critical Requirements Checklist
Before starting, identify ALL requirements from the task specification:
- Execution mode: CPU-only vs GPU (affects solver configuration)
- Iteration count: Specific number of training iterations required
- Output files: Where training logs and models should be saved
- Model checkpoints: Which iteration's model file is expected
Phase 1: Dependency Installation
System Dependencies
Install required packages before attempting to build:
apt-get update && apt-get install -y \
build-essential cmake git \
libprotobuf-dev libleveldb-dev libsnappy-dev \
libhdf5-serial-dev protobuf-compiler \
libatlas-base-dev libgflags-dev libgoogle-glog-dev liblmdb-dev \
libopencv-dev libboost-all-dev \
python3-dev python3-numpy python3-pip
Verification Step
Confirm critical libraries are installed:
dpkg -l | grep -E "libhdf5|libopencv|libboost"
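The check above only lists matching packages; a scripted variant can fail loudly when something is absent. This is an illustrative sketch: `pkglist` below is a canned stand-in for real `dpkg -l` output (in practice you would capture `dpkg -l` directly), and the package set is trimmed to three entries.

```shell
# Stand-in for `dpkg -l` output; the third required package is deliberately absent.
pkglist="$(cat <<'EOF'
ii  libhdf5-serial-dev  1.10.0  amd64  HDF5 development files
ii  libboost-all-dev    1.71.0  amd64  Boost development files
EOF
)"
missing=0
for pkg in libhdf5-serial-dev libboost-all-dev libopencv-dev; do
  # `ii` marks an installed package in dpkg's status column.
  echo "$pkglist" | grep -q "^ii  $pkg" || { echo "MISSING: $pkg"; missing=1; }
done
```

Running this against the sample data reports `libopencv-dev` as missing, which is the signal to rerun the install step before building.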
Phase 2: Caffe Source Acquisition
Clone and Checkout
git clone https://github.com/BVLC/caffe.git
cd caffe
git checkout 1.0 # Note: Tag is "1.0", not "1.0.0"
Common Mistake
The release tag is 1.0, not 1.0.0. Verify with git tag -l if uncertain.
Phase 3: Makefile.config Configuration
Create Configuration File
cp Makefile.config.example Makefile.config
Essential Configuration Changes
Apply these modifications to Makefile.config:
- CPU-Only Mode (if no GPU available):
  CPU_ONLY := 1
- OpenCV Version (for OpenCV 3.x or 4.x):
  OPENCV_VERSION := 3
  Note: OpenCV 4 may require additional compatibility patches.
- HDF5 Paths (Ubuntu-specific):
  INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
  LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial
- Python Configuration (Python 3):
  PYTHON_LIBRARIES := boost_python3 python3.8
  PYTHON_INCLUDE := /usr/include/python3.8 /usr/lib/python3/dist-packages/numpy/core/include
  Adjust version numbers based on the installed Python version.
Configuration Verification
After editing, verify no duplicate definitions exist:
grep -n "PYTHON_INCLUDE\|PYTHON_LIB\|CPU_ONLY" Makefile.config
Ensure each setting appears only once in an uncommented form.
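Scripting the edits with `sed` avoids the duplicate-definition problem, since each substitution rewrites a line in place rather than appending. The sketch below operates on a trimmed stand-in for Makefile.config (the real file is much longer); the setting names are the real ones from Caffe's Makefile.config.example.

```shell
# Trimmed stand-in for Makefile.config; CPU_ONLY ships commented out.
cat > Makefile.config <<'EOF'
# CPU_ONLY := 1
OPENCV_VERSION := 3
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
EOF

# Uncomment CPU_ONLY in place (no duplicate line is created).
sed -i 's|^# *CPU_ONLY := 1|CPU_ONLY := 1|' Makefile.config
# Append the HDF5 serial include path to the existing INCLUDE_DIRS line.
sed -i 's|^INCLUDE_DIRS := .*|& /usr/include/hdf5/serial|' Makefile.config

# Verify exactly one uncommented definition survives.
count="$(grep -c '^CPU_ONLY := 1' Makefile.config)"
echo "CPU_ONLY definitions: $count"
```

Because the substitutions are idempotent, rerunning the script leaves the file unchanged instead of stacking edits.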
Phase 4: Building Caffe
Memory-Aware Compilation
Avoid using all CPU cores on memory-constrained systems:
# For systems with limited RAM (< 8GB)
make all -j2
# For systems with adequate RAM
make all -j$(nproc)
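The choice of `-j` can be derived instead of guessed. The sketch below assumes roughly 2 GB of RAM per compile job, which is a rule of thumb rather than a measured Caffe figure, and falls back to a default when `/proc/meminfo` is unavailable (non-Linux systems).

```shell
# Available memory in kB; fall back to 8 GB worth if /proc/meminfo is absent.
mem_kb="$(awk '/MemAvailable/ {print $2}' /proc/meminfo 2>/dev/null || echo 8388608)"
# Assumption: ~2 GB of RAM per parallel compile job.
jobs=$(( mem_kb / (2 * 1024 * 1024) ))
[ "$jobs" -lt 1 ] && jobs=1
# Never exceed the core count.
cores="$(nproc 2>/dev/null || echo 1)"
[ "$jobs" -gt "$cores" ] && jobs="$cores"
echo "make all -j$jobs"
```

On an 8 GB / 4-core machine this lands at `-j4`; on a 4 GB machine it drops to `-j2`, matching the guidance above.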
Build Failure Recovery
If the build fails or is killed (often due to memory):
- Clean the build:
  make clean
- Rebuild with reduced parallelism:
  make all -j1
Build Verification
Confirm the binary exists after build:
ls -la .build_release/tools/caffe.bin
# caffe is a symlink to caffe.bin created by the Makefile build,
# regardless of CPU-only vs GPU configuration:
ls -la .build_release/tools/caffe
Phase 5: Dataset Preparation
Download CIFAR-10
./data/cifar10/get_cifar10.sh
Convert to LMDB Format
./examples/cifar10/create_cifar10.sh
Verification
Confirm LMDB directories exist:
ls -la examples/cifar10/cifar10_train_lmdb
ls -la examples/cifar10/cifar10_test_lmdb
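A stricter check than `ls` is to confirm each LMDB directory holds a non-empty `data.mdb`, since an interrupted conversion can leave empty directories behind. The sketch below creates mock directories so the logic can be exercised anywhere; in a real run, point `root` at `examples/cifar10`.

```shell
# Mock setup: stand-ins for the two LMDB directories (real data.mdb files
# are megabytes in size, not one-line placeholders).
root="$(mktemp -d)"
for db in cifar10_train_lmdb cifar10_test_lmdb; do
  mkdir -p "$root/$db"
  echo "placeholder" > "$root/$db/data.mdb"
done

# The actual check: data.mdb must exist and be non-empty (-s) in each dir.
ok=1
for db in cifar10_train_lmdb cifar10_test_lmdb; do
  [ -s "$root/$db/data.mdb" ] || { echo "MISSING OR EMPTY: $db"; ok=0; }
done
[ "$ok" = 1 ] && echo "LMDB check passed"
```

If either directory fails the check, rerun create_cifar10.sh before starting training.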
Phase 6: Solver Configuration
Modify Solver for Requirements
Edit examples/cifar10/cifar10_quick_solver.prototxt:
- Set iteration count:
  max_iter: 500  # Or as specified in task
- Set execution mode:
  solver_mode: CPU  # Change from GPU if required
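The two solver edits can also be scripted. The sketch below runs against a trimmed stand-in for cifar10_quick_solver.prototxt; the field names (`max_iter`, `solver_mode`) are real Caffe SolverParameter fields, but the sample file is reduced to three lines.

```shell
# Trimmed stand-in for examples/cifar10/cifar10_quick_solver.prototxt.
cat > solver.prototxt <<'EOF'
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
max_iter: 4000
solver_mode: GPU
EOF

# Rewrite the two required fields in place.
sed -i 's/^max_iter: .*/max_iter: 500/' solver.prototxt
sed -i 's/^solver_mode: .*/solver_mode: CPU/' solver.prototxt

# Same verification grep as above.
grep -E "max_iter|solver_mode" solver.prototxt
```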
Verification
grep -E "max_iter|solver_mode" examples/cifar10/cifar10_quick_solver.prototxt
Phase 7: Training Execution
Run Training with Output Capture
./build/tools/caffe train \
--solver=examples/cifar10/cifar10_quick_solver.prototxt \
2>&1 | tee training_output.txt
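Once the log is captured, the reported loss can be extracted with `awk`. The sample lines below mimic Caffe's glog-style solver output; if your build's log lines differ, adjust the pattern accordingly.

```shell
# Sample log in the glog format Caffe's solver emits (illustrative lines).
cat > training_output.txt <<'EOF'
I0101 12:00:00.000000  1234 solver.cpp:218] Iteration 0 (0 iter/s), loss = 2.30251
I0101 12:01:00.000000  1234 solver.cpp:218] Iteration 100 (1.7 iter/s), loss = 1.71342
I0101 12:05:00.000000  1234 solver.cpp:447] Snapshotting to binary proto file
EOF

# Keep the last reported loss: the value is the final field on matching lines.
final_loss="$(awk '/loss = / {loss=$NF} END {print loss}' training_output.txt)"
echo "final reported loss: $final_loss"
```

A final loss that never drops below the initial ~2.3 (ln 10, i.e. chance level for 10 classes) suggests training is not learning and the configuration should be revisited.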
Alternative Binary Paths
Depending on build configuration, the binary may be at:
.build_release/tools/caffe
build/tools/caffe
.build_release/tools/caffe.bin
Phase 8: Verification
Required Outputs Checklist
- Caffe binary exists:
  test -f .build_release/tools/caffe && echo "OK" || echo "MISSING"
- Model file exists (iteration-specific):
  ls -la examples/cifar10/cifar10_quick_iter_*.caffemodel
- Training output captured:
  test -f training_output.txt && echo "OK" || echo "MISSING"
- Solver configured correctly:
  grep "solver_mode: CPU" examples/cifar10/cifar10_quick_solver.prototxt
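The checklist rolls up naturally into one script. The paths are the ones used throughout this document; mock files are created here under a `demo` directory so the logic can be exercised without a completed build.

```shell
# Mock the expected outputs (a real run would skip this setup block).
mkdir -p demo/.build_release/tools demo/examples/cifar10
: > demo/.build_release/tools/caffe
: > demo/examples/cifar10/cifar10_quick_iter_500.caffemodel
: > demo/training_output.txt
cd demo

# Consolidated verification: report every missing artifact, not just the first.
fail=0
[ -f .build_release/tools/caffe ] || { echo "MISSING: caffe binary"; fail=1; }
ls examples/cifar10/cifar10_quick_iter_*.caffemodel >/dev/null 2>&1 \
  || { echo "MISSING: model checkpoint"; fail=1; }
[ -f training_output.txt ] || { echo "MISSING: training log"; fail=1; }
[ "$fail" = 0 ] && echo "all checks passed"
```

Exiting with `$fail` as the status code makes the script usable as a CI gate.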
Common Pitfalls
1. Premature Termination
Never stop after make clean or intermediate steps. Complete the full workflow:
Dependencies -> Build -> Dataset -> Configure -> Train -> Verify
2. Missing Solver Configuration
The solver file must be modified for:
- CPU vs GPU execution mode
- Specific iteration count requirements
3. Skipping Dataset Preparation
Training will fail without LMDB data. Always run both:
- get_cifar10.sh (download)
- create_cifar10.sh (convert)
4. Build Parallelism Issues
High parallelism (-j$(nproc)) can exhaust memory. Start with -j2 on constrained systems.
5. Duplicate Configuration Entries
Multiple edits to Makefile.config can create duplicate definitions. Always verify single definitions for each setting.
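Duplicate definitions can be detected mechanically rather than by eye. The sketch below runs against a small sample file that deliberately defines `CPU_ONLY` twice; the pattern assumes Makefile.config's `NAME := value` convention.

```shell
# Sample config with a deliberate duplicate definition.
cat > Makefile.config <<'EOF'
CPU_ONLY := 1
OPENCV_VERSION := 3
CPU_ONLY := 1
EOF

# Collect uncommented setting names and report any that appear more than once.
dups="$(grep -E '^[A-Z_]+ :=' Makefile.config | awk '{print $1}' | sort | uniq -d)"
if [ -n "$dups" ]; then echo "duplicate settings: $dups"; fi
```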
6. Wrong Git Tag
Use 1.0 not 1.0.0 for the stable release.
Decision Framework
When encountering issues:
- Build killed: Reduce parallelism; run make clean, then rebuild with -j1
- Missing headers: Check HDF5 and OpenCV include paths in Makefile.config
- Python errors: Verify Python version matches configuration
- Training fails immediately: Check dataset preparation completed
- Wrong output location: Verify solver paths and output file redirection
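The decision table above can be encoded as a small helper for automated runners. This is a toy sketch: the symptom keywords are invented for illustration and are not part of any Caffe tooling.

```shell
# Map a symptom keyword (hypothetical names) to the remediation from the table.
triage() {
  case "$1" in
    build-killed)    echo "make clean && make all -j1" ;;
    missing-headers) echo "check INCLUDE_DIRS/LIBRARY_DIRS in Makefile.config" ;;
    training-fails)  echo "rerun get_cifar10.sh and create_cifar10.sh" ;;
    *)               echo "unrecognized symptom" ;;
  esac
}

advice="$(triage build-killed)"
echo "$advice"
```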