yolo-detection-2026
# YOLO 2026 Object Detection
Real-time object detection using the latest YOLO 2026 models. Detects 80+ COCO object classes including people, vehicles, animals, and everyday objects. Outputs bounding boxes with labels and confidence scores.
## Model Sizes
| Size | Speed | Accuracy | Best For |
|---|---|---|---|
| nano | Fastest | Good | Real-time on CPU, edge devices |
| small | Fast | Better | Balanced speed/accuracy |
| medium | Moderate | High | Accuracy-focused deployments |
| large | Slower | Highest | Maximum detection quality |
## Hardware Acceleration
The skill uses `env_config.py` to automatically detect hardware and convert the model to the fastest format for your platform. Conversion happens once, during deployment, and the result is cached.
| Platform | Backend | Optimized Format | Compute Units | Expected Speedup |
|---|---|---|---|---|
| NVIDIA GPU | CUDA | TensorRT `.engine` | GPU | ~3-5x |
| Apple Silicon (M1+) | MPS | CoreML `.mlpackage` | Neural Engine (NPU) | ~2x |
| Intel CPU/GPU/NPU | OpenVINO | OpenVINO IR `.xml` | CPU/GPU/NPU | ~2-3x |
| AMD GPU | ROCm | ONNX Runtime | GPU | ~1.5-2x |
| CPU (any) | CPU | ONNX Runtime | CPU | ~1.5x |
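As a rough illustration of the platform probing behind this table, a backend check might look like the sketch below. This is an assumption-laden example, not the skill's actual `env_config.py` logic; the `torch`-based probing and the `detect_backend` helper are hypothetical.

```python
def detect_backend():
    # Hypothetical probe; the real skill uses env_config.HardwareEnv.detect().
    try:
        import torch
        if torch.cuda.is_available():
            # ROCm builds of torch expose torch.version.hip; CUDA builds do not.
            return "rocm" if getattr(torch.version, "hip", None) else "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    # No GPU framework found: CPU path (OpenVINO / ONNX Runtime territory).
    return "cpu"
```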
Apple Silicon Note: Detection defaults to `cpu_and_ne` (CPU + Neural Engine), keeping the GPU free for LLM/VLM inference. Set `compute_units: all` to include the GPU if not running a local LLM.
## How It Works
1. `deploy.sh` detects your hardware via `env_config.HardwareEnv.detect()`
2. Installs the matching `requirements_{backend}.txt` (e.g. CUDA → includes `tensorrt`)
3. Pre-converts the default model to the optimal format
4. At runtime, `detect.py` loads the cached optimized model automatically
5. Falls back to PyTorch if optimization fails
Set `use_optimized: false` to disable auto-conversion and use raw PyTorch.
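The cached-model lookup with PyTorch fallback could be sketched as below. This is a minimal illustration under assumed conventions: the `pick_model_path` helper, the cache layout, and the file-extension map are assumptions, not the skill's actual code.

```python
from pathlib import Path

# Optimized-artifact extension per backend (mirrors the table above).
_FORMATS = {"cuda": ".engine", "mps": ".mlpackage", "intel": ".xml",
            "rocm": ".onnx", "cpu": ".onnx"}

def pick_model_path(cache_dir: str, stem: str, backend: str,
                    use_optimized: bool = True) -> Path:
    cache = Path(cache_dir)
    if use_optimized:
        candidate = cache / (stem + _FORMATS.get(backend, ".onnx"))
        if candidate.exists():
            return candidate           # cached optimized model
    return cache / (stem + ".pt")      # raw PyTorch fallback
```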
## Auto Start
Set `auto_start: true` in the skill config to start detection automatically when Aegis launches. The skill will begin processing frames from the selected camera immediately.

```yaml
auto_start: true
model_size: nano
fps: 5
```
## Performance Monitoring
The skill emits `perf_stats` events every 50 frames with aggregate timing:
```json
{"event": "perf_stats", "total_frames": 50, "timings_ms": {
  "inference": {"avg": 3.4, "p50": 3.2, "p95": 5.1},
  "postprocess": {"avg": 0.15, "p50": 0.12, "p95": 0.31},
  "total": {"avg": 3.6, "p50": 3.4, "p95": 5.5}
}}
```
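The avg/p50/p95 aggregates can be reproduced from raw per-frame samples along these lines. This is a sketch: the `summarize` helper and the nearest-rank percentile choice are assumptions, not necessarily how the skill computes its stats.

```python
import statistics

def summarize(samples_ms):
    # Collapse a window of per-stage timings into the avg/p50/p95 shape above.
    s = sorted(samples_ms)
    def pct(p):
        # Nearest-rank percentile over the sorted samples.
        return s[min(len(s) - 1, round(p * (len(s) - 1)))]
    return {"avg": round(statistics.fmean(s), 2),
            "p50": round(pct(0.5), 2),
            "p95": round(pct(0.95), 2)}
```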
## Protocol
Communicates via JSON lines over stdin/stdout.
### Aegis → Skill (stdin)
```json
{"event": "frame", "frame_id": 42, "camera_id": "front_door", "timestamp": "...", "frame_path": "/tmp/aegis_detection/frame_front_door.jpg", "width": 1920, "height": 1080}
```
### Skill → Aegis (stdout)
```json
{"event": "ready", "model": "yolo2026n", "device": "mps", "backend": "mps", "format": "coreml", "gpu": "Apple M3", "classes": 80, "fps": 5}
{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "...", "objects": [
  {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]}
]}
{"event": "perf_stats", "total_frames": 50, "timings_ms": {"inference": {"avg": 3.4}}}
{"event": "error", "message": "...", "retriable": true}
```
### Bounding Box Format
`[x_min, y_min, x_max, y_max]` — pixel coordinates (xyxy).
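For consumers that prefer `[x, y, width, height]`, the conversion from xyxy is trivial (illustrative helper, not part of the skill):

```python
def xyxy_to_xywh(bbox):
    # [x_min, y_min, x_max, y_max] -> [x, y, width, height]
    x1, y1, x2, y2 = bbox
    return [x1, y1, x2 - x1, y2 - y1]
```

For the example detection above, `xyxy_to_xywh([100, 50, 300, 400])` yields `[100, 50, 200, 350]`.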
### Stop Command
```json
{"command": "stop"}
```
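Putting the protocol together, a skill-side event loop might be sketched as below. Assumptions: the `run` helper and the `handle_frame` callback are hypothetical, and I/O is parameterized for clarity rather than hard-wired to stdin/stdout.

```python
import json

def run(lines, emit, handle_frame):
    # Read JSON-lines events, emit detection events, exit on the stop command.
    for line in lines:
        msg = json.loads(line)
        if msg.get("command") == "stop":
            break
        if msg.get("event") == "frame":
            objects = handle_frame(msg["frame_path"])  # -> list of detection dicts
            emit(json.dumps({"event": "detections",
                             "frame_id": msg["frame_id"],
                             "camera_id": msg["camera_id"],
                             "objects": objects}))
```

In the real skill, `lines` would be `sys.stdin` and `emit` a flushed write to `sys.stdout`.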
## Installation
The `deploy.sh` bootstrapper handles everything: Python environment, GPU backend detection, dependency installation, and model optimization. No manual setup required.
```sh
./deploy.sh
```
### Requirements Files
| File | Backend | Key Deps |
|---|---|---|
| `requirements_cuda.txt` | NVIDIA | torch (cu124), tensorrt |
| `requirements_mps.txt` | Apple | torch, coremltools |
| `requirements_intel.txt` | Intel | torch, openvino |
| `requirements_rocm.txt` | AMD | torch (rocm6.2), onnxruntime-rocm |
| `requirements_cpu.txt` | CPU | torch (cpu), onnxruntime |