Computer Vision Pipeline

Expert in building production-ready computer vision systems for object detection, tracking, and video analysis.

When to Use

Use for:

  • Drone footage analysis (archaeological surveys, conservation)
  • Wildlife monitoring and tracking
  • Real-time object detection systems
  • Video preprocessing and analysis
  • Custom model training and inference
  • Multi-object tracking (MOT)

NOT for:

  • Simple image filters (use Pillow/PIL)
  • Photo editing (use Photoshop/GIMP)
  • Face recognition APIs (use AWS Rekognition)
  • Basic OCR (use Tesseract)

Technology Selection

Object Detection Models

| Model | Speed (FPS) | Accuracy (mAP) | Use Case |
|---|---|---|---|
| YOLOv8 | 140 | 53.9% | Real-time detection |
| Detectron2 | 25 | 58.7% | High accuracy, research |
| EfficientDet | 35 | 55.1% | Mobile deployment |
| Faster R-CNN | 10 | 42.0% | Legacy systems |

Timeline:

  • 2015: Faster R-CNN (two-stage detection)
  • 2016: YOLO v1 (one-stage, real-time)
  • 2020: YOLOv5 (PyTorch, production-ready)
  • 2023: YOLOv8 (state-of-the-art)
  • 2024: YOLOv8 is industry standard for real-time

Decision tree:

Need real-time (>30 FPS)? → YOLOv8
Need highest accuracy? → Detectron2 Mask R-CNN
Need mobile deployment? → YOLOv8-nano or EfficientDet
Need instance segmentation? → Detectron2 or YOLOv8-seg
Need custom objects? → Fine-tune YOLOv8
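
If the tree lands on fine-tuning, the ultralytics API keeps the loop short. A minimal sketch, assuming a hypothetical dataset config custom_dataset.yaml in the standard YOLO format (train/val image paths plus class names):

# Fine-tune YOLOv8 on custom objects (sketch; dataset config name is illustrative)
from ultralytics import YOLO

model = YOLO('yolov8n.pt')  # Start from pretrained COCO weights

model.train(
    data='custom_dataset.yaml',  # Hypothetical config: train/val paths, class names
    epochs=50,
    imgsz=640,
    batch=16,
)

metrics = model.val()        # Evaluate on the validation split
model.export(format='onnx')  # Optional: export for deployment

Because training starts from COCO weights, a few hundred labeled images per class is a common starting point rather than the tens of thousands needed from scratch.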

Common Anti-Patterns

Anti-Pattern 1: Not Preprocessing Frames Before Detection

Novice thinking: "Just run detection on raw video frames"

Problem: Poor detection accuracy, wasted GPU cycles.

Wrong approach:

# ❌ No preprocessing - poor results
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Raw frame detection - no normalization, no resizing
    results = model(frame)
    # Poor accuracy, slow inference

Why wrong:

  • Video resolution too high (4K = 8.3 megapixels per frame)
  • No normalization (pixel values 0-255 instead of 0-1)
  • Aspect ratio not maintained
  • GPU memory overflow on high-res frames

Correct approach:

# ✅ Proper preprocessing pipeline
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

# Model expects 640x640 input
TARGET_SIZE = 640

def preprocess_frame(frame):
    # Resize while maintaining aspect ratio
    h, w = frame.shape[:2]
    scale = TARGET_SIZE / max(h, w)
    new_w, new_h = int(w * scale), int(h * scale)

    resized = cv2.resize(frame, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Pad to square
    pad_w = (TARGET_SIZE - new_w) // 2
    pad_h = (TARGET_SIZE - new_h) // 2

    padded = cv2.copyMakeBorder(
        resized,
        pad_h, TARGET_SIZE - new_h - pad_h,
        pad_w, TARGET_SIZE - new_w - pad_w,
        cv2.BORDER_CONSTANT,
        value=(114, 114, 114)  # Gray padding (YOLO convention)
    )

    # Ultralytics normalizes numpy inputs internally; only normalize manually
    # if feeding raw float tensors:
    # normalized = padded.astype(np.float32) / 255.0

    return padded, scale, pad_w, pad_h

while True:
    ret, frame = video.read()
    if not ret:
        break

    preprocessed, scale, pad_w, pad_h = preprocess_frame(frame)
    results = model(preprocessed)

    # Map boxes back to original coordinates: undo the padding offset, then the scale
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0]
        x1, x2 = (x1 - pad_w) / scale, (x2 - pad_w) / scale
        y1, y2 = (y1 - pad_h) / scale, (y2 - pad_h) / scale

Performance comparison:

  • Raw 4K frames: 5 FPS, 72% mAP
  • Preprocessed 640x640: 45 FPS, 89% mAP

Timeline context:

  • 2015: Manual preprocessing required
  • 2020: YOLOv5 added auto-resize
  • 2023: YOLOv8 has smart preprocessing but explicit control is better

Anti-Pattern 2: Processing Every Frame in Video

Novice thinking: "Run detection on every single frame"

Problem: 99% of frames are redundant, wasting compute.

Wrong approach:

# ❌ Process every frame (30 FPS video = 1800 frames/min)
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Run detection on EVERY frame
    results = model(frame)
    detections.append(results)

# 10-minute video = 18,000 inferences (15 minutes on GPU)

Why wrong:

  • Adjacent frames are nearly identical
  • Nearly all compute is spent on duplicate work
  • Slow processing time
  • Massive storage for results

Correct approach 1: Frame sampling

# ✅ Sample every Nth frame
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

FPS = video.get(cv2.CAP_PROP_FPS) or 30.0  # Fall back to 30 if metadata is missing
SAMPLE_RATE = int(FPS)  # Process ~1 frame per second

frame_count = 0
detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    frame_count += 1

    # Only process one frame per second of footage
    if frame_count % SAMPLE_RATE == 0:
        results = model(frame)
        detections.append({
            'frame': frame_count,
            'timestamp': frame_count / FPS,
            'results': results
        })

# 10-minute video = 600 inferences (30 seconds on GPU)

Correct approach 2: Adaptive sampling with scene change detection

# ✅ Only process when scene changes significantly
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

def scene_changed(prev_frame, curr_frame, threshold=0.3):
    """Detect scene change using histogram comparison"""
    if prev_frame is None:
        return True

    # Convert to grayscale
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)

    # Calculate histograms
    prev_hist = cv2.calcHist([prev_gray], [0], None, [256], [0, 256])
    curr_hist = cv2.calcHist([curr_gray], [0], None, [256], [0, 256])

    # Compare histograms
    correlation = cv2.compareHist(prev_hist, curr_hist, cv2.HISTCMP_CORREL)

    return correlation < (1 - threshold)

prev_frame = None
detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Only run detection if scene changed
    if scene_changed(prev_frame, frame):
        results = model(frame)
        detections.append(results)

    prev_frame = frame.copy()

# Adapts to video content - static shots skip frames, action scenes process more

Savings:

  • Every frame: 18,000 inferences
  • Sample 1 FPS: 600 inferences (97% reduction)
  • Adaptive: ~1,200 inferences (93% reduction)

Anti-Pattern 3: Not Using Batch Inference

Novice thinking: "Process one image at a time"

Problem: GPU sits idle 80% of the time waiting for data.

Wrong approach:

# ❌ Sequential processing - GPU underutilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

# 100 images to process
image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

start = time.time()

for path in image_paths:
    frame = cv2.imread(path)
    results = model(frame)  # Process one at a time
    # GPU utilization: ~20%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 45 seconds

Why wrong:

  • GPU has to wait for CPU to load each image
  • No parallelization
  • GPU utilization ~20%
  • Slow throughput

Correct approach:

# ✅ Batch inference - GPU fully utilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

BATCH_SIZE = 16  # Process 16 images at once

start = time.time()

for i in range(0, len(image_paths), BATCH_SIZE):
    batch_paths = image_paths[i:i+BATCH_SIZE]

    # Load batch
    frames = [cv2.imread(path) for path in batch_paths]

    # Batch inference (single GPU call)
    results = model(frames)  # Pass list of images
    # GPU utilization: ~85%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 8 seconds (5.6x faster!)

Performance comparison:

| Method | Time (100 images) | GPU Util | Throughput |
|---|---|---|---|
| Sequential | 45s | 20% | 2.2 img/s |
| Batch (16) | 8s | 85% | 12.5 img/s |
| Batch (32) | 6s | 92% | 16.7 img/s |

Batch size tuning:

# Find optimal batch size for your GPU
import time

import torch

def find_optimal_batch_size(model, image_size=(640, 640)):
    for batch_size in [1, 2, 4, 8, 16, 32, 64]:
        try:
            # Dummy input in the 0-1 float range the model expects
            dummy_input = torch.rand(batch_size, 3, *image_size).cuda()

            start = time.time()
            with torch.no_grad():
                _ = model(dummy_input)
            elapsed = time.time() - start

            throughput = batch_size / elapsed
            print(f"Batch {batch_size}: {throughput:.1f} img/s")
        except RuntimeError:
            print(f"Batch {batch_size}: OOM (out of memory)")
            torch.cuda.empty_cache()  # Free the failed allocation before stopping
            break

# Find optimal batch size before production
find_optimal_batch_size(model)

Anti-Pattern 4: Ignoring Non-Maximum Suppression (NMS) Tuning

Problem: Duplicate detections, missed objects, slow post-processing.

Wrong approach:

# ❌ Use default NMS settings for everything
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Default settings (iou_threshold=0.45, conf_threshold=0.25)
results = model('crowded_scene.jpg')

# Result: 50 bounding boxes, 30 are duplicates!

Why wrong:

  • Default IoU=0.45 is too permissive for dense objects
  • Default conf=0.25 includes low-quality detections
  • No adaptation to use case

Correct approach:

# ✅ Tune NMS for your use case
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Sparse objects (dolphins in ocean)
sparse_results = model(
    'ocean_footage.jpg',
    iou=0.5,    # Higher IoU = allow closer boxes
    conf=0.4    # Higher confidence = fewer false positives
)

# Dense objects (crowd, flock of birds)
dense_results = model(
    'crowded_scene.jpg',
    iou=0.3,    # Lower IoU = suppress more duplicates
    conf=0.5    # Higher confidence = filter noise
)

# High precision needed (legal evidence)
precise_results = model(
    'evidence.jpg',
    iou=0.5,
    conf=0.7,   # Very high confidence
    max_det=50  # Limit max detections
)

NMS parameter guide:

| Use Case | IoU | Conf | Max Det |
|---|---|---|---|
| Sparse objects (wildlife) | 0.5 | 0.4 | 100 |
| Dense objects (crowd) | 0.3 | 0.5 | 300 |
| High precision (evidence) | 0.5 | 0.7 | 50 |
| Real-time (speed priority) | 0.45 | 0.3 | 100 |
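
To make the iou knob concrete, here is a minimal standalone sketch of the suppression step itself, using torchvision.ops.nms (assumes torchvision is installed; boxes are x1, y1, x2, y2):

# How the IoU threshold decides which boxes survive NMS
import torch
from torchvision.ops import nms

boxes = torch.tensor([
    [100., 100., 200., 200.],   # Box A
    [105., 105., 205., 205.],   # Near-duplicate of A (IoU ≈ 0.82)
    [300., 300., 400., 400.],   # Separate object
])
scores = torch.tensor([0.9, 0.8, 0.75])

# Threshold 0.45: the near-duplicate overlaps A above 0.45, so it is suppressed
print(nms(boxes, scores, iou_threshold=0.45))  # tensor([0, 2])

# Threshold 0.9: overlap of 0.82 is below 0.9, so the duplicate survives
print(nms(boxes, scores, iou_threshold=0.9))   # tensor([0, 1, 2])

Lower thresholds suppress more aggressively, which is why the dense-object row above drops to 0.3.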

Anti-Pattern 5: No Tracking Between Frames

Novice thinking: "Run detection on each frame independently"

Problem: Can't count unique objects, track movement, or build trajectories.

Wrong approach:

# ❌ Independent frame detection - no object identity
from ultralytics import YOLO
import cv2

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    results = model(frame)
    detections.append(results)

# Result: Can't tell if frame 10 dolphin is same as frame 20 dolphin
# Can't count unique dolphins
# Can't track trajectories

Why wrong:

  • No object identity across frames
  • Can't count unique objects
  • Can't analyze movement patterns
  • Can't build trajectories

Correct approach: Use tracking (ByteTrack)

# ✅ Multi-object tracking with ByteTrack
from ultralytics import YOLO
import cv2

# YOLO with tracking
model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

# Track objects across frames
tracks = {}
frame_count = 0

while True:
    ret, frame = video.read()
    if not ret:
        break

    frame_count += 1

    # Run detection + tracking
    results = model.track(
        frame,
        persist=True,     # Maintain IDs across frames
        tracker='bytetrack.yaml'  # ByteTrack algorithm
    )

    # Each detection now has a persistent ID
    for box in results[0].boxes:
        if box.id is None:
            continue  # Tracker has not assigned an ID yet
        track_id = int(box.id[0])  # Unique ID across frames
        x1, y1, x2, y2 = map(float, box.xyxy[0])

        # Store trajectory, keyed by the real video frame number
        if track_id not in tracks:
            tracks[track_id] = []

        tracks[track_id].append({
            'frame': frame_count,
            'bbox': (x1, y1, x2, y2),
            'conf': float(box.conf[0])
        })

# Now we can analyze:
print(f"Unique dolphins detected: {len(tracks)}")

# Trajectory analysis
for track_id, trajectory in tracks.items():
    if len(trajectory) > 30:  # Only long tracks
        print(f"Dolphin {track_id} appeared in {len(trajectory)} frames")
        # Calculate movement, speed, etc.

Tracking benefits:

  • Count unique objects (not just detections per frame)
  • Build trajectories and movement patterns
  • Analyze behavior over time
  • Filter out brief false positives
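
The "calculate movement, speed" step noted in the code above takes only a few lines once trajectories exist. A sketch computing average centroid speed per track from the tracks dict; FPS is an assumed frame rate, and speeds come out in pixels per second (real-world units need camera calibration):

# Average centroid speed per track, in pixels/second
import math

FPS = 30.0  # Assumption; read via video.get(cv2.CAP_PROP_FPS) in practice

def centroid(bbox):
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)

for track_id, trajectory in tracks.items():
    if len(trajectory) < 2:
        continue

    # Total displacement between consecutive observations
    dist = 0.0
    for prev, curr in zip(trajectory, trajectory[1:]):
        (px, py), (cx, cy) = centroid(prev['bbox']), centroid(curr['bbox'])
        dist += math.hypot(cx - px, cy - py)

    # Elapsed time from the stored frame numbers
    duration = (trajectory[-1]['frame'] - trajectory[0]['frame']) / FPS
    if duration > 0:
        print(f"Track {track_id}: {dist / duration:.1f} px/s average speed")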

Tracking algorithms:

| Algorithm | Speed | Robustness | Occlusion Handling |
|---|---|---|---|
| ByteTrack | Fast | Good | Excellent |
| SORT | Very fast | Fair | Fair |
| DeepSORT | Medium | Excellent | Good |
| BoT-SORT | Medium | Excellent | Excellent |
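
In ultralytics, switching between the two bundled trackers is just a config argument; SORT and DeepSORT are not bundled and need external packages wired around the raw detections:

# Both configs ship with ultralytics
results = model.track(frame, persist=True, tracker='bytetrack.yaml')  # Fast default
results = model.track(frame, persist=True, tracker='botsort.yaml')    # Better occlusion handling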

Production Checklist

□ Preprocess frames (resize, pad, normalize)
□ Sample frames intelligently (1 FPS or scene change detection)
□ Use batch inference (16-32 images per batch)
□ Tune NMS thresholds for your use case
□ Implement tracking if analyzing video
□ Log inference time and GPU utilization
□ Handle edge cases (empty frames, corrupted video; see the sketch after this list)
□ Save results in structured format (JSON, CSV)
□ Visualize detections for debugging
□ Benchmark on representative data
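
For the edge-case item above, a minimal defensive capture loop; read_frames is a hypothetical helper and the print logging is a placeholder for your pipeline's logger:

# Guard against unopenable files and empty or corrupted frames
import cv2

def read_frames(path):
    video = cv2.VideoCapture(path)
    if not video.isOpened():
        raise IOError(f"Could not open video: {path}")
    try:
        frame_idx = 0
        while True:
            ret, frame = video.read()
            if not ret:
                break  # End of stream, or an unrecoverable decode error
            frame_idx += 1
            if frame is None or frame.size == 0:
                print(f"Skipping empty frame {frame_idx}")
                continue
            yield frame_idx, frame
    finally:
        video.release()  # Released even if the caller stops iterating early

for idx, frame in read_frames('drone_footage.mp4'):
    pass  # Detection / tracking goes here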

When to Use vs Avoid

| Scenario | Appropriate? |
|---|---|
| Analyze drone footage for archaeology | ✅ Yes - custom object detection |
| Track wildlife in video | ✅ Yes - detection + tracking |
| Count people in crowd | ✅ Yes - dense object detection |
| Real-time security camera | ✅ Yes - YOLOv8 real-time |
| Filter vacation photos | ❌ No - use photo management apps |
| Face recognition login | ❌ No - use AWS Rekognition API |
| Read license plates | ❌ No - use specialized OCR |

References

  • /references/yolo-guide.md - YOLOv8 setup, training, inference patterns
  • /references/video-processing.md - Frame extraction, scene detection, optimization
  • /references/tracking-algorithms.md - ByteTrack, SORT, DeepSORT comparison

Scripts

  • scripts/video_analyzer.py - Extract frames, run detection, generate timeline
  • scripts/model_trainer.py - Fine-tune YOLO on custom dataset, export weights

This skill guides: Computer vision | Object detection | Video analysis | YOLO | Tracking | Drone footage | Wildlife monitoring
