Video Processing
Overview
This skill provides guidance for video processing tasks involving frame-level analysis, event detection, and motion tracking using computer vision libraries like OpenCV. It emphasizes verification-first approaches and guards against common pitfalls in video analysis workflows.
Core Approach: Verify Before Implementing
Before writing detection algorithms, establish ground truth understanding of the video content:
- Extract and inspect sample frames - Save key frames as images to visually verify what is happening at specific frame numbers
- Understand video metadata - Frame count, FPS, duration, resolution
- Map expected events to frame ranges - If test data exists, understand what frames correspond to which events
- Build diagnostic tools first - Frame extraction and visualization utilities provide critical insight
Workflow for Event Detection Tasks
Phase 1: Video Exploration
# Essential first steps for any video analysis task
import cv2

video_path = "video.mp4"  # example input
cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration = frame_count / fps if fps else 0.0  # guard against missing FPS metadata
print(f"Frames: {frame_count}, FPS: {fps}, Duration: {duration:.2f}s")
cap.release()
Critical: Extract frames at expected event locations to verify understanding:
def save_frame(video_path, frame_num, output_path):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(output_path, frame)
    cap.release()
# Save frames at expected event times for visual inspection
save_frame("video.mp4", 50, "frame_050.png")
save_frame("video.mp4", 60, "frame_060.png")
Phase 2: Algorithm Development
When developing detection algorithms:
- Start simple - Basic frame differencing or thresholding before complex approaches
- Use configurable thresholds - Avoid hardcoded magic numbers; derive them from the data (see the sketch after this list)
- Test on known frames first - Verify algorithm produces expected results on frames with known ground truth
- Log intermediate values - Track metrics at each frame to understand algorithm behavior
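For example, a motion threshold can be derived from the video itself rather than hardcoded, and per-frame scores logged for inspection. A minimal sketch, assuming consecutive-frame differencing and an illustrative 90th-percentile cutoff (per_frame_motion is a hypothetical helper name, not part of this skill):

import cv2
import numpy as np

def per_frame_motion(video_path):
    """Log the mean absolute difference between consecutive frames."""
    cap = cv2.VideoCapture(video_path)
    ret, prev = cap.read()
    if not ret:
        raise ValueError(f"Could not read {video_path}")
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    scores = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        scores.append(cv2.absdiff(prev, gray).mean())
        prev = gray
    cap.release()
    return scores

scores = per_frame_motion("video.mp4")
threshold = np.percentile(scores, 90)  # data-derived, not a magic number
for i, s in enumerate(scores):
    print(f"frame {i + 1}: motion={s:.2f}{' <-- above threshold' if s > threshold else ''}")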
Phase 3: Validation
Before finalizing:
- Sanity check outputs - Do detected events occur in reasonable order and timing?
- Test on multiple videos - Verify generalization across different inputs
- Compare against expected ranges - If ground truth exists, verify detection accuracy
Common Detection Approaches
Frame Differencing
Compares frames against a reference (first frame or previous frame) to detect motion:
# Background subtraction against the first frame
cap = cv2.VideoCapture(video_path)
ret, first_frame = cap.read()
first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
first_frame = cv2.GaussianBlur(first_frame, (21, 21), 0)
# For each subsequent frame
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)
    diff = cv2.absdiff(first_frame, gray)
cap.release()
Pitfall: First frame may not be a suitable reference if scene changes or camera moves.
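When the first frame is not a stable reference, one alternative is an adaptive background model. A minimal sketch using OpenCV's built-in MOG2 subtractor (the history and varThreshold values here are illustrative, not calibrated):

import cv2

cap = cv2.VideoCapture("video.mp4")
# MOG2 maintains a per-pixel background model that adapts as the scene changes
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16, detectShadows=False)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    fg_mask = subtractor.apply(frame)  # uint8 mask: 255 = foreground, 0 = background
cap.release()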
Contour-Based Detection
Identifies objects by finding contours in thresholded images:
_, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Pitfall: Threshold values (e.g., 25) and minimum contour areas are arbitrary without calibration.
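One way to avoid hand-picked values is to let the data choose them. A sketch using Otsu's method to select the binary threshold automatically, with a configurable minimum contour area (the MIN_AREA default is an assumption to calibrate per video):

# Otsu picks the threshold that best separates the diff's intensity histogram
_, thresh = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

MIN_AREA = 500  # illustrative default; calibrate against frames with known events
moving = [c for c in contours if cv2.contourArea(c) >= MIN_AREA]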
Tracking Position Over Time
For detecting events like jumps or gestures, track object position across frames:
positions = []  # (frame_num, x, y, area) tuples
for frame_num in range(frame_count):
    # ... detection code ...
    if detected:
        positions.append((frame_num, cx, cy, area))
Pitfall: Coordinate systems matter. In image coordinates, Y increases downward, so "higher in frame" means smaller Y values.
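To make the convention concrete: with centroids computed from contour moments, the apex of a jump is the smallest cy, not the largest. A sketch assuming the contours and positions list from the snippets above:

# Take the largest contour as the tracked subject (an assumption, not a rule)
largest = max(contours, key=cv2.contourArea)
area = cv2.contourArea(largest)
M = cv2.moments(largest)
if M["m00"] > 0:
    cx = int(M["m10"] / M["m00"])  # centroid x
    cy = int(M["m01"] / M["m00"])  # centroid y; grows DOWNWARD in image coords
    positions.append((frame_num, cx, cy, area))

# Jump apex = minimum cy across tracked positions, since y grows downward
apex_frame, _, apex_y, _ = min(positions, key=lambda p: p[2])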
Verification Strategies
1. Visual Inspection
Save frames at detected event times to verify correctness:
# After detecting takeoff at frame N
save_frame(video_path, detected_takeoff, "detected_takeoff.png")
save_frame(video_path, detected_takeoff - 5, "before_takeoff.png")
save_frame(video_path, detected_takeoff + 5, "after_takeoff.png")
2. Timing Reasonableness
Check if detected events make temporal sense:
duration_seconds = frame_count / fps
event_time = detected_frame / fps
# Example: A jump in a 4-second video shouldn't be detected in the last 0.5 seconds
if event_time > duration_seconds - 0.5:
    print("WARNING: Event detected very late in video - verify correctness")
3. Sequence Validation
Ensure events occur in logical order:
if detected_landing <= detected_takeoff:
    print("ERROR: Landing cannot occur before or at takeoff")
4. Multi-Video Testing
Test on multiple inputs early to catch overfitting to single video characteristics.
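A lightweight way to do this is to run the full pipeline over every available clip as soon as it works on one (the glob pattern and detect_events are hypothetical placeholders for your own files and pipeline):

import glob

for path in sorted(glob.glob("test_videos/*.mp4")):
    events = detect_events(path)  # hypothetical: your detection entry point
    print(f"{path}: {events}")
    # A detector overfit to one clip usually fails visibly on the others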
Common Pitfalls
1. No Ground Truth Verification
Problem: Relying entirely on computed metrics without visual confirmation.
Solution: Always save and inspect frames at detected event locations.
2. Confirmation Bias in Data Interpretation
Problem: When data shows unexpected patterns, inventing explanations that fit preconceptions rather than questioning assumptions.
Solution: When detection results seem wrong, investigate root causes rather than rationalizing unexpected behavior.
3. Magic Number Thresholds
Problem: Using arbitrary thresholds (500 for contour area, 25 for binary threshold) without empirical basis.
Solution: Derive thresholds from actual video data or make them configurable with sensible defaults.
4. Ignoring Detection Gaps
Problem: When detection fails for a range of frames, assuming this is expected behavior without investigation.
Solution: Investigate why detection fails - it may indicate algorithm flaws rather than expected behavior.
5. Coordinate System Confusion
Problem: Misinterpreting Y coordinates: in image coordinates, Y grows downward, so smaller Y means higher in the frame.
Solution: Explicitly document coordinate system assumptions and verify with visual inspection.
6. Ignoring Timing Reasonableness
Problem: Accepting detections that don't make temporal sense (e.g., event detected in last 0.8 seconds of a 4-second video).
Solution: Implement sanity checks on output timing.
7. Single Video Overfitting
Problem: Algorithm works on one video but fails on others.
Solution: Test on multiple videos early in development.
Output Format Considerations
When outputting results (e.g., to TOML, JSON):
import numpy as np

# Convert numpy types to Python native types for serialization
result = {
    "takeoff_frame": int(takeoff_frame),  # not np.int64
    "landing_frame": int(landing_frame),
}
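If results pass through json, a default handler is another way to catch any numpy scalars that slip through (to_native is an illustrative helper, not a library function):

import json
import numpy as np

def to_native(obj):
    # json.dumps raises TypeError on numpy scalars such as np.int64
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Object of type {type(obj)} is not JSON serializable")

print(json.dumps({"takeoff_frame": np.int64(52)}, default=to_native))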
Debugging Checklist
When detection results are incorrect:
- Have I visually inspected frames at the expected event times?
- Have I visually inspected frames at my detected event times?
- Do my detected times make temporal sense given video duration?
- Have I verified my algorithm on frames with known ground truth?
- Am I correctly interpreting the coordinate system?
- Have I tested on multiple videos?
- Are my thresholds derived from data or arbitrary?
- When detection fails on some frames, do I understand why?