# MediaPipe Pose Detection
## Key Landmarks for Jump Analysis

### Lower Body (Primary for Jumps)
| Landmark | Left Index | Right Index | Use Case |
|---|---|---|---|
| Hip | 23 | 24 | Center of mass, jump height |
| Knee | 25 | 26 | Triple extension, landing |
| Ankle | 27 | 28 | Ground contact detection |
| Heel | 29 | 30 | Takeoff/landing timing |
| Toe | 31 | 32 | Forefoot contact |
### Upper Body (Secondary)
| Landmark | Left Index | Right Index | Use Case |
|---|---|---|---|
| Shoulder | 11 | 12 | Arm swing tracking |
| Elbow | 13 | 14 | Arm action |
| Wrist | 15 | 16 | Arm swing timing |
### Reference Points
| Landmark | Index | Use Case |
|---|---|---|
| Nose | 0 | Head position |
| Left Eye | 2 | Face orientation |
| Right Eye | 5 | Face orientation |
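For code that indexes raw landmark arrays, the tables above can be mirrored as a small lookup dict. A minimal sketch (the dict and the `hip_center_y` helper are illustrative, not part of kinemotion, and assume landmarks are stored as `(x, y)` tuples indexed by landmark id):

```python
# Landmark indices from the tables above
LANDMARKS = {
    "nose": 0,
    "left_shoulder": 11, "right_shoulder": 12,
    "left_hip": 23, "right_hip": 24,
    "left_knee": 25, "right_knee": 26,
    "left_ankle": 27, "right_ankle": 28,
    "left_heel": 29, "right_heel": 30,
    "left_toe": 31, "right_toe": 32,
}

def hip_center_y(frame_landmarks):
    """Mid-hip vertical position, a common proxy for center of mass."""
    left = frame_landmarks[LANDMARKS["left_hip"]]
    right = frame_landmarks[LANDMARKS["right_hip"]]
    return (left[1] + right[1]) / 2.0
```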
## Confidence Thresholds

### Default Settings

```python
min_detection_confidence = 0.5  # Initial pose detection
min_tracking_confidence = 0.5   # Frame-to-frame tracking
```
### Quality Presets (auto_tuning.py)

| Preset | Detection | Tracking | Use Case |
|---|---|---|---|
| `fast` | 0.3 | 0.3 | Quick processing, tolerates errors |
| `balanced` | 0.5 | 0.5 | Default, good accuracy |
| `accurate` | 0.7 | 0.7 | Best accuracy, slower |
### Tuning Guidelines

- **Increase thresholds when:** jittery landmarks, false detections
- **Decrease thresholds when:** missing landmarks, tracking loss
- **Typical adjustment:** ±0.1 increments
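As a sketch, the presets and the ±0.1 adjustment rule can be written down directly (a hypothetical helper for illustration, not the actual `auto_tuning.py` API):

```python
# Preset thresholds from the table above: (detection, tracking)
PRESETS = {
    "fast": (0.3, 0.3),
    "balanced": (0.5, 0.5),
    "accurate": (0.7, 0.7),
}

def adjust_threshold(value, direction, step=0.1):
    """Nudge a confidence threshold up (+1) or down (-1), clamped to [0, 1]."""
    return min(1.0, max(0.0, round(value + direction * step, 2)))
```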
## Common Issues and Solutions

### Landmark Jitter

**Symptoms:** Landmarks jump erratically between frames

**Solutions:**
- Apply Butterworth low-pass filter (cutoff 6-10 Hz)
- Increase tracking confidence
- Use One-Euro filter for real-time applications
```python
# Butterworth filter (filtering.py)
from kinemotion.core.filtering import butterworth_filter

smoothed = butterworth_filter(landmarks, cutoff=8.0, fps=30)

# One-Euro filter (smoothing.py)
from kinemotion.core.smoothing import one_euro_filter

smoothed = one_euro_filter(landmarks, min_cutoff=1.0, beta=0.007)
```
### Left/Right Confusion

**Symptoms:** MediaPipe swaps left and right landmarks mid-video

**Cause:** Occlusion at 90° lateral camera angle

**Solutions:**
- Use 45° oblique camera angle (recommended)
- Post-process to detect and correct swaps
- Use single-leg tracking when possible
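A post-processing pass for swap detection might compare the left/right x-ordering of a landmark pair against the clip's majority ordering. A minimal numpy heuristic (the function name and approach are illustrative assumptions, not kinemotion code):

```python
import numpy as np

def find_swap_candidates(left_x, right_x):
    """Flag frames whose left/right x-ordering flips relative to the
    clip's majority ordering - a crude cue for L/R landmark swaps."""
    diff = np.asarray(left_x) - np.asarray(right_x)
    majority = np.sign(np.median(diff))
    return np.where(np.sign(diff) == -majority)[0]
```

Flagged frames can then be corrected by swapping the left/right landmark pairs, or simply excluded from metrics.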
### Tracking Loss

**Symptoms:** Landmarks disappear for several frames

**Causes:**
- Athlete moves out of frame
- Fast motion blur
- Occlusion by equipment/clothing
**Solutions:**
- Ensure full athlete visibility throughout video
- Use higher frame rate (60+ fps)
- Interpolate missing frames (up to 3-5 frames)
```python
# Simple linear interpolation for gaps
import numpy as np

def interpolate_gaps(landmarks, max_gap=5):
    """Fill NaN gaps (per column) with linear interpolation."""
    for i in range(landmarks.shape[1]):
        mask = np.isnan(landmarks[:, i])
        if 0 < mask.sum() <= max_gap:
            landmarks[:, i] = np.interp(
                np.arange(len(landmarks)),
                np.where(~mask)[0],
                landmarks[~mask, i],
            )
    return landmarks
```
### Low Confidence Scores

**Symptoms:** Visibility scores consistently below threshold

**Causes:**
- Poor lighting (backlighting, shadows)
- Low contrast clothing vs background
- Partial occlusion
**Solutions:**
- Improve lighting (front-lit, even)
- Ensure clothing contrasts with background
- Remove obstructions from camera view
## Video Processing (video_io.py)

### Rotation Handling

Mobile videos often have rotation metadata that must be handled:

```python
# video_io.py handles this automatically:
# it reads EXIF rotation metadata and applies the correction.
from kinemotion.core.video_io import read_video_frames

frames, fps, dimensions = read_video_frames("mobile_video.mp4")
# Frames are correctly oriented regardless of source
```
### Manual Rotation (if needed)

```bash
# FFmpeg rotation options
ffmpeg -i input.mp4 -vf "transpose=1" output.mp4  # 90° clockwise
ffmpeg -i input.mp4 -vf "transpose=2" output.mp4  # 90° counter-clockwise
ffmpeg -i input.mp4 -vf "hflip" output.mp4        # Horizontal flip
```
### Frame Dimensions

Always read actual frame dimensions from the first decoded frame, not from metadata:

```python
import cv2

# Correct approach: decode a frame and inspect its shape
cap = cv2.VideoCapture(video_path)
ret, frame = cap.read()
height, width = frame.shape[:2]

# Incorrect: metadata may be wrong for rotated videos
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
```
## Coordinate Systems

### MediaPipe Output
- Normalized coordinates: (0.0, 0.0) to (1.0, 1.0)
- Origin: Top-left corner
- X: Left to right
- Y: Top to bottom
- Z: Relative depth (more negative = closer to the camera)
### Conversion to Pixels

```python
def normalized_to_pixel(landmark, width, height):
    """Convert MediaPipe normalized coordinates to pixel coordinates."""
    x = int(landmark.x * width)
    y = int(landmark.y * height)
    return x, y
```
### Visibility Score

Each landmark has a visibility score (0.0-1.0):

- `> 0.5`: Likely visible and accurate
- `< 0.5`: May be occluded or estimated
- `= 0.0`: Not detected
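One common pattern is to mask low-visibility landmarks to NaN so that a later interpolation or filtering step can handle them explicitly. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def mask_low_visibility(coords, visibility, threshold=0.5):
    """Replace low-visibility landmark coordinates with NaN so they can
    be interpolated or excluded downstream."""
    coords = np.array(coords, dtype=float)
    coords[np.asarray(visibility) < threshold] = np.nan
    return coords
```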
## Debug Overlay (debug_overlay.py)

### Skeleton Drawing

```python
# Key connections for jump visualization
POSE_CONNECTIONS = [
    (23, 25), (25, 27), (27, 29), (27, 31),  # Left leg
    (24, 26), (26, 28), (28, 30), (28, 32),  # Right leg
    (23, 24),                                # Hips
    (11, 23), (12, 24),                      # Torso
]
```
### Color Coding
| Element | Color (BGR) | Meaning |
|---|---|---|
| Skeleton | (0, 255, 0) | Green - normal tracking |
| Low confidence | (0, 165, 255) | Orange - visibility < 0.5 |
| Key angles | (255, 0, 0) | Blue - measured angles |
| Phase markers | (0, 0, 255) | Red - takeoff/landing |
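The table above maps onto a small helper that picks a connection color from endpoint visibility (an illustrative sketch; colors are BGR tuples as in the table):

```python
GREEN = (0, 255, 0)     # normal tracking
ORANGE = (0, 165, 255)  # low confidence

def connection_color(vis_a, vis_b, threshold=0.5):
    """Color a skeleton connection by its weakest endpoint's visibility."""
    return ORANGE if min(vis_a, vis_b) < threshold else GREEN
```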
## Performance Optimization

### Reducing Latency

- Use `model_complexity=0` for fastest inference
- Process every Nth frame for batch analysis
- Use GPU acceleration if available
```python
import mediapipe as mp

pose = mp.solutions.pose.Pose(
    model_complexity=0,        # 0=Lite, 1=Full, 2=Heavy
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
    static_image_mode=False,   # False for video (uses tracking)
)
```
### Memory Management

- Release the pose estimator after processing: `pose.close()`
- Process videos in chunks for large files
- Use generators for frame iteration
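The chunked-processing idea can be sketched with a plain generator that holds only one chunk of frames in memory at a time (an illustrative helper, not part of kinemotion):

```python
from itertools import islice

def iter_chunks(frames, chunk_size=100):
    """Yield lists of at most chunk_size frames from any frame iterable,
    so only one chunk is held in memory at a time."""
    it = iter(frames)
    while chunk := list(islice(it, chunk_size)):
        yield chunk
```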
## Integration with kinemotion

### File Locations

- Pose estimation: `src/kinemotion/core/pose.py`
- Video I/O: `src/kinemotion/core/video_io.py`
- Filtering: `src/kinemotion/core/filtering.py`
- Smoothing: `src/kinemotion/core/smoothing.py`
- Auto-tuning: `src/kinemotion/core/auto_tuning.py`
### Typical Pipeline

```
Video → read_video_frames() → pose.process() → filter/smooth → analyze
```
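The pipeline can be expressed as a simple composition of stages, each a callable taking the previous stage's output (a generic sketch; in practice the stages would be the kinemotion functions named above):

```python
def run_pipeline(data, *stages):
    """Thread data through each stage in order, mirroring the diagram above."""
    for stage in stages:
        data = stage(data)
    return data
```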
## Manual Observation for Validation

During development, use manual frame-by-frame observation to establish ground truth and validate pose detection accuracy.

### When to Use Manual Observation

- **Algorithm development:** validating new phase detection methods
- **Parameter tuning:** comparing detected vs. actual frames
- **Debugging:** investigating pose detection failures
- **Ground truth collection:** building validation datasets
### Ground Truth Data Collection Protocol

#### Step 1: Generate Debug Video

```bash
uv run kinemotion cmj-analyze video.mp4 --output debug.mp4
```

#### Step 2: Manual Frame-by-Frame Analysis

Open the debug video in a frame-stepping tool (QuickTime, VLC with frame advance, or a video editor).

#### Step 3: Record Observations

For each key phase, record the frame number where the event occurs:
```
=== MANUAL OBSERVATION: PHASE DETECTION ===

Video: ________________________
FPS: _____  Total Frames: _____

PHASE DETECTION (frame numbers)
| Phase | Detected | Manual | Error | Notes |
|-------|----------|--------|-------|-------|
| Standing End | ___ | ___ | ___ | |
| Lowest Point | ___ | ___ | ___ | |
| Takeoff | ___ | ___ | ___ | |
| Peak Height | ___ | ___ | ___ | |
| Landing | ___ | ___ | ___ | |

LANDMARK QUALITY (per phase)
| Phase | Hip Visible | Knee Visible | Ankle Visible | Notes |
|-------|-------------|--------------|---------------|-------|
| Standing | Y/N | Y/N | Y/N | |
| Countermovement | Y/N | Y/N | Y/N | |
| Flight | Y/N | Y/N | Y/N | |
| Landing | Y/N | Y/N | Y/N | |
```
### Phase Detection Criteria

**Standing End:** Last frame before downward hip movement begins
- Look for: hip starts descending, knees begin flexing

**Lowest Point:** Frame where the hip reaches its minimum height
- Look for: deepest squat position; in image coordinates (y grows downward), this is the maximum hip y value

**Takeoff:** First frame where both feet leave the ground
- Look for: toe/heel landmarks separate from the ground plane
- Note: May be 1-2 frames after visible liftoff due to detection lag

**Peak Height:** Frame where the hip reaches its maximum height
- Look for: hip at its highest point during flight (minimum y value in image coordinates)

**Landing:** First frame where a foot contacts the ground
- Look for: heel or toe landmark touches the ground plane
- Note: The algorithm may detect 1-2 frames late (velocity-based)
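Under the image-coordinate convention (y grows downward), the lowest point and peak height can be located directly from a per-frame hip-height series. A minimal sketch (the function name is illustrative; real phase detection also uses velocity and foot landmarks):

```python
import numpy as np

def detect_lowest_and_peak(hip_y):
    """Locate lowest point and peak height from a per-frame hip y series
    in image coordinates, where y increases downward."""
    lowest = int(np.argmax(hip_y))          # deepest squat = maximum y
    flight = hip_y[lowest:]
    peak = lowest + int(np.argmin(flight))  # peak height = minimum y after it
    return lowest, peak
```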
### Landmark Quality Assessment

For each landmark, observe:
| Quality | Criteria |
|---|---|
| Good | Landmark stable, positioned correctly on body part |
| Jittery | Landmark oscillates ±5-10 pixels between frames |
| Offset | Landmark consistently displaced from actual position |
| Lost | Landmark missing or wildly incorrect |
| Swapped | Left/right landmarks switched |
### Recording Observations Format

When validating, provide structured data:
```markdown
## Ground Truth: [video_name]

**Video Info:**
- Frames: 215
- FPS: 60
- Duration: 3.58s
- Camera: 45° oblique

**Phase Detection Comparison:**

| Phase | Detected | Manual | Error (frames) | Error (ms) |
|-------|----------|--------|----------------|------------|
| Standing End | 64 | 64 | 0 | 0 |
| Lowest Point | 91 | 88 | +3 (late) | +50 |
| Takeoff | 104 | 104 | 0 | 0 |
| Landing | 144 | 142 | +2 (late) | +33 |

**Error Analysis:**
- Mean absolute error: 1.25 frames (21ms)
- Bias detected: landing consistently late
- Accuracy: 2/4 perfect, 4/4 within ±3 frames

**Landmark Issues Observed:**
- Frames 87-92: hip jitter during lowest point
- Frames 140-145: ankle tracking unstable at landing
```
### Acceptable Error Thresholds

At 60 fps (16.67 ms per frame):
| Error Level | Frames | Time | Interpretation |
|---|---|---|---|
| Perfect | 0 | 0ms | Exact match |
| Excellent | ±1 | ±17ms | Within human observation variance |
| Good | ±2 | ±33ms | Acceptable for most metrics |
| Acceptable | ±3 | ±50ms | May affect precise timing metrics |
| Investigate | >3 | >50ms | Algorithm may need adjustment |
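The thresholds in the table map naturally onto a small classification helper (a sketch mirroring the table above, not part of kinemotion):

```python
def classify_frame_error(error_frames):
    """Map an absolute frame error onto the threshold table above."""
    e = abs(error_frames)
    if e == 0:
        return "Perfect"
    if e <= 1:
        return "Excellent"
    if e <= 2:
        return "Good"
    if e <= 3:
        return "Acceptable"
    return "Investigate"
```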
### Bias Detection

Look for systematic patterns across multiple videos:
| Pattern | Meaning | Action |
|---|---|---|
| Consistent +N frames | Algorithm detects late | Adjust threshold earlier |
| Consistent -N frames | Algorithm detects early | Adjust threshold later |
| Variable ±N frames | Normal variance | No action needed |
| Increasing error | Tracking degrades | Check landmark quality |
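Systematic bias shows up as a non-zero mean of the signed errors across videos. A minimal sketch (the `tolerance` value is an illustrative assumption):

```python
import numpy as np

def detect_bias(signed_errors, tolerance=0.5):
    """Classify the mean signed frame error across videos: a consistently
    positive mean means late detection, negative means early."""
    mean_error = float(np.mean(signed_errors))
    if mean_error > tolerance:
        return "late"
    if mean_error < -tolerance:
        return "early"
    return "unbiased"
```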
## Integration with basic-memory

Store ground truth observations:

```python
# Save validation results
write_note(
    title="CMJ Phase Detection Validation - [video_name]",
    content="[structured observation data]",
    folder="biomechanics"
)

# Search previous validations
search_notes("phase detection ground truth")

# Build context for analysis
build_context("memory://biomechanics/*")
```
### Example: CMJ Validation Study Reference

See basic-memory for the complete validation study:

- biomechanics/cmj-phase-detection-validation-45deg-oblique-view-ground-truth
- biomechanics/cmj-landing-detection-bias-root-cause-analysis
- biomechanics/cmj-landing-detection-impact-vs-contact-method-comparison
Key findings from validation:
- Standing End: 100% accuracy (0 frame error)
- Takeoff: ~0.7 frame mean error (excellent)
- Lowest Point: ~2.3 frame mean error (variable)
- Landing: +1-2 frame consistent bias (investigate)