# Google MediaPipe Usage (Web / Pose Landmarker)
## Quick Start
- Install `@mediapipe/tasks-vision`; resolve WASM from CDN
- Create a `PoseLandmarker` with `createFromOptions`
- Use `detect()` for a single image, or `detectForVideo()` in a throttled `requestAnimationFrame` loop
## Setup
Install (prefer pnpm):

```sh
pnpm add @mediapipe/tasks-vision
```
WASM root: resolve the vision tasks WASM files from a CDN when creating the task:

```js
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm",
);
```
## Create the Pose Landmarker Task
Use `PoseLandmarker.createFromOptions(vision, options)`:

```js
import { PoseLandmarker, FilesetResolver } from "@mediapipe/tasks-vision";

const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm",
);

const poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: modelUrl, // see reference.md for lite/full/heavy URLs
    delegate: "GPU", // falls back to CPU if unavailable
  },
  runningMode: "VIDEO", // or "IMAGE" for a single image
  numPoses: 1,
  minPoseDetectionConfidence: 0.5,
  minPosePresenceConfidence: 0.5,
  minTrackingConfidence: 0.5,
});
```
- `runningMode`: `IMAGE` for a single image → use `detect(image)`; `VIDEO` for a stream → use `detectForVideo(video, timestamp)`.
- `baseOptions.modelAssetPath`: URL to a `.task` model (lite / full / heavy). See reference.md for URLs.
- `delegate`: `"GPU"` preferred; some environments fall back to CPU.
## Run the Task
Single image (`runningMode: "IMAGE"`):

```js
const result = poseLandmarker.detect(imageElement);
```
Video / webcam (`runningMode: "VIDEO"`):

Call `detectForVideo(video, timestamp)` inside a `requestAnimationFrame` loop. Throttle by time (e.g. ~33 ms between frames, ≈30 fps) to avoid excessive work:
```js
let lastFrameTime = 0;

function detectLoop() {
  const now = performance.now();
  if (video.readyState >= 2 && now - lastFrameTime > 33) {
    lastFrameTime = now;
    const result = poseLandmarker.detectForVideo(video, now);
    if (result.landmarks?.length) {
      const landmarks = result.landmarks[0]; // first person
      // use landmarks
    }
  }
  requestAnimationFrame(detectLoop);
}
requestAnimationFrame(detectLoop);
```
## Result Shape
- `result.landmarks`: array of poses; each pose is `NormalizedLandmark[]` (33 points). Each landmark: `x`, `y`, `z` (normalized 0–1; `z` is depth relative to the hip center), `visibility` (0–1).
- `result.worldLandmarks`: optional 3D coordinates in meters (same indices).
- Single person: use `result.landmarks[0]`.
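Because `x` and `y` are normalized, projecting a landmark onto a canvas or video overlay is a small pure function. A minimal sketch (the helper name `toPixels` is illustrative, not part of the MediaPipe API):

```js
// Convert a normalized landmark (x, y in 0–1) to pixel coordinates
// for a render surface of the given size.
function toPixels(landmark, width, height) {
  return {
    x: landmark.x * width,
    y: landmark.y * height,
  };
}

// e.g. a landmark at the horizontal center, a quarter of the way down,
// on a 640×480 canvas:
const nose = { x: 0.5, y: 0.25, z: -0.1, visibility: 0.98 };
const p = toPixels(nose, 640, 480); // { x: 320, y: 120 }
```

Note that `z` has no pixel interpretation; it is depth relative to the hip center and is typically used only for gesture logic.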
## Practical Patterns (Know-how)
- State machine: idle → loading (load model) → ready (can start) → active (webcam + detection) → error. When switching model variant, `close()` the old `PoseLandmarker` instance and create a new one.
- Throttle: run `detectForVideo` only when `performance.now() - lastFrameTime > 33` (≈30 fps) to avoid blocking the main thread.
- Smoothing: apply a smoothing factor (e.g. 0.3) to derived values (pitch, bank) to reduce jitter; use a dead zone (in degrees) to ignore small movements.
- Confidence: use each landmark's `visibility`; ignore or downweight points below a threshold. Helper: `getLandmark(landmarks, index, minConfidence)` returning the point only if `visibility >= minConfidence`.
- Gestures: e.g. "hands forward" = compare shoulder vs wrist `z`; "hands overhead" = compare wrist `y` to shoulder `y`. Use consecutive-frame counters for toggles (e.g. require N frames in pose before firing an action).
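The confidence, smoothing, and frame-counter patterns above are plain logic, independent of MediaPipe itself. A sketch of each; `getLandmark`, `smooth`, and `HoldCounter` are illustrative helper names, not library APIs:

```js
// Return a landmark only if its visibility clears the threshold.
function getLandmark(landmarks, index, minConfidence = 0.5) {
  const lm = landmarks[index];
  return lm && lm.visibility >= minConfidence ? lm : null;
}

// Exponential smoothing with a dead zone: deltas below deadZone are
// ignored; larger ones move prev toward next by factor.
function smooth(prev, next, factor = 0.3, deadZone = 1) {
  const delta = next - prev;
  if (Math.abs(delta) < deadZone) return prev;
  return prev + delta * factor;
}

// Require N consecutive frames in a pose before firing a toggle.
class HoldCounter {
  constructor(requiredFrames = 10) {
    this.required = requiredFrames;
    this.count = 0;
  }
  update(inPose) {
    this.count = inPose ? this.count + 1 : 0;
    return this.count === this.required; // fires exactly once per hold
  }
}
```

Feed `smooth` each new derived angle per frame and keep the return value as the next `prev`; call `HoldCounter.update` once per detection frame with the boolean pose test.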
## Cleanup
- Stop webcam: `stream.getTracks().forEach(t => t.stop())`.
- Release task: `poseLandmarker.close()` when done or before creating a new instance.
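Both teardown steps can live in one idempotent helper. A sketch, assuming you keep references to the active `MediaStream` and `PoseLandmarker` on a state object (the `teardown` name is illustrative):

```js
// Stop all webcam tracks and release the landmarker.
// Safe to call more than once: references are nulled after use.
function teardown(state) {
  if (state.stream) {
    state.stream.getTracks().forEach((t) => t.stop());
    state.stream = null;
  }
  if (state.poseLandmarker) {
    state.poseLandmarker.close();
    state.poseLandmarker = null;
  }
}
```

Calling this before creating a new `PoseLandmarker` (e.g. when switching model variants) covers the state-machine transition described above.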
## Additional Resources
- For the full 33 landmark indices and skeleton connections, see reference.md
- For a minimal example, skeleton overlay, and landmark-to-control mapping, see examples.md
- Official docs: Pose Landmarker Web JS, Setup guide for web