
video-avatars-and-virtual-backgrounds


Provided by TippyEntertainment

https://github.com/tippyentertainment/skills.git

This skill is designed for use on the Tasking.tech agent platform (https://tasking.tech) and is also compatible with assistant runtimes that accept skill-style handlers, such as .claude, .openai, and .mistral. Use it with both Claude Code and Tasking.tech agent sources.

Instructions

Files & Formats

Required files and typical formats for video/avatars/virtual background projects:

  • SKILL.md — skill metadata (YAML frontmatter: name, description)
  • README.md — overview and usage notes
  • Video: .mp4, .mov, .webm
  • Images & overlays: .png, .jpg, .svg
  • Metadata & configs: .json, .yaml
  • Models: .glb, .gltf, .fbx

You are a specialist in real-time video processing for conferencing apps. Use this skill when the repo or user request involves:

  • Virtual backgrounds, background blur, or background replacement.
  • Selfie segmentation or reverse selfie segmentation.
  • Face mesh tracking and expression/pose extraction.
  • Avatar replacement (2D/3D avatars instead of camera).
  • WebRTC pipelines that need frame-by-frame manipulation.

Focus on practical, low-latency implementations that are realistic for web and desktop apps.

Core Responsibilities

When this skill is loaded, you should:

  1. Map the pipeline (sketch 1 below)

    • Identify how frames are captured (getUserMedia, native camera, etc.).
    • Determine where processing happens: browser (canvas/WebGL/WebGPU), native, or server.
    • Identify where the processed stream is consumed: WebRTC, virtual camera, preview UI.
  2. Design segmentation & compositing (sketch 2 below)

    • Recommend a segmentation solution (MediaPipe Selfie Segmentation, ML Kit, or equivalent) appropriate to the platform and latency budget.
    • Describe how to:
      • Feed video frames into the model.
      • Obtain the segmentation mask.
      • Composite foreground/background efficiently using WebGL/WebGPU or canvas 2D.
    • For reverse segmentation, describe how to invert the mask (or flip the composite operation) so effects target the background instead of the subject, or vice versa.
  3. Integrate face mesh tracking (sketch 3 below)

    • Choose a face mesh solution (e.g., MediaPipe Face Mesh).
    • Explain how to:
      • Extract and normalize landmarks.
      • Smooth landmark data over time to reduce jitter.
      • Map landmarks to expressions (eyes, mouth, head pose) that can drive overlays or avatars.
  4. Implement avatar replacement (sketch 4 below)

    • For 2D avatars:
      • Map face mesh landmarks to simple transforms (position/scale/rotation) and expression states.
      • Composite the avatar instead of the raw camera frame.
    • For 3D avatars:
      • Describe how to feed pose/expressions into a 3D engine (Three.js, WebGL/WebGPU, or an external engine via bridge).
      • Ensure the final rendered avatar is exposed as a MediaStream or virtual camera.
  5. Wire into WebRTC / conferencing (sketch 5 below)

    • Explain how to:
      • Use canvas.captureStream() or insertable streams / MediaStreamTrackProcessor to create a processed track.
      • Replace the user’s camera track with the processed track.
      • Handle toggling effects on/off and fallback if processing fails.
    • For desktop clients, mention virtual camera drivers or custom WebRTC clients where appropriate.
  6. Optimize for performance (sketch 6 below)

    • Always consider:
      • Running segmentation at reduced resolution and upscaling the result.
      • Reusing canvases and WebGL contexts.
      • Offloading heavy work to Web Workers / OffscreenCanvas where available.
    • Suggest configurable quality presets (low/medium/high) and clearly explain trade-offs.
  7. Handle edge cases and UX (sketch 7 below)

    • Discuss:
      • Multi-person frames (typical selfie segmentation models target the most prominent person; others may be masked out).
      • Fast motion and occlusion mitigation via temporal smoothing or dynamic fallbacks.
      • Visual artifacts (haloing, hair, transparency) and how to reduce them with post-processing or mask refinement.
    • Suggest intuitive UI controls: effect toggles, background selection, avatar selection, and performance presets.
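
Reference Sketches

Sketch 1: capture to output. A minimal sketch of the step 1 pipeline in the browser: getUserMedia feeds a detached video element, a canvas render loop is the processing hook, and captureStream() exposes the result as a consumable stream. The draw call here is a plain passthrough that the later sketches replace.

```ts
// Capture -> process -> consume, all in the browser.
const video = document.createElement('video');
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d')!;

async function startPipeline(): Promise<MediaStream> {
  // 1. Capture: raw frames from the camera.
  const camera = await navigator.mediaDevices.getUserMedia({
    video: { width: 1280, height: 720, frameRate: 30 },
  });
  video.srcObject = camera;
  await video.play();
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  // 2. Process: the render loop is where segmentation/compositing
  //    (sketches 2-7) replaces this plain passthrough draw.
  const render = () => {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    requestAnimationFrame(render);
  };
  render();

  // 3. Consume: the processed canvas becomes a MediaStream that WebRTC,
  //    a virtual camera bridge, or a preview element can use.
  return canvas.captureStream(30);
}
```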
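
Sketch 2: segmentation and compositing. A sketch of the mask-then-composite pattern, assuming the classic @mediapipe/selfie_segmentation JavaScript API (the newer @mediapipe/tasks-vision ImageSegmenter has the same feed-a-frame, receive-a-mask shape). The background path is a hypothetical asset; switching 'source-in' to 'source-out' keeps the background instead of the subject, which covers the reverse-segmentation case.

```ts
import { SelfieSegmentation } from '@mediapipe/selfie_segmentation';

const out = document.createElement('canvas');
const ctx = out.getContext('2d')!;
const background = new Image();
background.src = '/assets/backgrounds/office.png'; // hypothetical asset path

const segmenter = new SelfieSegmentation({
  locateFile: (f) => `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${f}`,
});
segmenter.setOptions({ modelSelection: 1 }); // 1 = landscape model, suited to video

segmenter.onResults((results) => {
  const { width, height } = out;
  ctx.save();
  ctx.clearRect(0, 0, width, height);
  // Stamp the person mask, then keep camera pixels only inside it.
  ctx.drawImage(results.segmentationMask, 0, 0, width, height);
  ctx.globalCompositeOperation = 'source-in'; // 'source-out' = reverse segmentation
  ctx.drawImage(results.image, 0, 0, width, height);
  // Paint the replacement background behind the subject.
  ctx.globalCompositeOperation = 'destination-over';
  ctx.drawImage(background, 0, 0, width, height);
  ctx.restore();
});

// Called from the render loop in sketch 1.
async function processFrame(video: HTMLVideoElement) {
  await segmenter.send({ image: video });
}
```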
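
Sketch 3: landmark smoothing. A minimal exponential moving average over a landmark array; a one-euro filter is the usual upgrade when both jitter and lag matter.

```ts
type Landmark = { x: number; y: number; z: number };

// Exponential moving average over a landmark array.
// alpha = 1 means no smoothing; lower values smooth more but add lag.
function smoothLandmarks(
  prev: Landmark[] | null,
  next: Landmark[],
  alpha = 0.5,
): Landmark[] {
  if (!prev || prev.length !== next.length) return next;
  return next.map((p, i) => ({
    x: prev[i].x + alpha * (p.x - prev[i].x),
    y: prev[i].y + alpha * (p.y - prev[i].y),
    z: prev[i].z + alpha * (p.z - prev[i].z),
  }));
}
```

Keep the returned array as prev for the next frame; expressions and poses derived from smoothed landmarks inherit the stability.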
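
Sketch 4: 2D avatar transform. Deriving position, roll, and scale from two eye-corner landmarks, assuming MediaPipe Face Mesh's 468-point topology and normalized coordinates; the sprite and its 120-pixel reference eye span are illustrative stand-ins.

```ts
type Point = { x: number; y: number };

// Indices 33 and 263 are the outer eye corners in MediaPipe Face Mesh's
// 468-point topology; landmark coordinates are normalized to [0, 1].
function avatarTransform(lm: Point[], w: number, h: number) {
  const left = lm[33];
  const right = lm[263];
  const dx = (right.x - left.x) * w;
  const dy = (right.y - left.y) * h;
  return {
    x: ((left.x + right.x) / 2) * w, // anchor the avatar between the eyes
    y: ((left.y + right.y) / 2) * h,
    rotation: Math.atan2(dy, dx),    // head roll in radians
    scale: Math.hypot(dx, dy) / 120, // 120 = assumed eye span of the sprite
  };
}

function drawAvatar(ctx: CanvasRenderingContext2D, sprite: HTMLImageElement, lm: Point[]) {
  const t = avatarTransform(lm, ctx.canvas.width, ctx.canvas.height);
  ctx.save();
  ctx.translate(t.x, t.y);
  ctx.rotate(t.rotation);
  ctx.scale(t.scale, t.scale);
  ctx.drawImage(sprite, -sprite.width / 2, -sprite.height / 2); // centered sprite
  ctx.restore();
}
```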
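
Sketch 5: WebRTC track swap. The standard renegotiation-free swap via RTCRtpSender.replaceTrack; the returned closure is one way to implement toggling effects off and falling back to the raw camera.

```ts
// Swap the outgoing camera track for the processed canvas track.
async function enableEffects(
  pc: RTCPeerConnection,
  processedCanvas: HTMLCanvasElement,
) {
  const processedTrack = processedCanvas.captureStream(30).getVideoTracks()[0];
  const sender = pc.getSenders().find((s) => s.track?.kind === 'video');
  if (!sender) throw new Error('no outgoing video track to replace');
  await sender.replaceTrack(processedTrack); // no renegotiation needed

  // Toggle-off / fallback path: restore the unprocessed camera track.
  return async (rawTrack: MediaStreamTrack) => {
    processedTrack.stop();
    await sender.replaceTrack(rawTrack);
  };
}
```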
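
Sketch 6: quality presets. Running segmentation at reduced resolution and letting compositing upscale the mask; the preset values are illustrative starting points, and the reused canvas avoids per-frame allocations.

```ts
// Run the model on a downscaled frame; compositing upscales the mask for free.
const PRESETS = {
  low:    { segWidth: 160, segHeight: 96,  fps: 15 },
  medium: { segWidth: 256, segHeight: 144, fps: 24 },
  high:   { segWidth: 480, segHeight: 270, fps: 30 },
} as const;

const segCanvas = document.createElement('canvas'); // reused every frame
const segCtx = segCanvas.getContext('2d')!;

function downscaleForSegmentation(
  video: HTMLVideoElement,
  quality: keyof typeof PRESETS,
): HTMLCanvasElement {
  const { segWidth, segHeight } = PRESETS[quality];
  // Resizing a canvas clears it, so only touch dimensions when they change.
  if (segCanvas.width !== segWidth) segCanvas.width = segWidth;
  if (segCanvas.height !== segHeight) segCanvas.height = segHeight;
  segCtx.drawImage(video, 0, 0, segWidth, segHeight);
  return segCanvas; // feed this, not the full-resolution frame, to the model
}
```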
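
Sketch 7: temporal mask smoothing. Blending each new mask into a persistent canvas damps single-frame flicker from fast motion or brief occlusion; the blend factor is a tuning knob, and a small ctx.filter blur on the same canvas can additionally soften halo edges.

```ts
// A persistent mask canvas keeps the previous mask; drawing each new mask
// with partial alpha blends the two, suppressing single-frame flicker.
const maskCanvas = document.createElement('canvas');
maskCanvas.width = 256;  // match the segmentation resolution (sketch 6)
maskCanvas.height = 144;
const maskCtx = maskCanvas.getContext('2d')!;

function accumulateMask(newMask: CanvasImageSource, blend = 0.6): HTMLCanvasElement {
  maskCtx.globalAlpha = blend; // 1 trusts only the newest mask; lower smooths more
  maskCtx.drawImage(newMask, 0, 0, maskCanvas.width, maskCanvas.height);
  maskCtx.globalAlpha = 1;
  return maskCanvas; // composite with this instead of the raw model output
}
```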

Output Style

When responding with this skill:

  • Favor concise, stepwise plans over long essays.
  • Provide minimal but concrete code snippets (JS/TS, WebRTC, WebGL) that illustrate the approach without being full applications.
  • Clearly separate:
    • Capture & transport.
    • Segmentation & compositing.
    • Face mesh & avatar logic.
    • Integration into WebRTC / conferencing UI.

If the user shares existing code:

  • First, diagram the current pipeline in a short list.
  • Then recommend minimal changes needed to add or fix segmentation, virtual backgrounds, or avatars rather than wholesale rewrites.