
LingBot-Map 3D Reconstruction Skill

Skill by ara.so — Daily 2026 Skills collection.

LingBot-Map is a feed-forward 3D foundation model that reconstructs scenes from streaming image or video data using a Geometric Context Transformer. Paged KV cache attention lets it sustain ~20 FPS at 518×378 resolution over sequences exceeding 10,000 frames.

What It Does

  • Streaming 3D reconstruction from image sequences or video
  • Feed-forward inference (no iterative optimization needed)
  • Outputs: point clouds with per-point confidence, camera poses, depth maps
  • Key features: anchor context, pose-reference window, trajectory memory for drift correction

Installation

# 1. Create environment
conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map

# 2. Install PyTorch (CUDA 12.8)
pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128

# 3. Install lingbot-map
git clone https://github.com/Robbyant/lingbot-map.git
cd lingbot-map
pip install -e .

# 4. Install FlashInfer for fast paged KV cache attention (recommended)
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/

# 5. Optional: visualization support
pip install -e ".[vis]"

# 6. Optional: sky masking for outdoor scenes
pip install onnxruntime       # CPU
pip install onnxruntime-gpu   # GPU

Model Download

Models available on HuggingFace and ModelScope:

# Download via huggingface_hub
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="robbyant/lingbot-map",
    filename="checkpoint.pt"
)

Or manually download from:

  • HuggingFace: https://huggingface.co/robbyant/lingbot-map
  • ModelScope: https://www.modelscope.cn/models/Robbyant/lingbot-map

CLI Commands

Demo with Interactive 3D Viewer (browser at localhost:8080)

# From image folder
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/

# From video file
python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10

# Outdoor scene with sky masking
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky

# Example scenes included in repo
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/church --mask_sky

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/oxford --mask_sky

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/university4 --mask_sky

Long Sequence Handling

# Keyframe interval: store every Nth frame in KV cache (saves memory)
# Use when sequence > 320 frames
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --keyframe_interval 6

# Windowed mode: for very long sequences (>3000 frames)
python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10 \
    --mode windowed --window_size 64

Without FlashInfer (SDPA fallback)

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --use_sdpa

Sky Masking with Custom Paths

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky \
    --sky_mask_dir /path/to/cached_masks/ \
    --sky_mask_visualization_dir /path/to/mask_viz/

CLI Arguments Reference

Input

Argument          Description
--model_path      Path to model checkpoint (.pt file)
--image_folder    Directory of input images
--video_path      Input video file path
--fps             Frames per second to sample from video

Inference Mode

Argument             Default    Description
--mode               streaming  streaming or windowed
--window_size        64         Window size for windowed mode
--keyframe_interval  1          Store every Nth frame in KV cache
--use_sdpa           False      Use PyTorch SDPA instead of FlashInfer

Sky Masking

Argument                      Description
--mask_sky                    Enable sky segmentation and masking
--sky_mask_dir                Custom directory for cached sky masks
--sky_mask_visualization_dir  Save side-by-side mask visualizations

Visualization

Argument             Default  Description
--port               8080     Viser viewer port
--conf_threshold     1.5      Filter low-confidence points
--point_size         0.00001  Point cloud point size
--downsample_factor  10       Spatial downsampling for display
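
The --conf_threshold and --downsample_factor flags correspond to a simple filtering step before display. A minimal NumPy sketch of that logic, assuming the model emits a per-pixel pointmap of shape (H, W, 3) and a confidence map of shape (H, W) (the shapes and function name here are illustrative, not the actual API):

```python
import numpy as np

def filter_for_display(pointmap, confidence, conf_threshold=1.5, downsample_factor=10):
    """Spatially downsample, then drop points below the confidence threshold.

    pointmap:   (H, W, 3) per-pixel 3D points (assumed output layout)
    confidence: (H, W) per-point confidence scores
    Returns an (N, 3) array of surviving points.
    """
    pts = pointmap[::downsample_factor, ::downsample_factor]
    conf = confidence[::downsample_factor, ::downsample_factor]
    return pts[conf > conf_threshold]

# Synthetic example at the model's native 518x378 resolution
pointmap = np.random.rand(378, 518, 3).astype(np.float32)
confidence = np.full((378, 518), 2.0, dtype=np.float32)
points = filter_for_display(pointmap, confidence)
```

Raising conf_threshold trades density for cleanliness; lowering downsample_factor shows more points at the cost of viewer performance.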

Python API Usage

Basic Streaming Inference

import torch
from lingbot_map import LingBotMap  # adjust import to actual module structure

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
model = LingBotMap.from_pretrained("/path/to/checkpoint.pt")
model = model.to(device).eval()

# Streaming inference over image list
from pathlib import Path
from PIL import Image
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((378, 518)),  # (height, width): the model's native 518×378 resolution
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225])
])

image_paths = sorted(Path("/path/to/images").glob("*.jpg"))

with torch.no_grad():
    for img_path in image_paths:
        img = Image.open(img_path).convert("RGB")
        frame = transform(img).unsqueeze(0).to(device)
        output = model.stream(frame)
        # output contains: pointmap, confidence, camera pose
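
Accumulated per-frame points can be dumped to a standard PLY file for inspection in external viewers. A small helper, assuming you have collected the streamed outputs into an (N, 3) NumPy array (the output key names above are placeholders, so the collection step is left to the reader):

```python
import numpy as np

def write_ply(path, points):
    """Write an (N, 3) array as an ASCII PLY point cloud."""
    points = np.asarray(points, dtype=np.float32)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(points)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("end_header\n")
        for x, y, z in points:
            f.write(f"{x} {y} {z}\n")

# e.g. after stacking per-frame pointmaps into `all_points`:
all_points = np.random.rand(100, 3)
write_ply("scene.ply", all_points)
```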

Loading and Running the Demo Programmatically

# The demo.py script is the primary entry point
# Run it as a subprocess or study it for API patterns
import subprocess

result = subprocess.run([
    "python", "demo.py",
    "--model_path", "/path/to/checkpoint.pt",
    "--image_folder", "example/church",
    "--mask_sky",
    "--port", "8080"
], check=True)

Video Input Pattern

import cv2
import torch

# Extract frames from video for batch processing
def extract_frames(video_path: str, fps: int = 10):
    cap = cv2.VideoCapture(video_path)
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    interval = max(1, int(video_fps / fps))
    
    frames = []
    frame_idx = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % interval == 0:
            # Convert BGR to RGB
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(frame_rgb)
        frame_idx += 1
    
    cap.release()
    return frames

frames = extract_frames("video.mp4", fps=10)

Common Patterns

Pattern 1: Outdoor Scene Reconstruction

# Always use --mask_sky for outdoor scenes to remove noisy sky points
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./outdoor_images \
    --mask_sky \
    --conf_threshold 2.0 \
    --downsample_factor 5

Pattern 2: Long Indoor Sequence

# Use keyframe_interval to manage KV cache for sequences 320-3000 frames
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./long_sequence \
    --keyframe_interval 6 \
    --conf_threshold 1.5

Pattern 3: Very Long Video (>3000 frames)

# Use windowed mode for extremely long sequences
python demo.py \
    --model_path ./checkpoint.pt \
    --video_path long_video.mp4 \
    --fps 5 \
    --mode windowed \
    --window_size 64

Pattern 4: High Quality Dense Reconstruction

# Lower conf_threshold keeps more points, smaller downsample shows more detail
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./images \
    --conf_threshold 1.0 \
    --downsample_factor 1 \
    --point_size 0.00005

Pattern 5: CPU / No FlashInfer Fallback

# When FlashInfer is unavailable, use SDPA
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./images \
    --use_sdpa

Architecture Concepts

Component              Role
Anchor Context         Coordinate grounding to prevent drift
Pose-Reference Window  Dense geometric cues from recent frames
Trajectory Memory      Long-range drift correction across the sequence
Paged KV Cache         Efficient attention over long streaming sequences
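
The paged KV cache can be pictured as fixed-size pages allocated on demand, so memory grows in page-sized steps rather than requiring one contiguous buffer sized for the whole sequence up front. A toy sketch of the idea (not the actual implementation, which lives in FlashInfer's CUDA kernels):

```python
class PagedKVCache:
    """Toy model of paged attention storage: entries live in fixed-size pages."""

    def __init__(self, page_size=16):
        self.page_size = page_size
        self.pages = []  # each page holds up to page_size KV entries

    def append(self, kv_entry):
        if not self.pages or len(self.pages[-1]) == self.page_size:
            self.pages.append([])  # allocate a new page on demand
        self.pages[-1].append(kv_entry)

    def __len__(self):
        return sum(len(p) for p in self.pages)

cache = PagedKVCache(page_size=16)
for i in range(40):
    cache.append(i)
# 40 entries occupy 3 pages (16 + 16 + 8)
```

Because pages are small and uniform, a long stream never needs reallocation or copying as it grows, which is what makes 10,000+ frame sequences tractable.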

Troubleshooting

FlashInfer Not Available

# Error: FlashInfer not found
# Solution: Install or use SDPA fallback
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
# Or add --use_sdpa to any command
python demo.py --model_path ./checkpoint.pt --image_folder ./imgs --use_sdpa

CUDA Out of Memory on Long Sequences

# Reduce memory with keyframe interval
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --keyframe_interval 6

# Or switch to windowed mode
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --mode windowed --window_size 32

Sky Mask Model Download Fails

# Manual download of skyseg.onnx
wget https://huggingface.co/JianyuanWang/skyseg/resolve/main/skyseg.onnx
# Place in expected path or specify via --sky_mask_dir

Low Quality / Noisy Point Cloud

# Increase confidence threshold to filter noisy points
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --conf_threshold 2.5

# For outdoor, always add sky masking
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --mask_sky --conf_threshold 2.0

Port Already in Use

# Change the viewer port
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --port 8090

Images Not Loading

# Ensure images are sorted and in supported formats (jpg, png)
ls /path/to/images | head -5
# Supported: .jpg, .jpeg, .png, .bmp, .webp
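
To preview what the loader will likely pick up, a small helper that collects the supported extensions in sorted order (the extension list comes from above; lexicographic ordering is an assumption about the loader's behavior):

```python
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def list_images(folder):
    """Return supported image files in lexicographic (frame) order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() in SUPPORTED
    )
```

If frames are numbered without zero-padding (1.jpg, 10.jpg, 2.jpg), rename them before running, since lexicographic order will scramble the sequence.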

Performance Guidelines

Sequence Length   Recommended Mode                  Notes
< 320 frames      Default streaming                 Full KV cache
320–3000 frames   --keyframe_interval 6             Reduces cache by 6x
> 3000 frames     --mode windowed --window_size 64  Sliding window

  • Target resolution: 518×378 for ~20 FPS throughput
  • GPU: CUDA-capable GPU required for practical speeds
  • Model size: ~4.63 GB checkpoint
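
The guidance above reduces to simple arithmetic on cache residency. A sketch of the assumed rules (the function and its behavior are illustrative, not part of the CLI):

```python
def cached_frames(total_frames, keyframe_interval=1, mode="streaming", window_size=64):
    """Rough count of frames resident in the KV cache under each mode (assumed behavior)."""
    if mode == "windowed":
        return min(total_frames, window_size)
    # streaming: every keyframe_interval-th frame is kept (ceiling division)
    return -(-total_frames // keyframe_interval)

# A 3000-frame sequence with --keyframe_interval 6 keeps ~500 frames cached;
# a 10,000-frame video in windowed mode keeps only the 64-frame window.
```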

Citation

@article{chen2026geometric,
  title={Geometric Context Transformer for Streaming 3D Reconstruction},
  author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
  journal={arXiv preprint arXiv:2604.14141},
  year={2026}
}