gsplat-optimizer
Gaussian Splat Optimizer
Optimize 3D Gaussian Splatting scenes for real-time rendering on Apple platforms (iOS, macOS, visionOS) using Metal.
When to Use
- Optimizing .ply or .splat files for mobile/Apple GPU targets
- Reducing gaussian count for performance (pruning strategies)
- Implementing Level-of-Detail (LOD) for large scenes
- Compressing splat data for bandwidth/storage constraints
- Profiling and optimizing Metal rendering performance
- Targeting specific FPS goals on Apple hardware
Quick Start
Input: Provide a .ply/.splat file path, target device class, and FPS target.
# Analyze a splat file
python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --device iphone --fps 60
Output: The skill provides:
- Point/gaussian pruning plan (opacity, size, error thresholds)
- LOD scheme suggestion (distance bins, gaussian subsets)
- Compression recommendation (if bandwidth/storage bound)
- Metal profiling checklist with shader/compute tips
Optimization Workflow
Step 1: Analyze the Scene
First, understand your scene characteristics:
- Gaussian count: Total number of splats
- Opacity distribution: Histogram of opacity values
- Size distribution: Gaussian scale statistics
- Memory footprint: Estimated GPU memory usage
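The characteristics above can be computed directly from the raw parameter arrays. A minimal sketch, assuming the standard INRIA 3DGS .ply layout (opacity stored as a logit, scale stored as a log-scale); the bundled analyze_splat.py may report these differently:

```python
import numpy as np

def splat_stats(opacity_logits, log_scales, bytes_per_gaussian=56):
    """Summarize scene characteristics from raw 3DGS parameter arrays.

    Standard 3DGS exports store opacity as a logit and scale as a
    log-scale, so both are mapped back to linear values first. The
    56-byte figure matches the memory estimate used later in this doc.
    """
    opacity = 1.0 / (1.0 + np.exp(-np.asarray(opacity_logits, dtype=np.float64)))
    scales = np.exp(np.asarray(log_scales, dtype=np.float64))
    n = opacity.shape[0]
    return {
        "gaussians": n,
        "low_opacity_pct": float((opacity < 0.01).mean() * 100),  # pruning candidates
        "median_scale": float(np.median(scales)),
        "est_memory_mb": n * bytes_per_gaussian / 2**20,
    }
```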
Step 2: Determine Target Device
| Device Class | GPU Budget | Max Gaussians (60fps) | Storage Mode |
|---|---|---|---|
| iPhone (A15+) | 4-6GB unified | ~2-4M | Shared |
| iPad Pro (M1+) | 8-16GB unified | ~6-8M | Shared |
| Mac (M1-M3) | 8-24GB unified | ~8-12M | Shared/Managed |
| Vision Pro | 16GB unified | ~4-6M (stereo) | Shared |
| Mac (discrete GPU) | 8-24GB VRAM | ~10-15M | Private |
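A quick budget check against the table. The per-device numbers below take the midpoint of each range in the table (my interpolation, not measured figures); tune them against real profiling data:

```python
# Midpoints of the "Max Gaussians (60fps)" ranges above -- interpolated,
# not benchmarked; treat as starting points only.
DEVICE_MAX_GAUSSIANS = {
    "iphone": 3_000_000,
    "ipad_pro": 7_000_000,
    "mac": 10_000_000,
    "vision_pro": 5_000_000,
    "mac_discrete": 12_500_000,
}

def budget_check(num_gaussians, device):
    """Return (over_budget, utilization) for a scene on a device class."""
    budget = DEVICE_MAX_GAUSSIANS[device]
    return num_gaussians > budget, num_gaussians / budget
```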
Step 3: Apply Pruning
If gaussian count exceeds device budget:
- Opacity threshold: Remove gaussians with opacity < 0.01-0.05
- Size culling: Remove sub-pixel gaussians (< 1px at target resolution)
- Importance pruning: Use LODGE algorithm for error-proxy selection
- Foveated rendering: For Vision Pro, reduce density in peripheral view
See references/pruning-strategies.md for details.
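The opacity and size filters above reduce to a boolean keep-mask. A sketch (the projected pixel radius is assumed precomputed; deriving it needs camera intrinsics, which are out of scope here):

```python
import numpy as np

def prune_mask(opacity, pixel_radii, opacity_min=0.01, min_px=1.0):
    """Keep-mask combining the two cheapest filters from Step 3:
    opacity thresholding and sub-pixel size culling.

    pixel_radii: projected screen-space radius of each gaussian at the
    target resolution (assumed already computed elsewhere).
    """
    opacity = np.asarray(opacity)
    pixel_radii = np.asarray(pixel_radii)
    return (opacity >= opacity_min) & (pixel_radii >= min_px)
```

Importance pruning (LODGE) ranks gaussians by a rendering-error proxy and is not captured by this mask.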
Step 4: Implement LOD (Large Scenes)
For scenes exceeding single-frame budget:
- Distance bins: Near (0-10m), Mid (10-50m), Far (50m+)
- Hierarchical structure: Octree or LoD tree for spatial queries
- Chunk streaming: Load/unload based on camera position
- Smooth transitions: Opacity blending at chunk boundaries
See references/lod-schemes.md for details.
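The distance-bin assignment above can be sketched in a few lines; a renderer would then draw bin 0 at full density and subsample bins 1 and 2:

```python
import numpy as np

def lod_bins(positions, camera_pos, edges=(10.0, 50.0)):
    """Assign each gaussian to a distance bin matching the scheme above:
    0 = Near (0-10m), 1 = Mid (10-50m), 2 = Far (50m+)."""
    d = np.linalg.norm(np.asarray(positions) - np.asarray(camera_pos), axis=1)
    return np.digitize(d, edges)  # bin index per gaussian
```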
Step 5: Apply Compression (If Needed)
For bandwidth/storage constraints:
| Method | Compression | Use Case |
|---|---|---|
| SOGS | 20x | Web delivery, moderate quality |
| SOG | 24x | Web delivery, better quality |
| CodecGS | 30x+ | Maximum compression |
| C3DGS | 31x | Fast rendering priority |
See references/compression.md for details.
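For a back-of-envelope size estimate, the ratios in the table translate directly (real output size depends on scene content and encoder settings):

```python
# Nominal ratios from the compression table above; actual results vary
# with scene content and encoder settings.
COMPRESSION_RATIO = {"SOGS": 20, "SOG": 24, "CodecGS": 30, "C3DGS": 31}

def compressed_size_mb(raw_mb, method):
    """Estimate compressed size from the raw splat payload size."""
    return raw_mb / COMPRESSION_RATIO[method]
```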
Step 6: Profile and Optimize Metal
- Choose storage mode: Private for static data, Shared for dynamic
- Optimize shaders: Function constants, thread occupancy
- Profile with Xcode: GPU Frame Capture, Metal System Trace
- Iterate: Measure, optimize, repeat
See references/metal-profiling.md for details.
Common Pitfalls
1. Point Cloud Density Mismatch
Problem: Gaussian count doesn't match your scene complexity, causing either visual artifacts or wasted GPU resources.
- Too sparse (undersampling): Visible gaps, blockiness, loss of fine details
- Too dense (oversampling): Exceeds device budget, causes frame drops, GPU thrashing
Debugging:
# Analyze gaussian distribution
python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --histogram
# Check against device budget
# Compare total_gaussians vs. device_max in the output table
Strategy:
- Start with device budget from Step 2 (e.g., 4M for iPhone)
- If scene exceeds budget by >20%, apply pruning before training
- If visual quality drops too much after pruning, consider LOD or chunking
- Use importance-weighted sampling (LODGE) to remove low-contribution gaussians, not just opaque ones
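To illustrate ranking by contribution rather than a hard opacity cutoff, here is a simple proxy (opacity times geometric-mean scale). This is NOT the LODGE error proxy, only a stand-in for the idea:

```python
import numpy as np

def importance_keep(opacity, scales, keep_n):
    """Keep the top-N gaussians by a crude contribution proxy.

    importance = opacity * geometric mean of the three scale axes;
    a placeholder for a proper rendering-error proxy such as LODGE's.
    """
    opacity = np.asarray(opacity)
    scales = np.asarray(scales)
    importance = opacity * np.prod(scales, axis=1) ** (1.0 / 3.0)
    return np.sort(np.argsort(importance)[-keep_n:])  # sorted kept indices
```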
2. Training Instability (Gradient Explosions, Divergence)
Problem: During optimization (if fine-tuning on device), gaussian parameters diverge, causing:
- Loss suddenly jumps to NaN
- Gaussians disappear or explode in scale
- Model becomes unrecoverable mid-session
Debugging:
# Monitor loss during training
tail -f training.log | grep -E "loss|nan|inf"
# Check gradient magnitudes
python -c "
import numpy as np
from plyfile import PlyData
ply_data = PlyData.read('scene.ply')
scales = np.asarray(ply_data['vertex']['scale_0'])  # log-scales in standard 3DGS exports
print(f'Scale range: {scales.min():.6f} to {scales.max():.6f}')
print(f'Any NaN: {np.isnan(scales).any()}')
"
Strategy:
- Gradient clipping: Cap gradient updates to ±0.1 scale per step
- Learning rate decay: Start at 1e-4, decay by 0.95 every epoch
- Loss regularization: Add L2 penalty on scale magnitudes to prevent explosions
- Checkpoint early: Save state every 10 iterations; rollback if loss spikes
- Freeze covariance: If converged, stop updating scale/rotation after 80% of training
- For device training: Reduce batch size or resolution if instability persists
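Two of the stabilizers above (the NaN guard and gradient clipping) fit in one hedged update step. Real 3DGS trainers use Adam with per-parameter-group learning rates; this sketch only shows the clipping idea:

```python
import numpy as np

def stable_step(params, grads, lr, clip=0.1):
    """One gradient-descent update with a NaN/inf guard and
    per-parameter clipping, per the strategy list above."""
    grads = np.nan_to_num(np.asarray(grads, dtype=np.float64),
                          nan=0.0, posinf=0.0, neginf=0.0)  # drop bad grads
    grads = np.clip(grads, -clip, clip)  # cap update magnitude per step
    return np.asarray(params, dtype=np.float64) - lr * grads

# Learning-rate schedule from the strategy: lr(epoch) = 1e-4 * 0.95**epoch
```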
3. Memory Limitations (OOM Errors on Large Scenes)
Problem: Scene exceeds available unified memory, causing allocation failures or GPU stalls.
- iPhone: 4–6GB shared between app + GPU
- iPad Pro: 8–16GB shared
- Vision Pro: 16GB (but stereo doubles gaussian count)
Debugging:
# Estimate memory footprint
python << 'EOF'
num_gaussians = 5_000_000 # Your count
bytes_per_gaussian = 56 # pos (12) + scale (12) + rot quaternion (16) + opacity (4) + SH DC (12)
total_mb = (num_gaussians * bytes_per_gaussian) / (1024 ** 2)
print(f"Est. memory: {total_mb:.1f} MB")
print(f"Safe for iPhone A15: {total_mb < 2000}") # Leave headroom for app
EOF
# Monitor live memory in Xcode
# Memory graph + Allocations instrument during scene load
Strategy:
- Chunking for large scenes: Break into 1–4M gaussian chunks, stream based on camera distance
- Quantization: Store gaussians in FP16 instead of FP32 (2x memory reduction)
- Pruning first: Remove <0.01 opacity or sub-pixel gaussians before transfer to device
- Lazy loading: Keep only active LOD level in memory; unload far chunks
- Vision Pro consideration: Dual-eye rendering = 2x gaussian count; cap at 4M per eye
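The FP16 quantization above is a one-liner with numpy. Note that positions on large scenes may need to stay FP32 (or use per-chunk offsets) to avoid precision loss; this sketch converts everything:

```python
import numpy as np

def quantize_fp16(attrs_fp32):
    """Store per-gaussian attributes in FP16 for the 2x memory
    reduction noted above; FP16 keeps ~3 decimal digits of precision
    for values in [0, 1]."""
    return np.asarray(attrs_fp32, dtype=np.float32).astype(np.float16)
```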
4. Quality/Speed Trade-Offs (Over-Optimization for One Metric)
Problem: Optimizing heavily for one metric breaks another:
- Maximize FPS → visual artifacts: Over-pruning removes important geometry
- Maximize quality → frame drops: Too many gaussians for target device
- Minimize memory → banding/posterization: Excessive quantization or LOD culling
Debugging:
# Profile before/after each change
python << 'EOF'
metrics = {
"original": {"fps": 60, "gaussians": 5_000_000, "artifacts": "none"},
"after_pruning": {"fps": 58, "gaussians": 3_500_000, "artifacts": "block edges visible"},
}
for label, m in metrics.items():
print(f"{label}: {m['fps']}fps, {m['gaussians']/1e6:.1f}M, {m['artifacts']}")
EOF
Strategy:
- Define priority: Is this device speed-critical (AR, real-time) or quality-focused (preview)?
- Measure baseline: Profile original unoptimized scene first
- Iterate incrementally: Apply one optimization (pruning OR compression OR LOD), measure, decide
- Preserve quality metrics: Keep PSNR/SSIM scores; stop pruning if quality drops >1dB
- Leave headroom: Target comfortably under the 16.6ms frame budget rather than exactly 60fps; sustained full GPU load triggers thermal throttling
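The ">1dB" stopping rule needs a quality metric. A minimal PSNR between a reference render and an optimized render (SSIM needs windowed comparison and a library such as scikit-image):

```python
import numpy as np

def psnr_db(ref, test, max_val=1.0):
    """PSNR in dB between two renders, for the 'stop pruning if
    quality drops >1dB' rule above."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0.0 else 10.0 * np.log10(max_val**2 / mse)
```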
5. Real-Time Rendering Failures (Frame Drops, Shader Compilation)
Problem: Rendering pipeline stalls despite low gaussian count:
- First frame (cold start): 2–5s delay while shaders compile
- Mid-scene: Frame drops spike when new LOD levels load
- Smooth playback degrades to stuttering after 30–60s (often thermal throttling)
Debugging:
# Capture Metal frame statistics
# In Xcode: Product > Scheme > Edit > Run > Diagnostics
# Enable: Metal API Validation, GPU Frame Capture
# Check shader compilation time
python ~/.claude/skills/gsplat-optimizer/scripts/metal_profile.py \
--capture-shader-compile \
--target iphone14
# Monitor frame time distribution
tail -f xcode.log | grep -E "frame_time|stutter"
Strategy:
- Pre-warm shader cache: Compile all function variants on first load (avoid runtime jank)
- Limit LOD transitions: If using multiple LOD levels, cap transitions to 2 per frame
- Asynchronous streaming: Load new geometry chunks on a background thread; upload between frames
- Device-specific tuning:
- iPhone: Keep draw calls < 50, geometry per call < 500K gaussians
- Mac: More generous; aim for < 2M gaussians per draw call
- Vision Pro: Account for stereo; effective capacity is half the budget
- Profile regimen: Run Metal System Trace before and after each optimization; track:
- GPU utilization (target 70–85%)
- Shader time (target <10ms)
- Memory bandwidth (target <50GB/s)
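A frame-time trace from a capture can be checked against the budget in a few lines. Metal System Trace export formats vary, so this sketch just takes a flat list of per-frame milliseconds:

```python
import numpy as np

def frame_budget_report(frame_times_ms, budget_ms=16.6):
    """Summarize a captured frame-time trace against the 60fps budget:
    p95 frame time and the percentage of frames over budget."""
    t = np.asarray(frame_times_ms, dtype=np.float64)
    return {
        "p95_ms": float(np.percentile(t, 95)),
        "over_budget_pct": float((t > budget_ms).mean() * 100),
    }
```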
Key Metrics
| Metric | Target | How to Measure |
|---|---|---|
| Frame time | 16.6ms (60fps) | Metal System Trace |
| GPU memory | < device budget | Xcode Memory Graph |
| Bandwidth | < 50GB/s | GPU Counters |
| Shader time | < 10ms | GPU Frame Capture |
Reference Implementation
MetalSplatter is the primary reference for Swift/Metal gaussian splatting:
- Repository: https://github.com/scier/MetalSplatter
- Supports iOS, macOS, visionOS
- ~8M splat capacity with v1.1 optimizations
- Stereo rendering for Vision Pro
Getting Started with MetalSplatter
git clone https://github.com/scier/MetalSplatter.git
cd MetalSplatter
open SampleApp/MetalSplatter_SampleApp.xcodeproj
# Set to Release scheme for best performance
Resources
Reference Documentation
- Pruning Strategies - Gaussian reduction techniques
- LOD Schemes - Level-of-detail approaches
- Compression - Bandwidth/storage optimization
- Metal Profiling - Apple GPU optimization
Research Papers
- LODGE - LOD for large-scale scenes
- FLoD - Flexible LOD for variable hardware
- Voyager - City-scale mobile rendering
- 3DGS Compression Survey
Apple Developer Resources