# Spectacles AI — Reference Guide
The primary bridge between a lens and external AI models on Spectacles is the Remote Service Gateway (RSG). RSG is Snap's managed proxy layer that handles authentication, rate limiting, and response streaming for any registered endpoint — not just AI endpoints.
Official docs: Spectacles Home · Features Overview (Camera Module for vision input)
## Core Concepts

### Remote Service Gateway (RSG)

- Available via `require('LensStudio:RemoteServiceModule')`.
- Makes authenticated calls to registered cloud endpoints defined in your Lens Studio project's Remote Services panel.
- Returns results asynchronously; use callbacks or `async/await`.
### Key APIs on Spectacles

| Capability | API / Module | Notes |
|---|---|---|
| LLM inference | RSG → your backend or Snap RSG built-in endpoints | Stream tokens with `onPartialResponse` |
| Speech-to-Text | `AsrModule` | 40+ languages, accuracy modes, mixed-language |
| Text-to-Speech | `TtsModule` | Synthesises audio from a string |
| Camera frame access | `CameraModule` | Grab frames for vision model input |
| AI Vision / object detection | RSG + depth texture / camera frame | Encode frame → send to cloud vision API |
| Depth texture | `DepthTextureProvider` | Access per-pixel depth for 3D projection |
## Remote Service Gateway — Patterns

### Declaring a remote service

In Lens Studio, open Remote Services (Project panel → Remote Services) and add an endpoint. Each service has an ID string you reference in script.

### Making a basic call
```ts
const remoteServiceModule = require('LensStudio:RemoteServiceModule')

function callLLM(prompt: string, onResult: (text: string) => void): void {
  const request = RemoteServiceHttpRequest.create()
  request.endpoint = 'my-llm-service' // matches service ID in Remote Services panel
  request.method = RemoteServiceHttpRequest.HttpRequestMethod.Post
  request.body = JSON.stringify({ prompt })
  remoteServiceModule.performHttpRequest(request, (response) => {
    if (response.statusCode === 200) {
      onResult(JSON.parse(response.body).text)
    } else {
      print('RSG error: ' + response.statusCode)
    }
  })
}
```
### Streaming responses (token-by-token)

```ts
request.onPartialResponse = (partial: string) => {
  // Update UI in real-time as tokens arrive
  textComponent.text += partial
}
```
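A fuller streaming sketch, reusing `remoteServiceModule` and the endpoint from the basic call above. The `stream: true` body flag is an assumption about your backend, not a fixed RSG field:

```ts
// Sketch: streaming LLM call. The stream flag is a backend assumption.
function streamLLM(prompt: string, textComponent: Text): void {
  const request = RemoteServiceHttpRequest.create()
  request.endpoint = 'my-llm-service'
  request.method = RemoteServiceHttpRequest.HttpRequestMethod.Post
  request.body = JSON.stringify({ prompt, stream: true })

  textComponent.text = '' // clear the UI before tokens arrive
  request.onPartialResponse = (partial: string) => {
    textComponent.text += partial // append each chunk as it streams in
  }

  // The completion callback still fires once the stream closes
  remoteServiceModule.performHttpRequest(request, (response) => {
    if (response.statusCode !== 200) {
      print('RSG stream error: ' + response.statusCode)
    }
  })
}
```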
## Speech-to-Text (ASR)

### Full example with language and accuracy options

```ts
const asrModule = require('LensStudio:AsrModule')

@component
export class VoiceInput extends BaseScriptComponent {
  onAwake(): void {
    const options = AsrModule.Options.create()

    // Language: BCP-47 tag, e.g. 'en-US', 'fr-FR', 'ja-JP'
    // 40+ languages supported. Leave unset for auto-detect.
    options.locale = 'en-US'

    // Accuracy: Balanced (faster, lower power) or High (more accurate, slower)
    options.accuracy = AsrModule.Accuracy.Balanced // or AsrModule.Accuracy.High

    // Mixed language: allow transcription to switch languages mid-sentence
    options.mixedLanguages = false

    const session = asrModule.startSession(options)

    session.onTranscriptUpdate.add((event: AsrModule.TranscriptUpdateEvent) => {
      print('Transcript: ' + event.transcript)
      if (event.isFinal) {
        // Committed utterance — safe to send to LLM
        print('Final transcript: ' + event.transcript)
        print('Detected language: ' + event.language)
        session.stop() // IMPORTANT: always stop to free the microphone
      }
    })

    session.onError.add((error) => {
      print('ASR error: ' + error)
      session.stop() // also stop on error to free the microphone
    })
  }
}
```
- `isFinal = false` means a streaming partial result — the word list may still change. `isFinal = true` means the utterance is committed and ready for processing.
- Always call `session.stop()` when done — leaving ASR open drains battery and blocks the microphone.
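To hand a committed transcript to the LLM, the final-result branch can reuse the `callLLM` helper from the RSG section. A minimal sketch, assuming the session and options setup from the full example above:

```ts
// Sketch: bridge ASR into the LLM call. Assumes the session/options setup
// from the full example; callLLM is the helper defined in the RSG section.
session.onTranscriptUpdate.add((event: AsrModule.TranscriptUpdateEvent) => {
  if (!event.isFinal) return // ignore streaming partials
  session.stop() // free the microphone before the network round-trip
  callLLM(event.transcript, (reply: string) => {
    print('LLM reply: ' + reply)
    // Optionally voice the reply via the speak() helper in the next section
  })
})
```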
## Text-to-Speech (TTS)

```ts
const ttsModule = require('LensStudio:TtsModule')

function speak(text: string): void {
  const options = TtsTextToSpeechOptions.create()
  options.text = text
  ttsModule.speak(options, (audioComponent) => {
    audioComponent.play(1) // play once
  })
}
```
### Cleanup on playback finish

```ts
ttsModule.speak(options, (audioComponent) => {
  audioComponent.onPlaybackFinished.add(() => {
    print('TTS finished')
  })
  audioComponent.play(1)
})
```
- TTS audio competes with other audio sources — set an appropriate `AudioMixerChannel`.
- Calling `speak()` again before the previous audio finishes will queue the new utterance (a guard against stale queued replies is sketched below).
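Because `speak()` queues, a long agentic exchange can build up a backlog of outdated replies. One way to avoid that, using only the `speak`/`onPlaybackFinished` API shown above, is to drop new utterances while one is playing:

```ts
// Sketch: drop new utterances while one is playing, instead of queueing a
// backlog of stale replies. Uses only the speak/onPlaybackFinished API above.
let isSpeaking = false

function speakLatest(text: string): void {
  if (isSpeaking) return // skip rather than queue an outdated reply
  isSpeaking = true
  const options = TtsTextToSpeechOptions.create()
  options.text = text
  ttsModule.speak(options, (audioComponent) => {
    audioComponent.onPlaybackFinished.add(() => {
      isSpeaking = false // ready for the next utterance
    })
    audioComponent.play(1)
  })
}
```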
## Camera Access for Vision

```ts
const cameraModule = require('LensStudio:CameraModule')

const request = CameraModule.createCameraRequest()
request.cameraId = CameraModule.CameraId.Default_Color
const cameraTexture = cameraModule.requestCamera(request)

// Encode for transmission
const encodeOptions = EncodeTextureOptions.create()
Base64.encodeTextureAsync(cameraTexture, encodeOptions, (base64: string) => {
  callVisionAPI(base64)
})
```
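The `callVisionAPI` helper referenced above is left undefined; a plausible sketch posts the encoded frame to an RSG vision endpoint. The `my-vision-service` ID and the request/response shapes are assumptions about your registered backend:

```ts
// Sketch of the callVisionAPI helper referenced above. Endpoint ID and
// body/response shapes are backend assumptions, not a fixed Snap API.
function callVisionAPI(base64Image: string): void {
  const request = RemoteServiceHttpRequest.create()
  request.endpoint = 'my-vision-service'
  request.method = RemoteServiceHttpRequest.HttpRequestMethod.Post
  request.body = JSON.stringify({ image: base64Image, prompt: 'What is in view?' })
  remoteServiceModule.performHttpRequest(request, (response) => {
    if (response.statusCode === 200) {
      print('Vision result: ' + JSON.parse(response.body).text)
    } else {
      print('Vision error: ' + response.statusCode)
    }
  })
}
```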
### Depth Cache pattern

Cache multiple depth frames for richer 3D analysis:

- Capture frames on a timer using `DelayedCallbackEvent`
- Store frames in a ring buffer (`const buffer: string[] = []`)
- On user interaction, select the most relevant cached frame and send it to a cloud vision model (see the sketch below)
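A sketch of this pattern, assuming the `cameraTexture` and `encodeOptions` setup from the previous section and the `callVisionAPI` helper above; the interval and buffer size are illustrative:

```ts
// Sketch: ring buffer of recently encoded frames, captured on a timer.
const FRAME_INTERVAL_S = 0.5
const BUFFER_SIZE = 8
const frameBuffer: string[] = [] // oldest first, newest last

const captureEvent = script.createEvent('DelayedCallbackEvent')
captureEvent.bind(() => {
  Base64.encodeTextureAsync(cameraTexture, encodeOptions, (base64: string) => {
    frameBuffer.push(base64)
    if (frameBuffer.length > BUFFER_SIZE) frameBuffer.shift() // drop the oldest
  })
  captureEvent.reset(FRAME_INTERVAL_S) // re-arm the timer for the next capture
})
captureEvent.reset(FRAME_INTERVAL_S) // start the capture loop

// On user interaction, send the freshest cached frame to the vision model
function onUserAsk(): void {
  if (frameBuffer.length > 0) callVisionAPI(frameBuffer[frameBuffer.length - 1])
}
```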
## Agentic Loop Pattern

For autonomous multi-step AI interactions (as in the Agentic Playground sample):

- User speaks → ASR transcription
- Transcription sent to LLM via RSG
- LLM returns a tool call (e.g., `{ "action": "show_object", "params": {} }`)
- Lens executes the action in the AR scene
- Lens sends a tool result back to the LLM for the next turn
- Loop continues until LLM returns a final response
Key design notes:

- Store conversation history in a typed array and include it in each request body.
- Use a `thinking` or `loading` UI state while waiting for LLM responses.
- Implement a hard iteration cap (e.g., max 10 tool calls) to avoid infinite loops.
```ts
interface Message { role: 'user' | 'assistant'; content: string }

const history: Message[] = []
const MAX_HISTORY = 20 // trim to avoid leaking earlier turns and hitting size limits
const MAX_TOOL_CALLS = 10 // hard cap to prevent runaway loops
let toolCallCount = 0

async function chat(userText: string): Promise<void> {
  if (toolCallCount++ >= MAX_TOOL_CALLS) {
    displayText('Reached iteration limit')
    return
  }

  history.push({ role: 'user', content: userText })
  // Keep only the last MAX_HISTORY messages
  if (history.length > MAX_HISTORY) history.splice(0, history.length - MAX_HISTORY)

  const request = RemoteServiceHttpRequest.create()
  request.endpoint = 'my-llm'
  request.method = RemoteServiceHttpRequest.HttpRequestMethod.Post
  request.body = JSON.stringify({ messages: history })

  remoteServiceModule.performHttpRequest(request, (response) => {
    const reply = JSON.parse(response.body).content as string
    history.push({ role: 'assistant', content: reply })
    displayText(reply)
  })
}
```
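The code above covers the transcription and final-response steps; the dispatch step in between might look like the following sketch. `executeTool` is a hypothetical helper that runs the action in the AR scene, and feeding the tool result back through `chat()` as a user-role message is a workaround for the two-role `Message` type:

```ts
// Sketch: tool-call dispatch. Call handleReply(reply) in place of
// displayText(reply) inside the response callback above.
interface ToolCall { action: string; params: Record<string, unknown> }

function handleReply(reply: string): void {
  let toolCall: ToolCall | null = null
  try {
    toolCall = JSON.parse(reply) as ToolCall // tool calls arrive as JSON
  } catch (e) {
    // Not JSON: treat the reply as the final natural-language answer
  }

  if (toolCall && toolCall.action) {
    const result = executeTool(toolCall) // hypothetical: run the AR scene action
    // Feed the tool result back as the next turn (user role is a workaround
    // for the two-role Message type above)
    chat('TOOL_RESULT: ' + JSON.stringify(result))
  } else {
    displayText(reply) // final answer: show it and end the loop
  }
}
```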
## AI Music Generation (Lyria / Gemini)

Pattern from the AI Music Gen sample:

ASR → Gemini text → Lyria audio → AudioComponent → 3D mesh driven by FFT

- User describes genre/vibe via ASR
- Description sent to Gemini via RSG for tag extraction
- Tags forwarded to Lyria endpoint to generate an audio clip
- Clip played through `AudioComponent`
- 3D visualiser driven by `AudioSpectrumFFT` data (see the sketch below)
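How the spectrum is read depends on the sample's `AudioSpectrumFFT` setup; this sketch assumes a placeholder `getSpectrum()` accessor returning normalised bin magnitudes, and a `visualizerMesh` SceneObject to animate:

```ts
// Sketch: pulse a visualiser mesh with low-frequency energy each frame.
// getSpectrum() is a placeholder for however your AudioSpectrumFFT setup
// exposes bin magnitudes (assumed normalised to 0..1).
const updateEvent = script.createEvent('UpdateEvent')
updateEvent.bind(() => {
  const bins: number[] = getSpectrum()
  const bass = bins.length > 0 ? bins[0] : 0 // first bin ~ lowest frequencies
  const s = 1.0 + bass * 0.5 // scale between 1.0 and 1.5
  visualizerMesh.getTransform().setLocalScale(new vec3(s, s, s))
})
```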
## Gesture + Vision: Crop Pattern

From the Crop sample — pinch to define a region, send to vision model:

- Track hand pinch position with SIK `HandInputData`
- Convert screen position to UV coordinates (see the sketch below)
- Crop the camera texture at those UVs
- Encode cropped region and send to RSG vision endpoint
- Display result in a world-space panel near the gesture origin
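The UV conversion is plain math. A sketch, assuming `pinchScreenPos` is a normalised screen-space point from SIK `HandInputData`, and that screen y runs opposite to UV y (flip if your pipeline differs):

```ts
// Sketch: convert a normalised screen-space pinch position to UV space and
// build a centred, clamped crop rect (uMin, vMin, uMax, vMax).
// Applying the rect to the camera texture is pipeline-specific.
function toCropRect(pinchScreenPos: vec2, cropSize: number): vec4 {
  const u = pinchScreenPos.x
  const v = 1.0 - pinchScreenPos.y // invert the vertical axis
  const half = cropSize / 2
  // Clamp so the whole rect stays inside the [0,1] UV square
  const uMin = Math.max(0, Math.min(1 - cropSize, u - half))
  const vMin = Math.max(0, Math.min(1 - cropSize, v - half))
  return new vec4(uMin, vMin, uMin + cropSize, vMin + cropSize)
}
```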
## Permissions & Privacy
Combining camera, microphone, or location with internet connectivity triggers Snap's Transparent Permission system: the OS shows a consent dialog on launch and the device LED blinks during capture.
Important exception: Calls via the Remote Service Gateway (RSG) do not count as external connectivity. You can freely combine camera or microphone with RSG (LLMs, ASR, TTS, vision APIs) in a published lens without triggering the Transparent Permission prompt. This is the recommended pattern for AI-powered lenses.
## Common Gotchas

- RSG is not available in the Lens Studio simulator — test AI features on-device.
- Large base64 payloads can hit RSG body-size limits; resize or downsample images before encoding.
- ASR leaves the microphone open until you call `session.stop()` — always stop on both `isFinal` and `onError` to protect privacy and battery.
- ASR accuracy modes: `Balanced` is faster; `High` gives better results for commands with technical vocabulary.
- TTS and game audio share a mixer — prioritise with `AudioMixerChannel`.
- Use `async/await` to avoid callback pyramids in complex agentic loops (a Promise wrapper is sketched below).
- Always handle `response.statusCode !== 200` cases — network errors are common on Spectacles (the device moves around).
- Agentic loops: always enforce a hard iteration cap in code (not just in comments) and trim conversation history to avoid leaking earlier sensitive user input.
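For the `async/await` point, a small Promise wrapper over `performHttpRequest` is enough. It uses only the callback API shown earlier; the `RemoteServiceHttpResponse` type name is an assumption:

```ts
// Sketch: Promise wrapper over the callback-based performHttpRequest shown
// earlier, so agentic loops can await requests.
function httpAsync(request: RemoteServiceHttpRequest): Promise<RemoteServiceHttpResponse> {
  return new Promise((resolve, reject) => {
    remoteServiceModule.performHttpRequest(request, (response) => {
      if (response.statusCode === 200) {
        resolve(response) // hand the full response to the awaiting caller
      } else {
        reject(new Error('RSG error: ' + response.statusCode))
      }
    })
  })
}

// Usage: const response = await httpAsync(request)
//        const reply = JSON.parse(response.body).content
```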
## Reference Examples

- `APIKeyHint.ts` - A standard pattern for providing API key context.
- `ModelGenBridge.ts` - Example usage of handling Remote Service module prompts.