voice-ux-pro
Skill: Voice UX Pro (Standard 2026)
Role: The Voice UX Pro is a specialized designer and engineer responsible for "Frictionless" conversational interfaces. In 2026, this role masters sub-300ms response times, Spatial Hearing AI (voice separation), and the integration of subtle haptic feedback to guide users through hands-free workflows.
🎯 Primary Objectives
- Sub-300ms Responsiveness: Achieving natural human-like interaction speeds using Streaming APIs and Edge Inference.
- Spatial Clarity: Implementing "Spatial Hearing AI" to isolate user voices from complex background noise.
- Conversational Design: Crafting non-linear, robust dialogues that handle interruptions and "Ums/Ahs" gracefully.
- Multimodal Synergy: Synchronizing Voice with Haptics and Visuals for a holistic, accessible experience.
🏗️ The 2026 Voice Stack
1. Speech Engines
- Whisper v4 / Chirp v3: For high-fidelity, multilingual transcription (STT).
- Google Speech-to-Speech (S2S): For near-instant response loops that skip the intermediate text step.
- ElevenLabs v3: For emotive, human-grade synthetic voices (TTS).
2. Interaction & Feedback
- Native Haptics (iOS/Android): Precise vibration patterns synchronized with speech phases.
- Audio Shaders: Real-time spatialization of AI voices via the Web Audio API (`PannerNode`) or native spatial audio APIs.
🛠️ Implementation Patterns
1. The "Listen-Ahead" Pattern (Sub-300ms)
Generate partial transcripts while the user is still speaking to "pre-warm" the LLM prompt.
```typescript
// 2026 Pattern: Streaming STT to LLM (sketch; `warmUp` and
// `detectEarlyIntent` are app-level helpers, not SDK methods)
const sttStream = await speechClient.createStreamingSTT();
const aiStream = await genAI.generateContentStream();

sttStream.on('partial', (text) => {
  // Pre-load LLM context if an intent is detected before the utterance ends
  if (detectEarlyIntent(text)) aiStream.warmUp(text);
});
```
2. Voice-Haptic Synchronization
Providing "Micro-confirmation" via haptics when the AI starts/stops listening.
```typescript
import * as Haptics from 'expo-haptics';

function useVoiceInteraction() {
  const onStartListening = () => {
    // Light pulse to indicate "I am hearing you"
    Haptics.impactAsync(Haptics.ImpactFeedbackStyle.Light);
  };

  const onSuccess = () => {
    // Short, crisp confirmation when the request completes
    Haptics.notificationAsync(Haptics.NotificationFeedbackType.Success);
  };

  return { onStartListening, onSuccess };
}
```
3. Spatial Isolation Logic
Isolating the user's voice based on 3D coordinates.
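A minimal sketch of the idea, assuming the audio pipeline exposes per-source 3D coordinates (e.g. from a beamforming array; the `SoundSource` shape and the 30° cone default are illustrative, not from any specific SDK): score each detected source by its angular distance from the expected user direction and gate everything outside the cone.

```typescript
type Vec3 = [number, number, number];
type SoundSource = { id: string; position: Vec3; energy: number };

// Angle (radians) between two direction vectors
function angleBetween(a: Vec3, b: Vec3): number {
  const dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  const mag = (v: Vec3) => Math.hypot(v[0], v[1], v[2]);
  return Math.acos(Math.min(1, Math.max(-1, dot / (mag(a) * mag(b)))));
}

// Keep only sources inside the tolerance cone around the expected user
// direction; among those, treat the most energetic one as "the user".
function isolateUserSource(
  sources: SoundSource[],
  userDirection: Vec3,
  coneRad: number = Math.PI / 6, // 30° tolerance cone (illustrative default)
): SoundSource | null {
  const inCone = sources.filter(
    (s) => angleBetween(s.position, userDirection) <= coneRad,
  );
  if (inCone.length === 0) return null;
  return inCone.reduce((best, s) => (s.energy > best.energy ? s : best));
}
```

With a user expected straight ahead (`[0, 0, 1]`), a louder TV off to the side (`[1, 0, 0]`) falls outside the cone and is rejected even though its energy is higher.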
🚫 The "Do Not List" (Anti-Patterns)
- NEVER force the user to wait for a full sentence to be transcribed before acting.
- NEVER use robotic, monotone synthetic voices. Use emotive TTS with prosody control.
- NEVER trigger loud audio confirmations in public settings without a "Silent Mode" check.
- NEVER ignore background noise. Always implement a "Noise-Floor" calibration step.
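The noise-floor calibration mentioned above can be sketched as follows (an assumption-level sketch, not a specific SDK: capture a few "silent" PCM frames at session start, average their RMS level, then only treat frames as speech when they clear the floor by a decibel margin):

```typescript
// Root-mean-square level of a PCM frame (samples in [-1, 1])
function rms(frame: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Calibrate: average RMS over silent frames captured at session start
function calibrateNoiseFloor(silentFrames: Float32Array[]): number {
  const levels = silentFrames.map(rms);
  return levels.reduce((a, b) => a + b, 0) / levels.length;
}

// Gate: a frame counts as speech only if it clears the floor by `marginDb`
function isSpeech(frame: Float32Array, noiseFloor: number, marginDb: number = 10): boolean {
  const threshold = noiseFloor * Math.pow(10, marginDb / 20);
  return rms(frame) > threshold;
}
```

The 10 dB margin is a placeholder; tune it per device so quiet rooms don't trigger on breathing while loud environments still pass speech.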
🛠️ Troubleshooting & Latency Audit
| Issue | Likely Cause | 2026 Corrective Action |
|---|---|---|
| "Uncanny Valley" Delay | Round-trip latency > 500ms | Move STT/TTS to a Regional Edge Function. |
| Cross-Talk Failure | Ambiguous sound sources | Implement Spatial Hearing AI (3D Beamforming). |
| Instruction Fatigue | Too many verbal options | Use "Contextual Shortlisting" (Only suggest relevant next steps). |
| Accidental Triggers | Sensitive Wake-word detection | Use "Personalized Voice Fingerprinting" for activation. |
📚 Reference Library
- Low-Latency Voice Stack: STT, TTS, and S2S.
- Conversational Design: Beyond simple commands.
- Haptics & Multimodal: Tactile feedback patterns.
📊 Performance Metrics
- Interaction Latency: < 300ms (Goal).
- Word Error Rate (WER): < 3% in noisy environments.
- User Completion Rate: > 90% for voice-only tasks.
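WER can be tracked with the standard word-level edit distance (a common formulation, not something the skill spec defines): WER = (substitutions + deletions + insertions) / reference word count.

```typescript
// Word Error Rate: word-level Levenshtein distance over reference length
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/);
  const hyp = hypothesis.trim().split(/\s+/);
  // d[i][j] = edit distance between ref[0..i) and hyp[0..j)
  const d: number[][] = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0,
    ),
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,     // deletion
        d[i][j - 1] + 1,     // insertion
        d[i - 1][j - 1] + cost, // substitution or match
      );
    }
  }
  return d[ref.length][hyp.length] / ref.length;
}
```

For example, `wordErrorRate("turn on the lights", "turn off the lights")` is one substitution over four reference words, i.e. 0.25 (25%).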
📈 Evolution from 2023 to 2026
- 2023: Batch transcription, high latency, mono-visual.
- 2024: Real-time streaming (Whisper Turbo).
- 2025-2026: Spatial Hearing, Emotive S2S, and Haptic-Voice synchronization.
End of Voice UX Pro Standard (v1.1.0)