NYC

voice-ai-integration

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFEPROMPT_INJECTIONDATA_EXFILTRATION
Full Analysis
  • PROMPT_INJECTION (LOW): The skill is susceptible to indirect prompt injection. It transcribes user audio into text and appends it to the conversation history without any sanitization or protective delimiters.
  • Ingestion points: The process_voice_input method in examples/voice_assistant.py takes an audio_file and converts it to text which is then used in the conversation pipeline.
  • Boundary markers: No markers or system instructions are used to separate transcribed user content from agent instructions in VoiceAssistant.generate_response.
  • Capability inventory: The skill provides access to local audio hardware (microphone and speakers) via pyaudio and interfaces with multiple cloud AI providers.
  • Sanitization: The transcription text is used raw as it comes from the STT provider.
  • DATA_EXFILTRATION (LOW): The skill makes network requests to external API endpoints (AssemblyAI and Eleven Labs) to process audio data.
  • Evidence: requests.post calls to api.assemblyai.com in examples/speech_recognition_providers.py and api.elevenlabs.io in examples/text_to_speech_providers.py transmit audio data to third-party servers.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 05:34 PM