The Agent Skills Directory

PROMPT_INJECTION (LOW): The skill is susceptible to indirect prompt injection. It transcribes user audio into text and appends it to the conversation history without any sanitization or protective delimiters.
Ingestion points: The process_voice_input method in examples/voice_assistant.py takes an audio_file and converts it to text which is then used in the conversation pipeline.
Boundary markers: No markers or system instructions are used to separate transcribed user content from agent instructions in VoiceAssistant.generate_response.
Capability inventory: The skill provides access to local audio hardware (microphone and speakers) via pyaudio and interfaces with multiple cloud AI providers.
Sanitization: The transcription text is used raw as it comes from the STT provider.
DATA_EXFILTRATION (LOW): The skill makes network requests to external API endpoints (AssemblyAI and Eleven Labs) to process audio data.
Evidence: requests.post calls to api.assemblyai.com in examples/speech_recognition_providers.py and api.elevenlabs.io in examples/text_to_speech_providers.py transmit audio data to third-party servers.

voice-ai-integration