The Agent Skills Directory

Dynamic Execution (LOW): The skill dynamically loads the host's extension API using import() on computed paths.
Evidence: src/core-bridge.ts resolves the OpenClaw root directory and imports dist/extensionAPI.js at runtime. scripts/smoke-test.mjs performs a similar operation.
Mitigation: The skill performs integrity checks by reading the target directory's package.json and verifying the name is 'openclaw' before importing. Paths are normalized using path.resolve() to prevent traversal.
Context: This is a required architectural pattern for OpenClaw plugins; severity is downgraded as it is central to the skill's primary purpose.
Indirect Prompt Injection (LOW): The skill ingests untrusted audio data from Discord users which is then transcribed and processed by an AI agent.
Ingestion points: Voice audio is captured in real-time and converted to text via src/streaming-tts.ts or src/stt.ts (not fully provided, but referenced).
Boundary markers: The skill uses an extraSystemPrompt to instruct the agent on response constraints, but explicit delimiters for the transcribed user text are not visible in the bridge logic.
Capability inventory: The agent can use the discord_voice tool to join/leave channels and play audio.
Sanitization: Discord userId is validated against a snowflake regex, and administrative configuration is used for sensitive parameters.
External Downloads (LOW): The skill makes outbound network requests to third-party STT/TTS providers.
Evidence: src/streaming-tts.ts sends audio data and API keys to api.openai.com and api.elevenlabs.io.
Context: These are standard operations for a voice-enabled AI skill.
Command Execution (LOW): The skill relies on system-level tools like ffmpeg for audio transcoding.
Evidence: SKILL.md and clawdbot.plugin.json declare ffmpeg as a system dependency, which is typically invoked via libraries like prism-media for audio stream processing.

discord-voice