The Agent Skills Directory

[PROMPT_INJECTION]: The skill facilitates the ingestion of untrusted external data such as images, PDFs, and audio files, which is a common vector for indirect prompt injection.
Ingestion points: Technical guides in rules/vision-image-analysis.md, rules/vision-document.md, and rules/audio-speech-to-text.md demonstrate reading and processing user-provided media.
Boundary markers: The provided prompt templates do not include delimiters or specific instructions (e.g., 'ignore embedded commands') to prevent the model from obeying instructions hidden within the media content.
Capability inventory: The integration patterns utilize high-privilege SDKs (Anthropic, OpenAI, Google) and network tools, increasing the impact of a successful injection.
Sanitization: No sanitization, validation, or escaping logic is recommended for the content extracted from processed media before it is used in downstream prompts.

multimodal-llm