The Agent Skills Directory

[PROMPT_INJECTION]: The skill processes untrusted documents (PDF, DOCX, etc.) and web content (HTML, YouTube), creating an indirect prompt injection surface. Malicious instructions embedded in these files could influence downstream LLM processing. \n
Ingestion points: The batch conversion script in 'scripts/batch_convert.py' and URL-based conversion in 'references/web_content.md'. \n
Boundary markers: The skill does not explicitly add delimiters or warnings to its Markdown output to signal the presence of untrusted content. \n
Capability inventory: The 'markitdown' library can trigger LLM API calls for image descriptions, which might be manipulated by malicious inputs. \n
Sanitization: HTML processing includes stripping scripts and styles, though the resulting text content is preserved without further sanitization.\n- [EXTERNAL_DOWNLOADS]: The documentation instructs users to install various Python packages and system dependencies (like Tesseract) for full functionality. It also fetches content from remote URLs such as YouTube and RSS feeds, which are well-known services.\n- [COMMAND_EXECUTION]: The provided batch script ('scripts/batch_convert.py') performs file system operations including reading files and creating directories to store results. The skill also features a plugin system that allows users to register custom code for processing specific formats, though this is disabled by default.

markitdown