The Agent Skills Directory

[COMMAND_EXECUTION]: The url_to_filepath function in scripts/scrape.py is vulnerable to directory traversal.
Evidence: The script constructs a file path by joining the output directory with the domain and URL path without sanitizing directory traversal sequences like ../.
Impact: An attacker could craft a URL that, if followed by the scraper, causes it to write files to unintended locations on the local filesystem. This is partially mitigated as the script enforces extensions like .txt for unknown file types.
[PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection.
Ingestion points: The fetch_page method in scripts/scrape.py downloads HTML data from any URL provided or discovered during a crawl.
Boundary markers: Scraped content is saved with a basic header, but lacks explicit delimiters or instructions to the agent to treat the body text as untrusted content.
Capability inventory: The skill can perform concurrent network GET requests and write to the local filesystem.
Sanitization: The tool uses BeautifulSoup to strip HTML tags like <script> and <style>, but it does not filter the resulting text content for malicious natural language instructions aimed at the AI agent.
[EXTERNAL_DOWNLOADS]: The skill requires the installation of several standard Python libraries: aiohttp, beautifulsoup4, lxml, and aiofiles. These are reputable and well-known packages commonly used for web scraping and asynchronous operations.

web-scraper