web-scraper
Warn
Audited by Gen Agent Trust Hub on Mar 8, 2026
Risk Level: MEDIUM (EXTERNAL_DOWNLOADS, COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION]: The `url_to_filepath` function in `scripts/scrape.py` is vulnerable to directory traversal.
  - Evidence: The script constructs a file path by joining the output directory with the domain and URL path without sanitizing directory traversal sequences like `../`.
  - Impact: An attacker could craft a URL that, if followed by the scraper, causes it to write files to unintended locations on the local filesystem. This is partially mitigated as the script enforces extensions like `.txt` for unknown file types.
- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection.
  - Ingestion points: The `fetch_page` method in `scripts/scrape.py` downloads HTML data from any URL provided or discovered during a crawl.
  - Boundary markers: Scraped content is saved with a basic header, but lacks explicit delimiters or instructions to the agent to treat the body text as untrusted content.
  - Capability inventory: The skill can perform concurrent network GET requests and write to the local filesystem.
  - Sanitization: The tool uses BeautifulSoup to strip HTML tags like `<script>` and `<style>`, but it does not filter the resulting text content for malicious natural language instructions aimed at the AI agent.
- [EXTERNAL_DOWNLOADS]: The skill requires the installation of several standard Python libraries: `aiohttp`, `beautifulsoup4`, `lxml`, and `aiofiles`. These are reputable and well-known packages commonly used for web scraping and asynchronous operations.
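
The directory traversal finding could plausibly be addressed along the following lines. This is a minimal sketch, not the skill's actual code: the audit does not show the real signature of `url_to_filepath`, so the function name `safe_url_to_filepath` and the `output_dir` parameter are hypothetical. It decodes the URL path, drops `.` and `..` segments, and then verifies that the resolved result still lies under the output directory.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def safe_url_to_filepath(url: str, output_dir: str) -> Path:
    """Map a URL to a path that stays inside output_dir (hypothetical helper)."""
    root = Path(output_dir).resolve()
    parsed = urlparse(url)

    # Decode percent-escapes first so that "%2e%2e" is recognised as "..".
    raw_parts = unquote(parsed.path).replace("\\", "/").split("/")
    # Drop empty, current-directory, and parent-directory segments.
    parts = [p for p in raw_parts if p not in ("", ".", "..")]

    # Use the host as a single path segment; fall back to a fixed name.
    domain = (parsed.hostname or "unknown-host").replace("\\", "_")
    if domain in (".", ".."):
        domain = "unknown-host"

    candidate = root.joinpath(domain, *parts).resolve()

    # Defense in depth: refuse anything that still resolves outside the root
    # (for example through a symlink inside the output directory).
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"refusing to write outside {root}: {url!r}") from None
    return candidate
```

Filtering segments and then checking the resolved path are deliberately redundant: the filter handles the common `../` case, while the containment check catches anything the filter misses.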
Audit Metadata