web-crawler
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE, PROMPT_INJECTION
Full Analysis
- [Indirect Prompt Injection] (LOW): The skill's primary function is to ingest untrusted HTML from external websites and transform it for LLM processing, creating a risk that malicious instructions embedded in web pages could influence the behavior of a consuming AI agent.
  - Ingestion points: The crawler fetches data from arbitrary external URLs using the `reqwest` crate, as seen in `src/crawler/robots.rs` and `src/parser/sitemap.rs`.
  - Boundary markers: The Markdown conversion in `src/services/markdown.rs` uses YAML frontmatter to separate metadata from content, but does not include explicit instructions or robust markers to prevent an LLM from following commands embedded within the page body.
  - Capability inventory: The tool has permission to write files to the local disk (`std::fs::write` in `src/main.rs`) and perform outbound network requests.
  - Sanitization: The tool uses the `html2md` crate and manual HTML escaping for reports (`src/output/html.rs`), which helps prevent cross-site scripting (XSS) in reports but does not filter for semantic prompt injection attacks.
- [Data Exposure & Exfiltration] (SAFE): The tool uses the `dirs` crate to resolve the user's home directory for its default output path on the Desktop (`src/lib.rs`). This is expected behavior for a desktop CLI tool and does not involve unauthorized access to sensitive system files like credentials or private keys.
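The boundary-marker finding above could be addressed in several ways. The following is only a minimal sketch of one, not code from the audited crate: the function `wrap_untrusted`, the marker strings, and the `trust` frontmatter key are all hypothetical names chosen for illustration. It wraps the converted page body in explicit delimiters and strips any copy of those delimiters that the page itself contains, so a page cannot close the boundary early.

```rust
// Hypothetical marker strings; they are assumptions for this sketch and do
// not appear in the audited crate.
const BEGIN_MARKER: &str = "<<<UNTRUSTED_WEB_CONTENT_BEGIN>>>";
const END_MARKER: &str = "<<<UNTRUSTED_WEB_CONTENT_END>>>";

/// Wraps converted page Markdown in explicit boundary markers so that a
/// consuming agent can be instructed to treat everything between them as
/// data rather than as instructions.
pub fn wrap_untrusted(source_url: &str, markdown_body: &str) -> String {
    // Replace any marker text found in the page itself, so the page cannot
    // end the untrusted region early and place text outside it.
    let body = markdown_body
        .replace(BEGIN_MARKER, "[removed marker]")
        .replace(END_MARKER, "[removed marker]");

    // Drop control characters (including newlines) so the URL stays on one
    // line and cannot add extra frontmatter keys.
    let url: String = source_url.chars().filter(|c| !c.is_control()).collect();
    // Percent-encode characters that are special inside a YAML
    // double-quoted scalar.
    let url = url.replace('\\', "%5C").replace('"', "%22");

    format!(
        "---\nsource: \"{}\"\ntrust: untrusted\n---\n\n{}\n{}\n{}\n",
        url, BEGIN_MARKER, body, END_MARKER
    )
}
```

Delimiters of this kind do not filter semantic prompt injection by themselves. They only give the consuming agent an unambiguous region to treat as untrusted, and they help only if that agent's own instructions say so.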
Audit Metadata