web-crawler

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [Indirect Prompt Injection] (LOW): The skill's primary function is to ingest untrusted HTML from external websites and transform it for LLM processing, creating a risk that malicious instructions embedded in web pages could influence the behavior of a consuming AI agent.\n
  • Ingestion points: The crawler fetches data from arbitrary external URLs using the reqwest crate, as seen in src/crawler/robots.rs and src/parser/sitemap.rs.\n
  • Boundary markers: The Markdown conversion in src/services/markdown.rs uses YAML frontmatter to separate metadata from content, but does not include explicit instructions or robust markers to prevent an LLM from following commands embedded within the page body.\n
  • Capability inventory: The tool has permission to write files to the local disk (std::fs::write in src/main.rs) and perform outbound network requests.\n
  • Sanitization: The tool uses the html2md crate and manual HTML escaping for reports (src/output/html.rs), which helps prevent cross-site scripting (XSS) in reports but does not filter for semantic prompt injection attacks.\n- [Data Exposure & Exfiltration] (SAFE): The tool uses the dirs crate to resolve the user's home directory for its default output path on the Desktop (src/lib.rs). This is expected behavior for a desktop CLI tool and does not involve unauthorized access to sensitive system files like credentials or private keys.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 06:11 PM