The Agent Skills Directory

[Indirect Prompt Injection] (LOW): The skill's primary function is to ingest untrusted HTML from external websites and transform it for LLM processing, creating a risk that malicious instructions embedded in web pages could influence the behavior of a consuming AI agent.\n
Ingestion points: The crawler fetches data from arbitrary external URLs using the reqwest crate, as seen in src/crawler/robots.rs and src/parser/sitemap.rs.\n
Boundary markers: The Markdown conversion in src/services/markdown.rs uses YAML frontmatter to separate metadata from content, but does not include explicit instructions or robust markers to prevent an LLM from following commands embedded within the page body.\n
Capability inventory: The tool has permission to write files to the local disk (std::fs::write in src/main.rs) and perform outbound network requests.\n
Sanitization: The tool uses the html2md crate and manual HTML escaping for reports (src/output/html.rs), which helps prevent cross-site scripting (XSS) in reports but does not filter for semantic prompt injection attacks.\n- [Data Exposure & Exfiltration] (SAFE): The tool uses the dirs crate to resolve the user's home directory for its default output path on the Desktop (src/lib.rs). This is expected behavior for a desktop CLI tool and does not involve unauthorized access to sensitive system files like credentials or private keys.

web-crawler