crawl4ai
Warn
Audited by Gen Agent Trust Hub on Mar 7, 2026
Risk Level: MEDIUMDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
- [DATA_EXFILTRATION]: The execution engine in scripts/engine.py implements a local fast-path for the file:// URI scheme. This allows the skill to bypass web crawling logic and directly read arbitrary files from the local filesystem of the host. This capability can be exploited to access sensitive information such as SSH keys, environment variables, or local credentials.
- Evidence: scripts/engine.py lines 189-201.
- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it incorporates untrusted data (webpage titles and headers) from external websites directly into prompts used for chunk planning.
- Ingestion points: External webpage metadata is extracted in scripts/engine.py and passed to the LLM planning logic in scripts/crawl_url.py.
- Boundary markers: Absent. The CHUNKING_PROMPT in scripts/crawl_url.py and CHUNKING_PLANNER_PROMPT in scripts/graph.py use basic string interpolation without secure delimiters or instructions to ignore embedded commands.
- Capability inventory: The skill possesses the ability to read local files, perform network requests, and execute commands via the Foundation isolation pattern.
- Sanitization: No validation, escaping, or filtering is performed on the extracted title or document skeleton before they are interpolated into the LLM prompt.
Audit Metadata