crawl4ai
Pass
Audited by Gen Agent Trust Hub on Mar 29, 2026
Risk Level: SAFEDATA_EXFILTRATIONPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: Indirect prompt injection surface identified in the LLM-based chunk planning workflow.
- Ingestion points: External web content retrieved from the user-provided URL in 'scripts/engine.py' is parsed into a skeleton and passed to an LLM in 'scripts/crawl_url.py'.
- Boundary markers: Structural (headers); no explicit 'ignore embedded instructions' delimiters are used when passing the document skeleton to the LLM planner.
- Capability inventory: The skill uses the 'run_skill_command' API to execute subprocesses in an isolated environment.
- Sanitization: No sanitization is performed on headers or content retrieved from the web before being processed by the LLM.
- [DATA_EXFILTRATION]: Local file access capability via the 'file://' protocol support in 'scripts/engine.py'.
- The '_try_local_file_fast_path' function allows the crawler to read local files if provided with a 'file://' URL. While this is a documented feature intended for local document processing and testing, it represents a potential data exposure surface if an agent is coerced into accessing sensitive system files.
- [COMMAND_EXECUTION]: Employs the 'Foundation Isolation Pattern' to execute crawling tasks.
- The 'crawl_url' command in 'scripts/crawl_url.py' uses 'run_skill_command' to invoke 'scripts/engine.py' within a separate 'uv' environment. This provides a layer of isolation for heavy dependencies like Playwright and crawl4ai.
- [SAFE]: Dynamic module loading in 'scripts/crawl_url.py' is used for internal architectural purposes.
- The use of 'importlib.import_module' in the '_resolve_engine_helpers' function is limited to loading the local 'engine.py' module within the skill's own package, which is a standard Python practice.
Audit Metadata