web-scraper
Pass
Audited by Gen Agent Trust Hub on Mar 9, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADSCOMMAND_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection (Category 8) where malicious instructions embedded in scraped web pages could influence the behavior of the entity extraction stage.
- Ingestion points: Data enters the system via
fetch_staticandfetch_with_playwrightfunctions inSKILL.mdwhich retrieve content from arbitrary external URLs. - Boundary markers: The prompt template in
extract_entities_llm(Stage 5) directly interpolates untrusted content ({text_sample}) into the system prompt without using secure delimiters (e.g., XML tags or unique random separators) or instructions to ignore embedded commands. - Capability inventory: The skill possesses
filesystemandnetworkpermissions as defined inclaw.json. It can write files to the local disk (save_incremental) and make network requests to external APIs (OpenRouter). - Sanitization: While the skill performs text normalization (removing control characters and collapsing whitespace), it does not implement specific sanitization or filtering to detect or neutralize prompt injection attempts within the scraped text.
- [EXTERNAL_DOWNLOADS]: The skill requires several external Python libraries and browser binaries to function.
- Evidence:
SKILL.mdand the planning protocol referencepip installfor packages liketrafilatura,playwright, andscrapy, andnpx playwright installfor Chromium binaries. These are standard dependencies for the skill's stated purpose. - [COMMAND_EXECUTION]: The planning protocol instructs the agent to execute shell commands to inspect the environment.
- Evidence: Protocol Step 2 specifies running
pip list | grep ...andnpx playwright install --dry-runto verify available tools and space.
Audit Metadata