The Agent Skills Directory

[PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection (Category 8) where malicious instructions embedded in scraped web pages could influence the behavior of the entity extraction stage.
Ingestion points: Data enters the system via fetch_static and fetch_with_playwright functions in SKILL.md which retrieve content from arbitrary external URLs.
Boundary markers: The prompt template in extract_entities_llm (Stage 5) directly interpolates untrusted content ({text_sample}) into the system prompt without using secure delimiters (e.g., XML tags or unique random separators) or instructions to ignore embedded commands.
Capability inventory: The skill possesses filesystem and network permissions as defined in claw.json. It can write files to the local disk (save_incremental) and make network requests to external APIs (OpenRouter).
Sanitization: While the skill performs text normalization (removing control characters and collapsing whitespace), it does not implement specific sanitization or filtering to detect or neutralize prompt injection attempts within the scraped text.
[EXTERNAL_DOWNLOADS]: The skill requires several external Python libraries and browser binaries to function.
Evidence: SKILL.md and the planning protocol reference pip install for packages like trafilatura, playwright, and scrapy, and npx playwright install for Chromium binaries. These are standard dependencies for the skill's stated purpose.
[COMMAND_EXECUTION]: The planning protocol instructs the agent to execute shell commands to inspect the environment.
Evidence: Protocol Step 2 specifies running pip list | grep ... and npx playwright install --dry-run to verify available tools and space.

web-scraper