data-scraper-agent
Pass
Audited by Gen Agent Trust Hub on Mar 16, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- [PROMPT_INJECTION]: The skill is designed to ingest untrusted data from external websites and APIs and process it through an LLM. This creates a surface for indirect prompt injection where malicious instructions hidden in scraped content (e.g., a website title) could influence the agent's scoring or summarization behavior.
- Ingestion points: Data is fetched from arbitrary URLs in
scraper/sources/*.pyusingrequestsorplaywright. - Boundary markers: The skill uses basic JSON formatting instructions in
ai/pipeline.pyto structure the prompt, but lacks robust mechanisms to prevent the LLM from obeying instructions embedded within the scraped data strings. - Capability inventory: The agent can write data to external storage providers like Notion, Sheets, or Supabase via their respective APIs.
- Sanitization: The
_build_promptfunction inai/pipeline.pyprovides no sanitization or escaping of the scraped text before interpolating it into the prompt. - [EXTERNAL_DOWNLOADS]: The skill relies on standard third-party libraries for its operations, including
requests,beautifulsoup4,playwright, andnotion-client. These are well-known, reputable packages used for web scraping and API integrations.
Audit Metadata