data-scraper-agent

Pass

Audited by Gen Agent Trust Hub on Mar 16, 2026

Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [PROMPT_INJECTION]: The skill is designed to ingest untrusted data from external websites and APIs and process it through an LLM. This creates a surface for indirect prompt injection where malicious instructions hidden in scraped content (e.g., a website title) could influence the agent's scoring or summarization behavior.
  • Ingestion points: Data is fetched from arbitrary URLs in scraper/sources/*.py using requests or playwright.
  • Boundary markers: The skill uses basic JSON formatting instructions in ai/pipeline.py to structure the prompt, but lacks robust mechanisms to prevent the LLM from obeying instructions embedded within the scraped data strings.
  • Capability inventory: The agent can write data to external storage providers like Notion, Sheets, or Supabase via their respective APIs.
  • Sanitization: The _build_prompt function in ai/pipeline.py provides no sanitization or escaping of the scraped text before interpolating it into the prompt.
  • [EXTERNAL_DOWNLOADS]: The skill relies on standard third-party libraries for its operations, including requests, beautifulsoup4, playwright, and notion-client. These are well-known, reputable packages used for web scraping and API integrations.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 16, 2026, 08:43 PM