The Agent Skills Directory

[PROMPT_INJECTION]: The skill is designed to ingest untrusted data from external websites and APIs and process it through an LLM. This creates a surface for indirect prompt injection where malicious instructions hidden in scraped content (e.g., a website title) could influence the agent's scoring or summarization behavior.
Ingestion points: Data is fetched from arbitrary URLs in scraper/sources/*.py using requests or playwright.
Boundary markers: The skill uses basic JSON formatting instructions in ai/pipeline.py to structure the prompt, but lacks robust mechanisms to prevent the LLM from obeying instructions embedded within the scraped data strings.
Capability inventory: The agent can write data to external storage providers like Notion, Sheets, or Supabase via their respective APIs.
Sanitization: The _build_prompt function in ai/pipeline.py provides no sanitization or escaping of the scraped text before interpolating it into the prompt.
[EXTERNAL_DOWNLOADS]: The skill relies on standard third-party libraries for its operations, including requests, beautifulsoup4, playwright, and notion-client. These are well-known, reputable packages used for web scraping and API integrations.

data-scraper-agent