The Agent Skills Directory

[PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it ingests and processes content from external websites archived by the Wayback Machine. This content is attacker-controlled and could contain malicious instructions meant to override agent behavior when the extracted text is processed.
Ingestion points: The scripts/search_archive.py script fetches raw HTML content from web.archive.org via the fetch_archived_content function.
Boundary markers: The script does not implement delimiters or provide warnings to the agent to ignore instructions embedded within the fetched content.
Capability inventory: The skill uses the requests library to perform network operations and the re module to extract text from HTML.
Sanitization: While the script strips HTML tags (such as <script> and <style>) to extract readable text, it does not sanitize or filter the resulting plain text for natural language instructions intended to manipulate the agent.
[EXTERNAL_DOWNLOADS]: The script performs network requests to the Wayback Machine CDX API (web.archive.org/cdx/search/cdx) and fetches archived snapshots from web.archive.org/web. This is a well-known service used for legitimate web research.
[COMMAND_EXECUTION]: The skill provides a Python-based command-line interface (scripts/search_archive.py) that handles arguments for searching, filtering, and fetching web archives.

web-archive-scraper