web-archive-scraper

Pass

Audited by Gen Agent Trust Hub on Mar 29, 2026

Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADSCOMMAND_EXECUTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it ingests and processes content from external websites archived by the Wayback Machine. This content is attacker-controlled and could contain malicious instructions meant to override agent behavior when the extracted text is processed.
  • Ingestion points: The scripts/search_archive.py script fetches raw HTML content from web.archive.org via the fetch_archived_content function.
  • Boundary markers: The script does not implement delimiters or provide warnings to the agent to ignore instructions embedded within the fetched content.
  • Capability inventory: The skill uses the requests library to perform network operations and the re module to extract text from HTML.
  • Sanitization: While the script strips HTML tags (such as <script> and <style>) to extract readable text, it does not sanitize or filter the resulting plain text for natural language instructions intended to manipulate the agent.
  • [EXTERNAL_DOWNLOADS]: The script performs network requests to the Wayback Machine CDX API (web.archive.org/cdx/search/cdx) and fetches archived snapshots from web.archive.org/web. This is a well-known service used for legitimate web research.
  • [COMMAND_EXECUTION]: The skill provides a Python-based command-line interface (scripts/search_archive.py) that handles arguments for searching, filtering, and fetching web archives.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 29, 2026, 05:17 PM