web-archive-scraper
Pass
Audited by Gen Agent Trust Hub on Mar 29, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADSCOMMAND_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it ingests and processes content from external websites archived by the Wayback Machine. This content is attacker-controlled and could contain malicious instructions meant to override agent behavior when the extracted text is processed.
- Ingestion points: The
scripts/search_archive.pyscript fetches raw HTML content fromweb.archive.orgvia thefetch_archived_contentfunction. - Boundary markers: The script does not implement delimiters or provide warnings to the agent to ignore instructions embedded within the fetched content.
- Capability inventory: The skill uses the
requestslibrary to perform network operations and theremodule to extract text from HTML. - Sanitization: While the script strips HTML tags (such as
<script>and<style>) to extract readable text, it does not sanitize or filter the resulting plain text for natural language instructions intended to manipulate the agent. - [EXTERNAL_DOWNLOADS]: The script performs network requests to the Wayback Machine CDX API (
web.archive.org/cdx/search/cdx) and fetches archived snapshots fromweb.archive.org/web. This is a well-known service used for legitimate web research. - [COMMAND_EXECUTION]: The skill provides a Python-based command-line interface (
scripts/search_archive.py) that handles arguments for searching, filtering, and fetching web archives.
Audit Metadata