academic-web-scraping

Pass

Audited by Gen Agent Trust Hub on Mar 31, 2026

Risk Level: SAFE, EXTERNAL_DOWNLOADS, COMMAND_EXECUTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill facilitates data collection from well-known and reputable academic APIs, including OpenAlex, Crossref, and PubMed (NCBI). These operations are essential for its primary function as a research tool.
  • [COMMAND_EXECUTION]: The provided Python code snippets use widely used third-party libraries such as requests, BeautifulSoup, and Playwright to automate data extraction. The scripts follow ethical scraping practices, such as sending an identifying User-Agent header and adding delays between requests to respect server limits.
  • [SAFE]: The skill demonstrates reading API keys from environment variables (os.environ.get) rather than hardcoding sensitive credentials.
  • [SAFE]: The skill inherently processes untrusted data from external websites, which is a theoretical surface for indirect prompt injection. However, this is documented and implemented within the context of a developer guide with no signs of malicious intent.
      • Ingestion points: Web scraping and API fetching logic within SKILL.md (e.g., scrape_conference_proceedings).
      • Boundary markers: Not present, as the code focuses on raw data extraction for research storage.
      • Capability inventory: Network access (requests, playwright) and local file system write access for data archival (DataCollector).
      • Sanitization: The skill uses standard HTML parsing via BeautifulSoup but does not perform security-specific sanitization against prompt injection, which is expected for a data collection guide.
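
The practices named in the analysis above (an identifying User-Agent, delays between requests, API keys read from the environment, and boundary markers around untrusted text) can be sketched roughly as follows. This is a minimal illustration and not code taken from the audited skill: the names EXAMPLE_API_KEY, build_headers, PoliteThrottle, and wrap_untrusted, as well as the marker strings, are hypothetical.

```python
import os
import time

# All names below are hypothetical and exist only for illustration;
# none of them are taken from the audited SKILL.md.


def get_api_key(var_name="EXAMPLE_API_KEY"):
    """Read an API key from the environment; returns None if it is unset."""
    return os.environ.get(var_name)


def build_headers(contact_email):
    """Build request headers with a User-Agent that identifies the client."""
    return {"User-Agent": f"academic-research-bot/0.1 (mailto:{contact_email})"}


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self._min_delay = min_delay
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep      # without actually sleeping
        self._last = None

    def wait(self):
        """Sleep only for the part of min_delay that has not yet elapsed."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def wrap_untrusted(text):
    """Wrap scraped text in explicit markers (an assumed convention) so that
    downstream consumers can tell it apart from trusted instructions."""
    return "<<<UNTRUSTED_CONTENT\n" + text + "\nUNTRUSTED_CONTENT>>>"
```

In use, a scraper would call wait() on a shared PoliteThrottle before each request and pass the result of build_headers(...) as the headers argument of requests.get. Marker wrapping of this kind labels untrusted content but does not sanitize it.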
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 31, 2026, 10:16 PM