academic-web-scraping
Pass
Audited by Gen Agent Trust Hub on Mar 31, 2026
Risk Level: SAFEEXTERNAL_DOWNLOADSCOMMAND_EXECUTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill facilitates data collection from well-known and reputable academic APIs, including OpenAlex, Crossref, and PubMed (NCBI). These operations are essential for its primary function as a research tool.
- [COMMAND_EXECUTION]: The provided Python code snippets use standard libraries such as
requests,BeautifulSoup, andPlaywrightto automate data extraction. The scripts include best practices for ethical scraping, such as identifying the user-agent and implementing delays to respect server limits. - [SAFE]: API keys are handled securely by demonstrating the use of environment variables (
os.environ.get) rather than hardcoding sensitive credentials. - [SAFE]: The skill inherently processes untrusted data from external websites, which is a theoretical surface for indirect prompt injection. However, this is documented and implemented within the context of a developer guide with no signs of malicious intent.
- Ingestion points: Web scraping and API fetching logic within
SKILL.md(e.g.,scrape_conference_proceedings). - Boundary markers: Not present, as the code focuses on raw data extraction for research storage.
- Capability inventory: Network access (
requests,playwright) and local file system write access for data archival (DataCollector). - Sanitization: The skill uses standard HTML parsing via BeautifulSoup but does not perform security-specific sanitization against prompt injection, which is expected for a data collection guide.
Audit Metadata