content-extractor

Pass

Audited by Gen Agent Trust Hub on Mar 8, 2026

Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The content_extractor.py file uses subprocess.run to execute the curl binary for fetching web pages and downloading media. While these are core functionalities, executing system commands using external inputs like URLs requires careful handling to prevent command injection. Additionally, the code contains hardcoded absolute paths (e.g., /Users/delta/...) that expose the developer's local directory structure.
  • [EXTERNAL_DOWNLOADS]: The skill is designed to automate the download of media files and metadata from numerous third-party social media domains such as douyin.com, bilibili.com, and xiaoyuzhoufm.com based on URLs provided by the user.
  • [PROMPT_INJECTION]: The skill represents a surface for indirect prompt injection as it scrapes content from external websites and returns it to the agent without sanitization.
  • Ingestion points: Content is ingested from multiple platforms through the _extract_* methods in content_extractor.py.
  • Boundary markers: The extracted content is returned as raw strings without delimiters or instructions to the agent to treat it as untrusted data.
  • Capability inventory: The skill has the ability to execute system commands via curl, perform network requests, and write files to the local disk.
  • Sanitization: While filenames are sanitized to prevent illegal characters, the actual text content extracted from web pages is not filtered or sanitized before being passed back to the agent.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 8, 2026, 02:25 AM