content-extract

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [Indirect Prompt Injection] (HIGH): The skill's primary function is to ingest untrusted data from external URLs and convert it into Markdown for the agent to process.
  • Ingestion points: The --url argument in scripts/content_extract.py and subsequent fetching of web content.
  • Boundary markers: Absent. The extracted content is returned as a raw string in a JSON field without delimiters or instructions to the agent to treat it as untrusted data.
  • Capability inventory: The agent uses this skill to inform its reasoning and subsequent actions based on the content of external websites.
  • Sanitization: No sanitization or filtering is performed on the extracted Markdown content to prevent embedded instructions from being interpreted by the LLM.
  • [Dynamic Execution] (MEDIUM): The script scripts/content_extract.py dynamically determines the path of the executable script it runs.
  • Evidence: The _find_mineru_wrapper function checks the MINERU_WRAPPER_PATH environment variable. If an attacker can influence the environment, they can redirect the subprocess call to an arbitrary executable.
  • [Command Execution] (LOW): The skill uses subprocess.run to execute local scripts.
  • Evidence: subprocess.run(cmd, ...) is used in scripts/content_extract.py. While it uses a list format (preventing shell injection), it relies on the presence and integrity of a sibling skill (mineru-extract).
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 07:05 AM