dataset-discovery
Pass
Audited by Gen Agent Trust Hub on Apr 19, 2026
Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTIONEXTERNAL_DOWNLOADS
Full Analysis
- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it retrieves and processes metadata, descriptions, and README files from external, potentially user-controlled platforms (HuggingFace, OpenML, GitHub, and Semantic Scholar). This content is presented to the agent, which could allow an attacker to embed instructions in dataset metadata designed to influence agent behavior.
- Ingestion points: Dataset search results and detailed metadata fetched via external APIs in
scripts/search_ml_datasets.py. - Boundary markers: The data is formatted into structured markdown tables, but there are no explicit instructions for the agent to treat the content as untrusted or to ignore embedded commands.
- Capability inventory: The agent has access to tools like
run_terminalandwrite_file, which could be targeted by malicious instructions found in external data. - Sanitization: Dataset descriptions are truncated to 200 characters, but README files and full metadata objects are ingested with fewer constraints.
- [COMMAND_EXECUTION]: The included script
scripts/search_ml_datasets.pyexecutes the GitHub CLI tool (gh) usingsubprocess.run. While the implementation correctly uses a list of arguments to prevent shell injection, it relies on the presence of external system binaries. - [EXTERNAL_DOWNLOADS]: The skill performs network operations to fetch metadata from well-known services (huggingface.co, openml.org, semanticscholar.org). It also lists the third-party
requestslibrary as a required Python dependency.
Audit Metadata