data-science-model-evaluation

Warn

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: MEDIUM (REMOTE_CODE_EXECUTION, COMMAND_EXECUTION, EXTERNAL_DOWNLOADS)
Full Analysis
  • REMOTE_CODE_EXECUTION (MEDIUM): In references/streamlit-advanced.md, the skill demonstrates the use of pickle.load() to load model files. Unpickling data from an untrusted source is a well-known vulnerability: a crafted pickle can invoke arbitrary callables during deserialization, which allows arbitrary code execution.
  • COMMAND_EXECUTION (LOW): Multiple reference files (notebook-testing.md, sharing-publishing.md, plotly-dash.md) provide examples of shell commands for the agent to execute, including pytest, jupyter nbconvert, quarto render, voila, and gunicorn. These are standard in developer workflows, but they create a surface for command injection if parameters derived from untrusted input are passed to a shell without validation or quoting.
  • EXTERNAL_DOWNLOADS (LOW): The documentation includes instructions for installing various third-party packages and tools using pip (e.g., nbval, voila, ydata-profiling).
  • DATA_EXFILTRATION (LOW): The skill promotes the use of external tracking platforms like MLflow and Weights & Biases (references/experiment-tracking.md). These tools are designed to send data (metrics, parameters, artifacts) to remote servers, which constitutes intentional but noteworthy data transmission.
  • INDIRECT_PROMPT_INJECTION (LOW):
      • Ingestion points: The skill is designed to ingest and process external datasets (CSV, Parquet) and text data for feature engineering.
      • Boundary markers: Code snippets do not include delimiters or instructions to ignore embedded commands in processed data.
      • Capability inventory: Contains file read/write operations, network access via tracking APIs, and shell command execution.
      • Sanitization: No data validation or sanitization logic is present in the provided examples.
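
As an illustration of how the REMOTE_CODE_EXECUTION and COMMAND_EXECUTION findings might be mitigated, the following is a minimal sketch and is not taken from the audited skill. The helper names load_pinned_pickle and run_without_shell are hypothetical. The first refuses to unpickle a file unless its SHA-256 digest matches a value pinned in advance; the second runs a command from an argument list so that no shell interprets the parameters. Digest pinning only confirms that the file is the one previously vetted; it does not make pickle safe for files of unknown origin.

```python
import hashlib
import hmac
import pickle
import subprocess
import sys
from pathlib import Path


def load_pinned_pickle(path, expected_sha256):
    """Unpickle a file only if its SHA-256 digest matches a pinned value.

    Hypothetical helper: the pinned digest is assumed to come from a
    trusted place (for example, a value committed alongside the code).
    """
    data = Path(path).read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison; refuse to deserialize on any mismatch.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"digest mismatch for {path}; refusing to unpickle")
    return pickle.loads(data)


def run_without_shell(args):
    """Run a command given as a list of arguments, with no shell involved.

    Each list element reaches the program as one literal argument, so
    characters such as ';' or '&&' in a parameter are not interpreted.
    """
    return subprocess.run(
        args, shell=False, check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    # The hostile-looking parameter is passed through as plain text.
    result = run_without_shell(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", "a; echo injected"]
    )
    print(result.stdout.strip())
```

Under these assumptions, a command such as jupyter nbconvert would be invoked as a list like ["jupyter", "nbconvert", "--to", "html", notebook_path] rather than as a single interpolated string. Formats that do not execute code on load remain a stronger option than pickle when the model source cannot be vetted.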
Audit Metadata
Risk Level: MEDIUM
Analyzed: Feb 17, 2026, 06:33 PM