data-science-model-evaluation
Warn
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: MEDIUM | REMOTE_CODE_EXECUTION | COMMAND_EXECUTION | EXTERNAL_DOWNLOADS
Full Analysis
- REMOTE_CODE_EXECUTION (MEDIUM): In `references/streamlit-advanced.md`, the skill demonstrates the use of `pickle.load()` to load model files. Deserializing data from untrusted sources using `pickle` is a well-known vulnerability that allows for arbitrary code execution.
- COMMAND_EXECUTION (LOW): Multiple reference files (`notebook-testing.md`, `sharing-publishing.md`, `plotly-dash.md`) provide examples of shell commands for the agent to execute, including `pytest`, `jupyter nbconvert`, `quarto render`, `voila`, and `gunicorn`. While standard for developer workflows, these provide a surface for command injection if parameters are not handled carefully.
- EXTERNAL_DOWNLOADS (LOW): The documentation includes instructions for installing various third-party packages and tools using `pip` (e.g., `nbval`, `voila`, `ydata-profiling`).
- DATA_EXFILTRATION (LOW): The skill promotes the use of external tracking platforms like MLflow and Weights & Biases (`references/experiment-tracking.md`). These tools are designed to send data (metrics, parameters, artifacts) to remote servers, which constitutes intentional but noteworthy data transmission.
- INDIRECT_PROMPT_INJECTION (LOW):
  - Ingestion points: The skill is designed to ingest and process external datasets (CSV, Parquet) and text data for feature engineering.
  - Boundary markers: Code snippets do not include delimiters or instructions to ignore embedded commands in processed data.
  - Capability inventory: Contains file read/write operations, network access via tracking APIs, and shell command execution.
  - Sanitization: No data validation or sanitization logic is present in the provided examples.
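As an illustration of the REMOTE_CODE_EXECUTION finding, one possible mitigation is to unpickle a model file only after its SHA-256 digest matches a value recorded from a trusted source. The sketch below is not taken from the audited skill; the helper name `load_trusted_pickle` and the parameter `expected_sha256` are hypothetical, and the approach assumes the digest was stored somewhere an attacker cannot modify. It does not make `pickle` itself safe; it only restricts loading to content that was vetted beforehand.

```python
import hashlib
import hmac
import pickle
from pathlib import Path


def load_trusted_pickle(path, expected_sha256):
    """Unpickle `path` only if its SHA-256 digest matches a known-good value.

    Hypothetical helper for illustration: `expected_sha256` is assumed to be
    a hex digest obtained from a trusted source (for example, recorded when
    the model was saved).
    """
    # Read the file once so the bytes that are hashed are exactly the bytes
    # that get unpickled (avoids a check-then-load race on the file).
    data = Path(path).read_bytes()
    actual = hashlib.sha256(data).hexdigest()

    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"Refusing to unpickle {path}: digest mismatch")

    return pickle.loads(data)
```

A caller would record the digest when saving the model and pass it back at load time; any file whose contents have changed since then is rejected before `pickle` runs.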
Audit Metadata