databricks-synthetic-data-generation

Pass

Audited by Gen Agent Trust Hub on Feb 27, 2026

Risk Level: SAFE [COMMAND_EXECUTION] [EXTERNAL_DOWNLOADS] [PROMPT_INJECTION]
Full Analysis
  • [COMMAND_EXECUTION]: The skill instructs the agent to write Python code to local files and execute them on a Databricks cluster using the run_python_file_on_databricks tool. This core functionality is used for managing Spark sessions, creating infrastructure, and processing data.
  • [EXTERNAL_DOWNLOADS]: The skill recommends installing the faker and holidays libraries from the Python Package Index (PyPI). These are well-known and trusted packages used to generate realistic synthetic data components like names and dates.
  • [PROMPT_INJECTION]: The skill exposes an indirect prompt injection surface (Category 8). Evidence:
      1. Ingestion points: User-provided schema and catalog names.
      2. Boundary markers: Absent.
      3. Capability inventory: Execution of arbitrary SQL via spark.sql(), file writing to Volumes, and package installation.
      4. Sanitization: Absent.
    The logic interpolates user input directly into SQL strings (e.g., spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG}.{SCHEMA}")), which a malicious user could exploit to execute unauthorized SQL operations.
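
One way to narrow the interpolation risk described in the PROMPT_INJECTION finding is to validate catalog and schema names against a strict allow-list before building the SQL string. The sketch below is illustrative only and is not part of the audited skill: the helper names `safe_identifier` and `create_schema_sql` are hypothetical, and the allow-list (letters, digits, underscores) is an assumed policy that is stricter than what Databricks accepts for identifiers.

```python
import re

# Assumed policy: plain identifiers only (letters, digits, underscores,
# not starting with a digit). Anything else is rejected outright.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def safe_identifier(name: str) -> str:
    """Return `name` wrapped in backticks, or raise ValueError if it is not a plain identifier."""
    if not isinstance(name, str) or not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return f"`{name}`"


def create_schema_sql(catalog: str, schema: str) -> str:
    """Build the CREATE SCHEMA statement from validated, backtick-quoted names."""
    return (
        "CREATE SCHEMA IF NOT EXISTS "
        f"{safe_identifier(catalog)}.{safe_identifier(schema)}"
    )


# Hypothetical usage on a cluster where `spark` is already defined:
#   spark.sql(create_schema_sql(CATALOG, SCHEMA))
print(create_schema_sql("main", "synthetic_demo"))
```

Rejecting unexpected characters, rather than trying to escape them, keeps the check easy to audit. Names that legitimately need special characters would require a different policy.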
Audit Metadata
Risk Level: SAFE
Analyzed: Feb 27, 2026, 07:55 PM