databricks-synthetic-data-generation
Pass
Audited by Gen Agent Trust Hub on Feb 27, 2026
Risk Level: SAFE (COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION]: The skill instructs the agent to write Python code to local files and execute them on a Databricks cluster using the `run_python_file_on_databricks` tool. This core functionality is used for managing Spark sessions, creating infrastructure, and processing data.
- [EXTERNAL_DOWNLOADS]: The skill recommends installing the `faker` and `holidays` libraries from the Python Package Index (PyPI). These are well-known and trusted packages used to generate realistic synthetic data components like names and dates.
- [PROMPT_INJECTION]: The skill exposes an indirect prompt injection surface (Category 8). Evidence:
  1. Ingestion points: User-provided schema and catalog names.
  2. Boundary markers: Absent.
  3. Capability inventory: Execution of arbitrary SQL via `spark.sql()`, file writing to Volumes, and package installation.
  4. Sanitization: Absent. The logic interpolates user input directly into SQL strings (e.g., `spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG}.{SCHEMA}")`), which could be exploited by a malicious user to execute unauthorized SQL operations.
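
One possible mitigation for the missing sanitization is to validate catalog and schema names against an allowlist before they reach `spark.sql()`. The sketch below is illustrative only and is not part of the audited skill: `safe_identifier` and `create_schema_sql` are hypothetical helper names, and the pattern is an assumed conservative rule (letters, digits, underscores) that is stricter than what Databricks itself accepts.

```python
import re

# Assumption: a deliberately strict allowlist. Databricks permits more
# characters in quoted identifiers, but rejecting them keeps the check simple.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,254}")


def safe_identifier(name: str) -> str:
    """Return a backtick-quoted identifier, or raise ValueError if it is unsafe."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"Invalid SQL identifier: {name!r}")
    # Backticks cannot appear in a validated name, so quoting is safe here.
    return f"`{name}`"


def create_schema_sql(catalog: str, schema: str) -> str:
    """Build the CREATE SCHEMA statement from validated identifiers only."""
    return (
        "CREATE SCHEMA IF NOT EXISTS "
        f"{safe_identifier(catalog)}.{safe_identifier(schema)}"
    )


# Hypothetical usage inside a Databricks job, where `spark` already exists:
# spark.sql(create_schema_sql(CATALOG, SCHEMA))
```

With this approach, a value such as `demo; DROP SCHEMA prod` raises `ValueError` before any SQL is built, so the statement never reaches the cluster.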
Audit Metadata