skill-test

Pass

Audited by Gen Agent Trust Hub on Mar 10, 2026

Risk Level: SAFE (COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION)
Full Analysis
  • [COMMAND_EXECUTION]:
  • The framework is designed to execute Python, SQL, and YAML code blocks extracted from AI agent responses on Databricks clusters and warehouses to verify their functionality.
  • This logic is implemented via MCP tools such as mcp__databricks__execute_databricks_command and mcp__databricks__execute_sql in src/skill_test/grp/executor.py.
  • Utility scripts like scripts/add_example.py use system commands (pbpaste, xclip) to interact with the system clipboard for developer convenience.
  • [EXTERNAL_DOWNLOADS]:
  • The optimization and evaluation components (optimize.py, mlflow_eval.py) communicate with external LLM provider endpoints, including Databricks Model Serving, OpenAI, and Anthropic.
  • These components also interact with MLflow tracking servers to log and retrieve evaluation metrics and session traces.
  • [PROMPT_INJECTION]:
  • The skill exhibits an indirect prompt injection surface because it ingests responses generated by other skills and executes the embedded code blocks during verification.
  • Ingestion points: The interactive function in src/skill_test/cli/commands.py receives a response string containing markdown code blocks.
  • Boundary markers: Relies on standard markdown triple-backtick delimiters (e.g., ```python) to identify code segments.
  • Capability inventory: Includes high-privilege operations such as arbitrary code execution on compute resources and file uploads to Unity Catalog volumes.
  • Sanitization: Performs basic syntax validation but does not sanitize the logical content of the generated code before execution.
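The boundary-marker and sanitization findings above can be illustrated with a minimal sketch. It is not the code from src/skill_test; the names extract_code_blocks and python_syntax_ok are hypothetical, and the regex is only one plausible way to locate triple-backtick fences. The sketch assumes "basic syntax validation" means a parse-only check, and it shows why such a check accepts code whose behavior is unsafe.

```python
import ast
import re
from typing import List, NamedTuple

# Built at runtime so this example does not contain a literal fence itself.
FENCE = "`" * 3

# Hypothetical pattern: opening fence, optional language tag, body, closing fence.
FENCE_RE = re.compile(FENCE + r"([A-Za-z0-9_+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


class CodeBlock(NamedTuple):
    language: str
    source: str


def extract_code_blocks(response: str) -> List[CodeBlock]:
    """Return every fenced block found in a markdown response string."""
    return [
        CodeBlock(m.group(1).lower(), m.group(2))
        for m in FENCE_RE.finditer(response)
    ]


def python_syntax_ok(source: str) -> bool:
    """Parse-only check: True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    response = (
        "Here is the fix:\n"
        + FENCE + "python\nimport os\nos.system('echo side effect')\n" + FENCE
    )
    for block in extract_code_blocks(response):
        # The snippet parses cleanly even though it shells out when run.
        print(block.language, python_syntax_ok(block.source))
```

Because ast.parse never runs or inspects the intent of the code, a block that calls os.system passes the check. Any stronger guarantee would need a separate control, such as an allowlist of permitted operations or an isolated execution environment.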
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: Mar 10, 2026, 09:20 AM