The Agent Skills Directory

[PROMPT_INJECTION]: Indirect prompt injection surface identified in custom scorer implementations where untrusted agent outputs are evaluated by a 'judge' LLM.
Ingestion points: The inputs and outputs dictionaries in cost_accuracy_judge (references/custom-scorer-patterns.md) and the _extract_response_text helper (scripts/evaluation_helpers.py) ingest data directly from agent execution results.
Boundary markers: Absent. The example evaluation prompts in references/custom-scorer-patterns.md use Python f-strings to embed the query and response_text variables directly into the judge's instructions without delimiters (e.g., XML tags or triple backticks) or instructions to ignore embedded commands.
Capability inventory: The _call_llm_for_scoring function (scripts/evaluation_helpers.py) uses the Databricks SDK (WorkspaceClient) to perform network requests to model serving endpoints (w.serving_endpoints.query).
Sanitization: Absent. The skill does not perform escaping, filtering, or validation on the extracted text before passing it to the scoring LLM.

mlflow-genai-evaluation