mlflow-evaluation
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHPROMPT_INJECTIONCOMMAND_EXECUTIONNO_CODE
Full Analysis
- Indirect Prompt Injection (HIGH): The skill defines workflows for building datasets from production traces and executing evaluations. This creates a high-severity surface for indirect prompt injection where malicious instructions embedded in traces could influence the agent's evaluation logic or code execution. [1] Ingestion points: Production traces referenced in 'Workflow 2' and 'patterns-trace-analysis.md'. [2] Boundary markers: No delimiter or boundary instructions are provided to distinguish data from instructions. [3] Capability inventory: Instructions for writing and running 'mlflow.genai.evaluate()' code and creating '@scorer' functions. [4] Sanitization: No mention of validation, filtering, or escaping for external trace content.
- Dynamic Execution (MEDIUM): The workflow encourages the runtime creation of scorer functions and execution of evaluation scripts, which constitutes dynamic code execution based on potentially attacker-controlled trace data.
- No Code (INFO): This skill consists entirely of Markdown documentation and does not include any scripts, executables, or network operations directly.
Recommendations
- AI detected serious security threats
Audit Metadata