mlflow-evaluation

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHPROMPT_INJECTIONCOMMAND_EXECUTIONNO_CODE
Full Analysis
  • Indirect Prompt Injection (HIGH): The skill defines workflows for building datasets from production traces and executing evaluations. This creates a high-severity surface for indirect prompt injection where malicious instructions embedded in traces could influence the agent's evaluation logic or code execution. [1] Ingestion points: Production traces referenced in 'Workflow 2' and 'patterns-trace-analysis.md'. [2] Boundary markers: No delimiter or boundary instructions are provided to distinguish data from instructions. [3] Capability inventory: Instructions for writing and running 'mlflow.genai.evaluate()' code and creating '@scorer' functions. [4] Sanitization: No mention of validation, filtering, or escaping for external trace content.
  • Dynamic Execution (MEDIUM): The workflow encourages the runtime creation of scorer functions and execution of evaluation scripts, which constitutes dynamic code execution based on potentially attacker-controlled trace data.
  • No Code (INFO): This skill consists entirely of Markdown documentation and does not include any scripts, executables, or network operations directly.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 04:19 PM