mixseek-evaluator-config

Warn

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • COMMAND_EXECUTION (LOW): The skill triggers the execution of a local Python validation script (validate-config.py) using the uv tool. While this is an operational step for configuration integrity, it represents host-level command execution.
  • PROMPT_INJECTION (MEDIUM): An indirect prompt injection surface (Category 8) is identified within the LLMPlain metric configuration. 1. Ingestion points: User requirements for evaluation metrics (Step 1). 2. Boundary markers: Absent; user input is directly interpolated into the system_instruction field. 3. Capability inventory: Writing configuration files (evaluator.toml, judgment.toml) and executing validation scripts. 4. Sanitization: Absent; no filtering or sanitization of the instructional text is mentioned. This allows potentially malicious instructions to influence the behavior of the downstream LLM agent that consumes these configuration files.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 16, 2026, 08:53 AM