The Agent Skills Directory

COMMAND_EXECUTION (LOW): The skill triggers the execution of a local Python validation script (validate-config.py) using the uv tool. While this is an operational step for configuration integrity, it represents host-level command execution.
PROMPT_INJECTION (MEDIUM): An indirect prompt injection surface (Category 8) is identified within the LLMPlain metric configuration. 1. Ingestion points: User requirements for evaluation metrics (Step 1). 2. Boundary markers: Absent; user input is directly interpolated into the system_instruction field. 3. Capability inventory: Writing configuration files (evaluator.toml, judgment.toml) and executing validation scripts. 4. Sanitization: Absent; no filtering or sanitization of the instructional text is mentioned. This allows potentially malicious instructions to influence the behavior of the downstream LLM agent that consumes these configuration files.

mixseek-evaluator-config