skills/openclaw/skills/llm-evaluator/Gen Agent Trust Hub

llm-evaluator

Warn

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: MEDIUMPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • Indirect Prompt Injection (MEDIUM): The skill processes 'trace' data containing untrusted external content for evaluation. This creates an attack surface where malicious payloads in the traces could influence the LLM judge's scores or behavior. \n
  • Ingestion points: scripts/evaluator.py (via trace scoring commands).\n
  • Boundary markers: Not visible in provided files.\n
  • Capability inventory: Network access to Langfuse and OpenRouter APIs; execution of local Python scripts.\n
  • Sanitization: Not visible in provided files.\n- Metadata Poisoning (MEDIUM): The skill description claims to use a non-existent 'GPT-5-nano' model. This is deceptive metadata that misleads users regarding the skill's actual capabilities and technical foundation.\n- Command Execution (LOW): The skill executes local Python scripts using python3 {baseDir}/scripts/evaluator.py. While standard for its stated purpose, it relies on the integrity of the external script which is not provided in the analyzed context.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 16, 2026, 05:10 AM