genie-benchmark-evaluator

Warn

Audited by Snyk on Mar 8, 2026

Risk Level: MEDIUM
Full Analysis

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

  • Third-party content exposure detected (high risk: 0.90). The skill ingests and acts on untrusted, LLM-generated content. It calls Genie via run_genie_query (w.genie.start_conversation) to obtain generated SQL, and it invokes LLM judges via _call_llm_for_scoring (w.serving_endpoints.query, against serving endpoints). The arbiter and other scorers parse those responses and can trigger material actions such as auto-updating benchmark YAMLs, so external model outputs can indirectly inject instructions that affect tool behavior.
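
The finding above concerns scorer output flowing directly into a material action. The following is a minimal sketch, not the skill's actual code, of one way to treat a judge response as untrusted data: parse it into a closed schema, reject anything outside that schema, and require explicit human approval before an auto-update. The names parse_judge_response, may_auto_update, JudgeVerdict, and the verdict labels are hypothetical, and the JSON response shape is an assumption; the report does not show the skill's real format.

```python
import json
from dataclasses import dataclass

# Hypothetical verdict labels; the audited skill's real labels are not shown in the report.
ALLOWED_VERDICTS = frozenset({"pass", "fail", "needs_review"})
ALLOWED_FIELDS = frozenset({"verdict", "score"})


@dataclass(frozen=True)
class JudgeVerdict:
    verdict: str
    score: float


def parse_judge_response(raw: str) -> JudgeVerdict:
    """Parse untrusted judge output into a closed schema, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("judge response is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("judge response must be a JSON object")
    # Reject any field outside the schema so free text cannot ride along.
    extra = set(data) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    verdict = data.get("verdict")
    score = data.get("score")
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        raise ValueError("verdict outside the allowed set")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return JudgeVerdict(verdict=verdict, score=float(score))


def may_auto_update(raw: str, human_approved: bool) -> bool:
    """Gate the material action on a valid 'pass' verdict plus human approval."""
    try:
        parsed = parse_judge_response(raw)
    except ValueError:
        return False
    return human_approved and parsed.verdict == "pass"
```

With this gate, a response that carries extra instructions or fields is rejected outright rather than partially honored, and a well-formed verdict alone is still not enough to rewrite a benchmark file.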

Audit Metadata

Risk Level: MEDIUM
Analyzed: Mar 8, 2026, 02:33 AM