LangSmith Evaluators

Fail

Audited by Snyk on Mar 2, 2026

Risk Level: HIGH
Full Analysis

HIGH W007: Insecure credential handling detected in skill instructions.

  • Insecure credential handling detected (high risk: 1.00). The prompt instructs evaluators and judge prompts to embed run_outputs / example_outputs verbatim into LLM prompts and returned comments (and to print raw outputs), so any secrets present in those outputs would be passed to and could be emitted by the LLM—an explicit high-risk pattern.

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

  • Third-party content exposure detected (high risk: 0.70). This skill's workflow (SKILL.md "LLM as Judge" example and the "Uploading Evaluators" / auto-run sections) reads dataset examples and run outputs from LangSmith (user-provided/untrusted content) and injects those outputs directly into LLM prompts and evaluator logic that run automatically on datasets, so third‑party content can materially influence scoring/behavior.
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 2, 2026, 10:47 AM