evaluation-harness

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [Indirect Prompt Injection] (LOW): The score_with_llm function in SKILL.md implements an 'LLM-as-a-judge' pattern that is susceptible to indirect prompt injection. If the model output being evaluated contains instructions to ignore the grading rubric, the judge LLM may be influenced.
  • Ingestion points: Untrusted data enters the agent context through the actual variable (output from the model being tested) and the dataset_path file read in the EvaluationHarness class.
  • Boundary markers: Absent. The prompt template in score_with_llm interpolates {actual} and {expected} directly into the instruction string without using XML tags, triple backticks, or other delimiters to isolate the untrusted content.
  • Capability inventory: The provided scripts are limited to data processing and scoring; they do not contain subprocess calls, network operations, or file-writing capabilities.
  • Sanitization: Absent. No escaping or validation is performed on the output being graded before it is sent to the LLM judge.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 05:56 PM