llm-as-a-judge

Pass

Audited by Gen Agent Trust Hub on Feb 19, 2026

Risk Level: SAFE
Full Analysis
  • Indirect Prompt Injection (SAFE): The skill's primary function is to ingest and process untrusted LLM outputs, which creates a potential surface for indirect prompt injection; this risk is inherent to the use case.
  • Ingestion points: The prompt-template.md file defines placeholders such as {{GENERATED_EMAIL_HERE}} and {{TUTOR_EXPLANATION_HERE}} for external content.
  • Boundary markers: The templates use clear markdown delimiters (---) and a structured JSON output format (reasoning before answer), which helps constrain model behavior.
  • Capability inventory: The skill focuses on qualitative assessment; the judge LLM is not given tools to execute commands or access sensitive files based on the input.
  • Sanitization: No automated sanitization is present; instead, the skill emphasizes validation against human-labeled data (TPR/TNR metrics) to gauge judge reliability on adversarial and edge-case inputs.
  • External Reference (SAFE): The skill mentions the numpy library and an external GitHub repository for educational and implementation support. There is no evidence of automated package installation or remote code execution.
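The boundary-marker convention above can be sketched in a few lines. This is a hypothetical illustration, not the audited skill's actual code: the placeholder name {{GENERATED_EMAIL_HERE}} comes from the audited prompt-template.md, but the surrounding instruction text and the build_judge_prompt helper are assumptions.

```python
# Hypothetical sketch of assembling a judge prompt with --- delimiters
# and a reasoning-before-answer JSON response format, as described above.
PROMPT_TEMPLATE = """You are grading an email for tone and correctness.

---
{{GENERATED_EMAIL_HERE}}
---

Respond with JSON: {"reasoning": "<your analysis>", "answer": "pass" or "fail"}.
Produce the reasoning field before the answer field."""

def build_judge_prompt(untrusted_email: str) -> str:
    # Substitute the untrusted content between the --- delimiters; the
    # delimiters signal to the judge where external text begins and ends.
    return PROMPT_TEMPLATE.replace("{{GENERATED_EMAIL_HERE}}", untrusted_email)

prompt = build_judge_prompt("Hi team, the release ships Friday.")
print(prompt)
```

Keeping the untrusted text fenced between delimiters, and asking for reasoning before the verdict, gives the judge model a clear boundary between instructions and content to evaluate.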
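The TPR/TNR validation mentioned above can be computed with numpy, which the skill references. A minimal sketch, assuming binary 1 = pass / 0 = fail labels; the function name and label data are hypothetical, since the audited skill specifies only that TPR/TNR are measured, not this exact API.

```python
import numpy as np

def tpr_tnr(human_labels, judge_labels):
    """True-positive and true-negative rate of the judge vs. human labels.

    Hypothetical helper: labels are 1 (pass) / 0 (fail), with the
    human-labeled set treated as ground truth.
    """
    y = np.asarray(human_labels)
    p = np.asarray(judge_labels)
    tpr = np.mean(p[y == 1] == 1)  # of true passes, fraction the judge passed
    tnr = np.mean(p[y == 0] == 0)  # of true fails, fraction the judge failed
    return float(tpr), float(tnr)

# Six human-labeled validation examples (made-up data for illustration)
human = [1, 1, 1, 0, 0, 0]
judge = [1, 1, 0, 0, 0, 1]
print(tpr_tnr(human, judge))  # TPR = TNR = 2/3 on this toy set
```

Tracking both rates matters for a judge: a model that passes everything scores a perfect TPR but a TNR of zero, so reliability against adversarial inputs shows up in the TNR.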
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 19, 2026, 09:36 AM