llm-as-a-judge
Pass
Audited by Gen Agent Trust Hub on Feb 19, 2026
Risk Level: SAFE
Full Analysis
- Indirect Prompt Injection (SAFE): The skill is designed to ingest and process untrusted LLM outputs as its primary function, which creates a potential surface for indirect prompt injection. This risk is inherent to the use-case.
- Ingestion points: The `prompt-template.md` file defines placeholders such as `{{GENERATED_EMAIL_HERE}}` and `{{TUTOR_EXPLANATION_HERE}}` for external content.
- Boundary markers: The templates use clear markdown delimiters (`---`) and structured JSON formatting (reasoning before answer), which helps constrain model behavior.
- Capability inventory: The skill's focus is qualitative assessment; the judge LLM is not provided with tools to execute commands or access sensitive files based on the input.
- Sanitization: While no automated sanitization is present, the skill emphasizes human-labeled validation (TPR/TNR metrics) to ensure judge reliability against adversarial or edge-case inputs.
- External Reference (SAFE): The skill mentions the `numpy` library and an external GitHub repository for educational and implementation support. There is no evidence of automated package installation or remote code execution.
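To illustrate the human-labeled validation the skill relies on, the sketch below computes TPR and TNR for a judge against human ground-truth labels. The function name, data, and boolean encoding are hypothetical illustrations, not taken from the skill itself.

```python
import math

def judge_reliability(judge_verdicts, human_labels):
    """Compute (TPR, TNR) of an LLM judge against human ground truth.

    Both arguments are lists of booleans, where True means "pass".
    TPR = fraction of human-positive cases the judge also passed;
    TNR = fraction of human-negative cases the judge also failed.
    """
    tp = sum(j and h for j, h in zip(judge_verdicts, human_labels))
    tn = sum(not j and not h for j, h in zip(judge_verdicts, human_labels))
    pos = sum(human_labels)
    neg = len(human_labels) - pos
    tpr = tp / pos if pos else math.nan
    tnr = tn / neg if neg else math.nan
    return tpr, tnr

# Hypothetical labeled sample: 4 human-positive, 2 human-negative cases.
judge = [True, True, False, True, False, True]
human = [True, True, True, True, False, False]
print(judge_reliability(judge, human))  # → (0.75, 0.5)
```

A judge whose TPR or TNR drops on adversarial or edge-case inputs would signal that the injection surface noted above is being exploited, which is why the audit treats this validation step as the compensating control for the absent automated sanitization.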
Audit Metadata