addon-llm-judge-evals
Pass
Audited by Gen Agent Trust Hub on Mar 3, 2026
Risk Level: SAFE
Categories reviewed: PROMPT_INJECTION, COMMAND_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: The skill outlines an 'LLM Judge' evaluation workflow, which presents a surface for indirect prompt injection.
  - Ingestion points: The workflow has a JUDGE_MODEL evaluate generated outputs, as described in the Workflow section of SKILL.md.
  - Boundary markers: The rubric.md template contains no explicit instructions or delimiters to prevent the judge model from following instructions embedded in the data it is evaluating.
  - Capability inventory: The skill is limited to file-structure organization and documentation; the bash commands included in SKILL.md do not give the judge model access to destructive system tools.
  - Sanitization: No input sanitization or filtering is specified for the data before the judge model processes it.
- [COMMAND_EXECUTION]: The skill contains a validation script that uses basic system commands.
  - Evidence: A bash block in SKILL.md runs 'test -f' to confirm the existence of required project files such as rubric.md and run_llm_judge.py. These are standard environment checks and perform no unsafe actions.
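The check described in the evidence above can be sketched as follows. This is a reconstruction based on the report, not the skill's actual script; only the `test -f` usage and the two filenames are taken from the finding, and the loop and messages are illustrative.

```shell
# Hypothetical reconstruction of the SKILL.md validation check.
# 'test -f' exits 0 only if the named path exists and is a regular file.
for f in rubric.md run_llm_judge.py; do
  if test -f "$f"; then
    echo "found: $f"
  else
    echo "missing: $f"
  fi
done
```

Because `test -f` only reads file metadata, a check of this shape cannot modify the environment, which is consistent with the report's conclusion that the script performs no unsafe actions.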