eval-designer

Pass

Audited by Gen Agent Trust Hub on Apr 24, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill provides instructions and a framework for building LLM evaluation suites. All identified patterns are contextualized within safety testing guidelines.
  • [PROMPT_INJECTION]: The skill includes examples of prompt injection attacks, such as "Ignore your instructions" and "Ignore previous instructions", as adversarial test cases for safety evaluation. These patterns are documented as inputs to be used for testing the target system and do not attempt to override the agent's own behavior.
  • [COMMAND_EXECUTION]: Provides Python code snippets that use the subprocess module to run security tools such as Bandit for static analysis of generated code. This is presented as a best practice for automated safety evaluation.
  • [EXTERNAL_DOWNLOADS]: Recommends well-known third-party tools such as promptfoo and LangSmith for building evaluation pipelines.
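The subprocess-plus-Bandit pattern noted in the COMMAND_EXECUTION finding can be sketched roughly as follows. This is an illustrative sketch, not code from the audited skill: the function names are hypothetical, and it assumes the `bandit` CLI is available on PATH (it degrades to `None` when it is not).

```python
import json
import os
import shutil
import subprocess
import tempfile

def bandit_command(target_path):
    # Build the Bandit CLI invocation; -f json gives machine-readable output,
    # -q suppresses informational logging.
    return ["bandit", "-f", "json", "-q", target_path]

def scan_generated_code(code_str):
    """Write model-generated Python to a temp file and scan it with Bandit.

    Returns the number of reported issues, or None if Bandit is not installed.
    """
    if shutil.which("bandit") is None:
        return None  # Bandit not available in this environment
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code_str)
        path = f.name
    try:
        # Bandit exits non-zero when issues are found, so avoid check=True.
        result = subprocess.run(
            bandit_command(path), capture_output=True, text=True
        )
        report = json.loads(result.stdout)
        return len(report.get("results", []))
    finally:
        os.unlink(path)
```

In an evaluation pipeline, a wrapper like this would run over each model-generated code sample, with the issue count (or the full `results` list) feeding a pass/fail gate for the safety suite.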
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 24, 2026, 10:24 PM