ai-evals

Pass

Audited by Gen Agent Trust Hub on Apr 16, 2026

Risk Level: SAFE
Full Analysis
  • [PROMPT_INJECTION]: The files eval/with_skill.md and without_skill.md contain several examples of prompt injection and jailbreak attempts, such as 'DAN' role-play, system prompt extraction, and 'ignore previous instructions' markers. These are used strictly as data within an evaluation dataset ('golden set') meant to test a separate system-under-test. The skill correctly identifies these as adversarial risks and specifies that the expected behavior is for the AI to refuse such attempts.
  • [DATA_EXFILTRATION]: No network-enabled tools or commands (e.g., curl, wget, fetch) are utilized. The skill operates entirely within the chat or local file system context provided by the user.
  • [CREDENTIALS_UNSAFE]: No hardcoded API keys, tokens, or passwords were found. The skill instructions in SKILL.md explicitly advise users to redact or anonymize sensitive data before generating datasets.
  • [INDIRECT_PROMPT_INJECTION]: The skill's workflow involves analyzing user-provided logs and failure modes. This creates a surface where the agent might process untrusted data containing malicious instructions. The skill mitigates this by utilizing structured templates (JSONL) for test cases and instructing the user to apply least-privilege principles to the data ingestion process.
  • [EXTERNAL_DOWNLOADS]: The skill does not perform any external downloads or install third-party dependencies during execution.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 16, 2026, 09:44 AM