ai-evals
Pass
Audited by Gen Agent Trust Hub on Apr 16, 2026
Risk Level: SAFE
Full Analysis
- [PROMPT_INJECTION]: The files
eval/with_skill.mdandwithout_skill.mdcontain several examples of prompt injection and jailbreak attempts, such as 'DAN' role-play, system prompt extraction, and 'ignore previous instructions' markers. These are used strictly as data within an evaluation dataset ('golden set') meant to test a separate system-under-test. The skill correctly identifies these as adversarial risks and specifies that the expected behavior is for the AI to refuse such attempts. - [DATA_EXFILTRATION]: No network-enabled tools or commands (e.g., curl, wget, fetch) are utilized. The skill operates entirely within the chat or local file system context provided by the user.
- [CREDENTIALS_UNSAFE]: No hardcoded API keys, tokens, or passwords were found. The skill instructions in
SKILL.mdexplicitly advise users to redact or anonymize sensitive data before generating datasets. - [INDIRECT_PROMPT_INJECTION]: The skill's workflow involves analyzing user-provided logs and failure modes. This creates a surface where the agent might process untrusted data containing malicious instructions. The skill mitigates this by utilizing structured templates (JSONL) for test cases and instructing the user to apply least-privilege principles to the data ingestion process.
- [EXTERNAL_DOWNLOADS]: The skill does not perform any external downloads or install third-party dependencies during execution.
Audit Metadata