benchmark-datasets
Pass
Audited by Gen Agent Trust Hub on Mar 18, 2026
Risk Level: SAFEPROMPT_INJECTIONREMOTE_CODE_EXECUTIONDATA_EXFILTRATION
Full Analysis
- [PROMPT_INJECTION]: The skill facilitates the processing of adversarial prompts and jailbreak attempts (e.g., 'DAN', roleplay) for model evaluation, creating an indirect prompt injection surface.
- Ingestion points: Various benchmark datasets like HarmBench and JailbreakBench referenced in the SKILL.md.
- Boundary markers: No explicit delimiters or instructions to ignore embedded commands are shown in the illustrative code.
- Capability inventory: Interaction with models via
model.generateand similar generation functions. - Sanitization: The provided examples do not include content sanitization or validation of the benchmark data.
- [REMOTE_CODE_EXECUTION]: The skill references established Python packages
autoattackandtextattackfor evaluating model robustness against adversarial attacks. - [DATA_EXFILTRATION]: The skill includes links to official repositories and websites for AI safety research, such as harmbench.org and robustbench.github.io.
Audit Metadata