The Agent Skills Directory

[PROMPT_INJECTION]: The skill facilitates the processing of adversarial prompts and jailbreak attempts (e.g., 'DAN', roleplay) for model evaluation, creating an indirect prompt injection surface.
Ingestion points: Various benchmark datasets like HarmBench and JailbreakBench referenced in the SKILL.md.
Boundary markers: No explicit delimiters or instructions to ignore embedded commands are shown in the illustrative code.
Capability inventory: Interaction with models via model.generate and similar generation functions.
Sanitization: The provided examples do not include content sanitization or validation of the benchmark data.
[REMOTE_CODE_EXECUTION]: The skill references established Python packages autoattack and textattack for evaluating model robustness against adversarial attacks.
[DATA_EXFILTRATION]: The skill includes links to official repositories and websites for AI safety research, such as harmbench.org and robustbench.github.io.

benchmark-datasets