The Agent Skills Directory

[PROMPT_INJECTION]: The files eval/with_skill.md and without_skill.md contain several examples of prompt injection and jailbreak attempts, such as 'DAN' role-play, system prompt extraction, and 'ignore previous instructions' markers. These are used strictly as data within an evaluation dataset ('golden set') meant to test a separate system-under-test. The skill correctly identifies these as adversarial risks and specifies that the expected behavior is for the AI to refuse such attempts.
[DATA_EXFILTRATION]: No network-enabled tools or commands (e.g., curl, wget, fetch) are utilized. The skill operates entirely within the chat or local file system context provided by the user.
[CREDENTIALS_UNSAFE]: No hardcoded API keys, tokens, or passwords were found. The skill instructions in SKILL.md explicitly advise users to redact or anonymize sensitive data before generating datasets.
[INDIRECT_PROMPT_INJECTION]: The skill's workflow involves analyzing user-provided logs and failure modes. This creates a surface where the agent might process untrusted data containing malicious instructions. The skill mitigates this by utilizing structured templates (JSONL) for test cases and instructing the user to apply least-privilege principles to the data ingestion process.
[EXTERNAL_DOWNLOADS]: The skill does not perform any external downloads or install third-party dependencies during execution.

ai-evals