indirect-prompt-injection

Pass

Audited by Gen Agent Trust Hub on Mar 12, 2026

Risk Level: SAFE (flagged categories: PROMPT_INJECTION, COMMAND_EXECUTION, DATA_EXFILTRATION)
Full Analysis
  • [PROMPT_INJECTION]: The skill contains an extensive repository of injection strings for testing and reference purposes. Files such as references/attack-patterns.md and tests/test_cases.json include high-confidence injection directives, for example "Ignore all previous instructions," "You are now DAN," and "Your new task is...". Although these are meant for detection benchmarking, they constitute a significant prompt injection risk surface if the agent reads these files as instructions rather than as data, that is, without strict boundary enforcement.
  • [COMMAND_EXECUTION]: The skill documentation (SKILL.md) instructs the user or agent to execute local Python scripts (scripts/sanitize.py and scripts/run_tests.py) to process content. The provided scripts use standard libraries and perform regex-based scanning, but running commands on untrusted external input is a capability that requires careful isolation.
  • [DATA_EXFILTRATION]: The skill's detection logic and test corpus (scripts/sanitize.py, tests/test_cases.json) contain numerous references to sensitive system paths (e.g., ~/.ssh/id_rsa, .env, /etc/shadow) and exfiltration endpoints (e.g., webhook.site). These strings are targets for the detector, but their presence in the agent's context could be exploited if an injection attack succeeds in redirecting the agent to access those paths.
  • [PROMPT_INJECTION]: The skill includes instructions for managing untrusted data in references/safe-parsing.md. This highlights an inherent Category 8 risk surface where the agent is encouraged to ingest and process data from external, potentially hostile sources like social media and web pages.
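
The findings above describe regex-based scanning and boundary enforcement around untrusted content. As a rough illustration only, the sketch below shows what such a check could look like. The pattern lists, the `scan` and `wrap_untrusted` functions, and the `<untrusted_data>` marker are hypothetical and are not taken from scripts/sanitize.py, whose contents this report does not reproduce.

```python
import re

# Hypothetical patterns based on the strings quoted in this report;
# the skill's real rules in scripts/sanitize.py may differ.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+all\s+previous\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+DAN", re.IGNORECASE),
    re.compile(r"your\s+new\s+task\s+is", re.IGNORECASE),
]
SENSITIVE_PATTERNS = [
    re.compile(r"~/\.ssh/id_rsa"),
    re.compile(r"(?<![\w.])\.env\b"),  # ".env" as a file name, not "os.environ"
    re.compile(r"/etc/shadow"),
    re.compile(r"webhook\.site", re.IGNORECASE),
]


def scan(text):
    """Return a sorted list of category labels triggered by untrusted text."""
    findings = set()
    if any(p.search(text) for p in INJECTION_PATTERNS):
        findings.add("PROMPT_INJECTION")
    if any(p.search(text) for p in SENSITIVE_PATTERNS):
        findings.add("DATA_EXFILTRATION")
    return sorted(findings)


def wrap_untrusted(text):
    """Wrap text in (assumed) boundary markers so it is handled as data.

    Any embedded closing marker is removed first so the content cannot
    end the boundary early.
    """
    cleaned = text.replace("</untrusted_data>", "")
    return "<untrusted_data>\n" + cleaned + "\n</untrusted_data>"
```

A caller might run `scan` on fetched content first and pass only the wrapped form to the agent. Pattern matching of this kind catches known phrasings but not paraphrased attacks, so it complements isolation rather than replacing it.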
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 12, 2026, 11:13 AM