indirect-prompt-injection
Pass
Audited by Gen Agent Trust Hub on Mar 12, 2026
Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTIONDATA_EXFILTRATION
Full Analysis
- [PROMPT_INJECTION]: The skill contains an extensive repository of injection strings for testing and reference purposes. Files like
references/attack-patterns.mdandtests/test_cases.jsoninclude high-confidence injection directives such as "Ignore all previous instructions," "You are now DAN," and "Your new task is...". While these are meant for detection benchmarking, they constitute a significant prompt injection risk surface if the agent reads these files without strict boundary enforcement. - [COMMAND_EXECUTION]: The skill documentation (
SKILL.md) instructs the user/agent to execute local Python scripts (scripts/sanitize.pyandscripts/run_tests.py) to process content. While the provided scripts use standard libraries and perform regex-based scanning, executing commands on untrusted external input is a capability that requires careful isolation. - [DATA_EXFILTRATION]: The skill's detection logic and test corpus (
scripts/sanitize.py,tests/test_cases.json) contain numerous references to sensitive system paths (e.g.,~/.ssh/id_rsa,.env,/etc/shadow) and exfiltration endpoints (e.g.,webhook.site). Although these are targets for the detector, their presence in the agent's context could be exploited if an injection attack successfully redirects the agent to access these specific paths. - [PROMPT_INJECTION]: The skill includes instructions for managing untrusted data in
references/safe-parsing.md. This highlights an inherent Category 8 risk surface where the agent is encouraged to ingest and process data from external, potentially hostile sources like social media and web pages.
Audit Metadata