detecting-ai-model-prompt-injection-attacks

Pass

Audited by Gen Agent Trust Hub on Apr 7, 2026

Risk Level: SAFE
Full Analysis
  • [PROMPT_INJECTION]: Several phrases associated with prompt injection (e.g., 'ignore previous instructions', 'act as', 'disregard') appear in SKILL.md and agent.py. However, these are documented as detection signatures or test cases for the security tool. They do not represent an attempt to hijack the AI agent's behavior.
  • [EXTERNAL_DOWNLOADS]: The skill downloads a pre-trained machine learning model ('protectai/deberta-v3-base-prompt-injection-v2') from Hugging Face Hub. Hugging Face is a well-known and trusted service for hosting AI models, and the model is used for its intended purpose of classifying malicious prompts.
  • [COMMAND_EXECUTION]: The skill includes a Python script (agent.py) designed for command-line use. The script performs local analysis of text and does not execute arbitrary shell commands, access sensitive system paths, or perform unauthorized network operations.
  • [DATA_EXFILTRATION]: No evidence of data exfiltration or unauthorized credential access was found. The skill processes input text strictly for the purpose of identifying security threats and outputs the results to the console or a file as requested by the user.
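The signature-based detection described in the first finding can be sketched as follows. This is a hypothetical illustration: the phrase list is drawn from the examples quoted in the audit, but the function names and logic are assumptions, not the actual contents of agent.py.

```python
# Illustrative sketch of signature-based injection detection.
# Phrases below are those the audit quotes as documented signatures;
# the structure of the real agent.py may differ.

SIGNATURES = (
    "ignore previous instructions",
    "act as",
    "disregard",
)

def find_signatures(text: str) -> list[str]:
    """Return the known injection signatures present in `text`."""
    lowered = text.lower()
    return [sig for sig in SIGNATURES if sig in lowered]

def classify(text: str) -> str:
    """Flag text containing any signature as a potential injection."""
    return "INJECTION" if find_signatures(text) else "SAFE"
```

In practice a tool like this would combine such string signatures with the downloaded classifier model noted in the second finding, using the ML model to catch paraphrased attacks that exact-match signatures miss.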
Audit Metadata
Risk Level: SAFE
Analyzed: Apr 7, 2026, 06:46 PM