ai-testing-safety
Audited by Socket on Feb 16, 2026
1 alert found:
Malware [Skill Scanner]: Detected system prompt override attempt

All findings:
- [CRITICAL] prompt_injection: Detected system prompt override attempt (PI004) [AITech 1.1]
- [HIGH] skill_discovery_abuse: System prompt extraction attempt (SD002) [AITech 4.3]

Assessment: This skill is BENIGN in intent (a tool for finding AI vulnerabilities) but dual-use, and SUSPICIOUS from a supply-chain/use perspective, because it deliberately constructs powerful adversarial prompts, recommends using stronger attacker models, and persists optimized attackers. The primary security risks are operational:
(1) running attacks against production systems that contain real PII or secrets;
(2) sending sensitive prompts and responses to third-party model endpoints;
(3) storing attacker artifacts without access controls or redaction.

Recommendations: run this only against isolated test instances with synthetic data; configure dspy and the model backends to be internal/private; encrypt and limit access to saved optimized attackers and reports; and add explicit warnings in the tool and its docs against sending secrets to external services.

LLM verification: This module is a legitimate and effective red-teaming tool that intentionally constructs prompts to elicit unsafe behavior from target AIs. The code itself contains no obfuscation, hard-coded credentials, or network backdoors; however, its core functionality (automated iterative attacks, explicit examples that extract system prompts and PII) poses a real security risk if run against production systems, or against targets containing real secrets, without isolation and authorization. Recommend: (1) only run it against isolated, authorized test instances with synthetic data.
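The redaction recommendation above can be made concrete. The following is a minimal illustrative sketch (not part of the audited skill; all names and patterns here are assumptions) of scrubbing obvious secret and PII patterns from attack transcripts before persisting them to reports:

```python
import re

# Hypothetical patterns for illustration only: real deployments should use
# a vetted secret-scanning library and patterns matched to their own data.
SECRET_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def redact(text: str) -> str:
    """Replace known sensitive patterns before a transcript is written to disk."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Redaction of this kind is best-effort and complements, rather than replaces, the access controls and encryption recommended above.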