webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • [Command Execution] (HIGH): The skill's core functionality relies on executing arbitrary shell commands passed via the --server argument to with_server.py (e.g., npm run dev, cd backend && python server.py). If those command strings can be manipulated, an attacker gains arbitrary command execution on the host.
  • [Obfuscation/Safety Bypass] (MEDIUM): The skill contains an explicit 'Don't Look' instruction: 'DO NOT read the source until you try running the script first'. This encourages the agent to bypass security auditing of local scripts, treating them as 'black boxes', which is a major security anti-pattern for AI agents.
  • [Indirect Prompt Injection] (HIGH): The skill is designed to process untrusted data from web applications.
      • Ingestion points: External data enters the context via page.content(), DOM inspection, and screenshots as described in the 'Reconnaissance-Then-Action Pattern'.
      • Boundary markers: No boundary markers or delimiters are suggested to separate untrusted web content from agent instructions.
      • Capability inventory: The skill provides capabilities for shell command execution and Python script execution.
      • Sanitization: There is no evidence of sanitization or filtering of the web content before it is processed by the agent to determine its next actions.
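To illustrate the missing boundary markers noted in the Indirect Prompt Injection finding, the following is a minimal sketch of how untrusted page content could be delimited before it reaches the agent's context. It is not part of the audited skill: the function name wrap_untrusted and the marker format are hypothetical, chosen only for illustration. A random token is included so that page content cannot forge the closing marker.

```python
import secrets


def wrap_untrusted(content: str, source: str) -> str:
    """Wrap untrusted web content in randomized boundary markers.

    Hypothetical helper: the marker format is an assumption, not a
    standard. The per-call random token makes it impractical for the
    page content itself to contain a matching closing marker.
    """
    token = secrets.token_hex(8)  # 16 hex characters, new for every call
    header = f"<<UNTRUSTED_WEB_CONTENT id={token} source={source!r}>>"
    footer = f"<<END_UNTRUSTED_WEB_CONTENT id={token}>>"
    return f"{header}\n{content}\n{footer}"
```

Under this sketch, the string returned by page.content() would be passed through wrap_untrusted before being shown to the agent, together with an instruction that anything between the markers is data and never a command. Delimiters reduce the risk but do not eliminate it, so they would complement, not replace, restrictions on what the agent may execute.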
Recommendations
  • AI detected serious security threats
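For the Command Execution finding, one possible mitigation can be sketched as an allowlist check applied to a server command before it is run. This is an assumption-laden illustration, not the behavior of with_server.py: the names ALLOWED_COMMANDS and parse_server_command are hypothetical, and the allowlist entries are taken from the example commands quoted in the analysis.

```python
import shlex

# Hypothetical allowlist; a real deployment would define its own entries.
ALLOWED_COMMANDS = {
    ("npm", "run", "dev"),
    ("python", "server.py"),
}

# Characters that only have meaning when a shell interprets the string.
SHELL_METACHARACTERS = set(";&|`$<>\n")


def parse_server_command(command: str) -> list[str]:
    """Return an argv list for an allowlisted command, else raise ValueError."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise ValueError(f"shell metacharacters are not allowed: {command!r}")
    argv = shlex.split(command)
    if tuple(argv) not in ALLOWED_COMMANDS:
        raise ValueError(f"command is not on the allowlist: {command!r}")
    return argv
```

The returned list can be handed to subprocess.Popen without shell=True, so no shell ever interprets the string. A command such as cd backend && python server.py is rejected under this sketch; the same effect would come from running python server.py with the cwd argument of subprocess.Popen set to the backend directory.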
Audit Metadata
  • Risk Level: HIGH
  • Analyzed: Feb 16, 2026, 11:15 AM