webapp-testing

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, DATA_EXFILTRATION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The helper script scripts/with_server.py uses subprocess.Popen with shell=True to execute arbitrary strings passed via the --server argument. This is a classic command injection vector that allows an attacker to execute malicious shell commands if they can influence the arguments provided to the script.
  • REMOTE_CODE_EXECUTION (MEDIUM): The skill's primary workflow involves the agent dynamically writing and executing Python scripts. While this is the intended functionality, it lacks any sandboxing or validation, meaning the agent could be manipulated into executing malicious code under the guise of a test script.
  • DATA_EXFILTRATION (LOW): The examples console_logging.py and element_discovery.py are designed to capture browser logs and screenshots, saving them to /mnt/user-data/outputs/. If the agent is directed to a sensitive internal application, this mechanism could be used to extract private data or session tokens.
  • PROMPT_INJECTION (LOW): The instructions in SKILL.md attempt to restrict the agent's behavior by telling it 'DO NOT read the source until you try running the script first.' This is a form of instruction override designed to hide implementation details from the agent's context, which could be used to obscure malicious behavior in more sophisticated attacks.
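The COMMAND_EXECUTION finding can be illustrated with a minimal sketch of one possible mitigation: tokenize the command string and launch it without a shell. The audit does not show the source of scripts/with_server.py, so the function names parse_server_command and start_server below are hypothetical and stand in for whatever that script does with its --server argument. This is an assumption-laden illustration, not the skill's actual code.

```python
import shlex
import subprocess


def parse_server_command(command: str) -> list[str]:
    """Split a command string into an argv list without invoking a shell.

    Hypothetical helper: shell metacharacters such as ';' or '&&' are not
    interpreted; they stay inside ordinary argument tokens.
    """
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty server command")
    return argv


def start_server(command: str) -> subprocess.Popen:
    """Start the server process with shell=False (hypothetical helper)."""
    # With an argv list and shell=False, no shell parses the string, so an
    # injected '; rm -rf /' is passed to the program as literal arguments.
    return subprocess.Popen(parse_server_command(command), shell=False)


if __name__ == "__main__":
    # The ';' stays attached to 'dev' instead of ending the command.
    print(parse_server_command("npm run dev; rm -rf /"))
```

If callers of the script depend on shell features (for example chaining with && or changing directory with cd), this would not be a drop-in replacement; in that case validating the --server value against an allowlist is another option to consider.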
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: Feb 17, 2026, 06:49 PM