webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION)
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The skill provides and encourages the use of a helper script scripts/with_server.py that executes arbitrary shell commands provided as arguments (e.g., via the --server flag).
  • Evidence: The documentation explicitly shows examples like python scripts/with_server.py --server "npm run dev" --port 5173 -- python your_automation.py.
  • [REMOTE_CODE_EXECUTION] (HIGH): The skill instructs the agent to write and execute native Python scripts using the Playwright library. This allows for arbitrary code execution within the environment where the agent is running.
  • Evidence: Instructions include 'To test local web applications, write native Python Playwright scripts' and provide code blocks for the agent to generate and run.
  • [DATA_EXFILTRATION] (MEDIUM): Because the skill can execute arbitrary Python/Playwright code and has access to the local network (localhost), it could potentially be used to read local data and exfiltrate it via network requests, although no explicit malicious network patterns were found.
  • [TRUSTED_SOURCE] (INFO): The repository is associated with 'anthropics', which is a trusted organization. Per global rules, this downgrades the download finding to INFO but does not reduce the severity of the command execution and script execution behaviors.
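
To illustrate why a helper that accepts a command string is flagged under COMMAND_EXECUTION, the sketch below contrasts two ways of launching such a string. It is not the actual code of scripts/with_server.py, whose implementation is not shown in this report; run_server_command and its use_shell parameter are hypothetical names introduced only for this example, and the demonstration assumes a POSIX system where echo is available.

```python
import shlex
import subprocess


def run_server_command(command: str, use_shell: bool) -> str:
    """Run a command string and return its stdout (hypothetical helper).

    use_shell=True hands the whole string to the system shell, so shell
    metacharacters such as ';' are interpreted and can chain extra commands.
    use_shell=False splits the string into an argument list and executes
    only the first token as the program; no shell is involved.
    """
    args = command if use_shell else shlex.split(command)
    result = subprocess.run(
        args,
        shell=use_shell,
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout.strip()
```

On a POSIX system, passing "echo started; echo injected" with use_shell=True runs both echo commands, while use_shell=False runs a single echo that prints the rest of the string literally. Either way the caller still chooses which program runs, so the finding applies to any input the agent does not fully control.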
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 16, 2026, 04:30 AM