desktop-computer-automation

Pass

Audited by Gen Agent Trust Hub on Apr 3, 2026

Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill makes extensive use of npx @midscene/computer@1 to perform system-level operations including mouse movements, clicks, keyboard typing, and window management. These operations are executed via the Bash tool.
  • [EXTERNAL_DOWNLOADS]: The tap --locate command allows the agent to fetch reference images from external URLs (e.g., github.githubassets.com) to assist in visual element targeting. This is a standard feature of the library used for precise UI interaction.
  • [DATA_EXFILTRATION]: The skill's core functionality relies on take_screenshot, which captures the entire desktop state. This involves handling sensitive visual information. Additionally, it requires the configuration of environment variables for AI model API keys (MIDSCENE_MODEL_API_KEY).
  • [PROMPT_INJECTION]: The skill is susceptible to Indirect Prompt Injection because it processes untrusted data from the user's screen (via screenshots) and interprets it to decide future actions. An attacker could display malicious instructions on the screen (e.g., in a browser or document) that the agent might obey while performing automation tasks.
  • Ingestion points: Screenshots captured via npx @midscene/computer@1 take_screenshot (SKILL.md).
  • Boundary markers: No explicit delimiters or instructions are provided to the agent to ignore potentially malicious text found within screenshots.
  • Capability inventory: Full mouse and keyboard control, and access to the Bash shell (SKILL.md).
  • Sanitization: No visual or text-based sanitization of screen content is implemented before processing.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 3, 2026, 04:44 PM