desktop-computer-automation
Pass
Audited by Gen Agent Trust Hub on Apr 3, 2026
Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill makes extensive use of
npx @midscene/computer@1to perform system-level operations including mouse movements, clicks, keyboard typing, and window management. These operations are executed via theBashtool. - [EXTERNAL_DOWNLOADS]: The
tap --locatecommand allows the agent to fetch reference images from external URLs (e.g.,github.githubassets.com) to assist in visual element targeting. This is a standard feature of the library used for precise UI interaction. - [DATA_EXFILTRATION]: The skill's core functionality relies on
take_screenshot, which captures the entire desktop state. This involves handling sensitive visual information. Additionally, it requires the configuration of environment variables for AI model API keys (MIDSCENE_MODEL_API_KEY). - [PROMPT_INJECTION]: The skill is susceptible to Indirect Prompt Injection because it processes untrusted data from the user's screen (via screenshots) and interprets it to decide future actions. An attacker could display malicious instructions on the screen (e.g., in a browser or document) that the agent might obey while performing automation tasks.
- Ingestion points: Screenshots captured via
npx @midscene/computer@1 take_screenshot(SKILL.md). - Boundary markers: No explicit delimiters or instructions are provided to the agent to ignore potentially malicious text found within screenshots.
- Capability inventory: Full mouse and keyboard control, and access to the
Bashshell (SKILL.md). - Sanitization: No visual or text-based sanitization of screen content is implemented before processing.
Audit Metadata