Open-AutoGLM Phone Agent

Skill by ara.so — Daily 2026 Skills collection.

Open-AutoGLM is an open-source AI phone agent framework for controlling Android, HarmonyOS NEXT, and iOS devices with natural language. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and carry out multi-step tasks such as "open Meituan and search for nearby hot pot restaurants."

Architecture Overview

User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions
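The pipeline above is a perceive-reason-act loop: capture the screen, ask the model for the next action, execute it, and repeat until the model signals completion. The sketch below shows that loop with the three stages injected as plain callables. It is not Open-AutoGLM's actual code: `run_task`, `StepRecord`, the dict-based action format, and the `"finish"` terminator are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action format (an assumption, not the project's real schema):
# {"action": "tap", "x": 500, "y": 300} or {"action": "finish", "message": "..."}
Action = dict


@dataclass
class StepRecord:
    step: int
    action: Action


def run_task(
    task: str,
    capture_screen: Callable[[], bytes],
    query_model: Callable[[str, bytes, list], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> list[StepRecord]:
    """Loop until the model returns a 'finish' action or max_steps is reached."""
    history: list[StepRecord] = []
    for step in range(1, max_steps + 1):
        screenshot = capture_screen()                     # screen perception
        action = query_model(task, screenshot, history)   # VLM picks the next action
        history.append(StepRecord(step, action))
        if action.get("action") == "finish":
            break                                         # task reported complete
        execute(action)                                   # ADB / HDC / WebDriverAgent backend
    return history


# Usage with stubs: a scripted "model" and a recorder in place of a real device.
scripted = [
    {"action": "tap", "x": 100, "y": 200},
    {"action": "type", "text": "hot pot"},
    {"action": "finish", "message": "done"},
]
executed: list[Action] = []


def fake_model(task: str, screenshot: bytes, history: list) -> Action:
    return scripted[len(history)]


history = run_task("search for hot pot", lambda: b"png-bytes", fake_model, executed.append)
```

The `max_steps` cap matters in practice: a vision model can get stuck repeating an action on an unexpected screen, so the loop must terminate even if `"finish"` never arrives.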

Model: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual
Device control: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS)
Model serving: vLLM or SGLang (self-hosted), or the BigModel/ModelScope API
Input: Screenshot + task description → Output: structured action commands


Installation