Open-AutoGLM Phone Agent
Skill by ara.so — Daily 2026 Skills collection.
Open-AutoGLM is an open-source AI phone agent framework for controlling Android, HarmonyOS NEXT, and iOS devices in natural language. It uses the 9B-parameter AutoGLM vision-language model to perceive screen content and execute multi-step tasks such as "open Meituan and search for nearby hot pot restaurants."
Architecture Overview
User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions
- Model: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual
- Device control: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS)
- Model serving: vLLM or SGLang (self-hosted) or BigModel/ModelScope API
- Input: Screenshot + task description → Output: structured action commands
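The pipeline above amounts to a perceive-act loop: capture a screenshot, ask the model for the next structured action, execute it on the device, and repeat until the model signals completion. The following is a minimal sketch of that loop, not Open-AutoGLM's actual API. The names `Action`, `Device`, `run_task`, `FakeDevice`, and `scripted_model`, as well as the action kinds (`launch`, `type`, `finish`), are hypothetical and chosen only for illustration; a real backend would wrap ADB, HDC, or WebDriverAgent, and a real model call would go to a served AutoGLM-Phone-9B endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


@dataclass
class Action:
    """Hypothetical structured action; the real action schema may differ."""
    kind: str                      # e.g. "launch", "tap", "type", "finish"
    args: dict = field(default_factory=dict)


class Device(Protocol):
    """Hypothetical device backend (ADB, HDC, or WebDriverAgent in practice)."""
    def screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


# Hypothetical model signature: (task, screenshot, history) -> next action.
Model = Callable[[str, bytes, List[Action]], Action]


def run_task(task: str, device: Device, model: Model, max_steps: int = 20) -> List[Action]:
    """Run the perceive-act loop until the model emits "finish" or max_steps is hit."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = device.screenshot()            # perceive the current screen
        action = model(task, screen, history)   # model picks the next action
        history.append(action)
        if action.kind == "finish":
            break                                # task reported as complete
        device.execute(action)                   # act on the device
    return history


class FakeDevice:
    """In-memory stand-in so the sketch runs without a phone attached."""
    def __init__(self) -> None:
        self.executed: List[Action] = []

    def screenshot(self) -> bytes:
        return b"fake-png-bytes"

    def execute(self, action: Action) -> None:
        self.executed.append(action)


def scripted_model(task: str, screen: bytes, history: List[Action]) -> Action:
    """Stand-in for the VLM: replays a fixed script instead of reading the screen."""
    script = [
        Action("launch", {"app": "Meituan"}),
        Action("type", {"text": "hot pot"}),
        Action("finish"),
    ]
    return script[min(len(history), len(script) - 1)]


device = FakeDevice()
steps = run_task(
    "open Meituan and search for nearby hot pot restaurants",
    device,
    scripted_model,
)
print([a.kind for a in steps])  # ['launch', 'type', 'finish']
```

The `max_steps` cap is a design choice of this sketch: it keeps the loop from running indefinitely if the model never emits a terminating action.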