ai-vision
AI Vision
Overview
This skill provides a standalone CLI that calls multimodal models for UI querying, assertion, and single-step planning. It is independent of device type: you supply a screenshot and receive structured output (coordinates, decisions, or next actions). Execution and multi-step loops are handled externally by agents using adb, hdc, or other drivers. Store screenshots in ~/.eval/screenshots/ and include a timestamp in each filename so that later captures do not overwrite earlier ones.
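The screenshot naming advice can be sketched in a few lines of shell. This is only an illustration: the `SHOT_DIR` variable and the `new_shot_path` helper are hypothetical names and not part of the skill, and the commented-out adb capture assumes a connected Android device.

```shell
#!/usr/bin/env bash
# Sketch: build a timestamped screenshot path so repeated captures do not collide.
# SHOT_DIR is an illustrative variable; it defaults to the recommended directory.
SHOT_DIR="${SHOT_DIR:-$HOME/.eval/screenshots}"
mkdir -p "$SHOT_DIR"

# new_shot_path is a hypothetical helper, not part of the skill.
# $1 is an optional label for the capture (defaults to "screen").
new_shot_path() {
  printf '%s/%s_%s.png\n' "$SHOT_DIR" "${1:-screen}" "$(date +%Y%m%d_%H%M%S)"
}

shot="$(new_shot_path login)"
echo "$shot"

# Example capture with adb (assumes a connected Android device):
# adb exec-out screencap -p > "$shot"
```

The timestamp has one-second resolution, so two captures with the same label inside the same second would still share a name; add a counter or a finer-grained suffix if the external loop captures faster than that.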
Path Convention
Canonical install and execution directory: ~/.agents/skills/ai-vision/. Run commands from this directory:
cd ~/.agents/skills/ai-vision
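One-off invocation (the parentheses run the command in a subshell, so the caller's working directory is unchanged; safe in scripts and loops from any working directory):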
(cd ~/.agents/skills/ai-vision && npx tsx scripts/ai_vision.ts --help)
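For repeated calls from an agent script, the same subshell pattern can be wrapped in a function. This is a minimal sketch: the function name `ai_vision` and the `AI_VISION_DIR` variable are assumptions made for illustration, and only the `--help` invocation shown above is taken from this document.

```shell
#!/usr/bin/env bash
# AI_VISION_DIR is an illustrative variable; it defaults to the canonical install directory.
AI_VISION_DIR="${AI_VISION_DIR:-$HOME/.agents/skills/ai-vision}"

# ai_vision is a hypothetical wrapper, not shipped with the skill.
# It forwards all arguments to the CLI and runs it inside a subshell,
# so the caller's working directory is left untouched.
ai_vision() {
  ( cd "$AI_VISION_DIR" && npx tsx scripts/ai_vision.ts "$@" )
}

# Usage:
#   ai_vision --help
```

Because the subshell changes into the skill directory before the CLI starts, relative file paths passed as arguments resolve against that directory rather than the caller's, so passing absolute screenshot paths is the safer choice.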