Skill: ios-sim

On load: Read ../../.claude-plugin/plugin.json, resolved relative to this skill's base directory, and display ios-sim v{version} before proceeding.

Control the iOS Simulator through shell scripts wrapping xcrun simctl and AXe.

IMPORTANT: Never load screenshots or accessibility trees in the main context.
Always dispatch a subagent for visual/inspection tasks.

Used By

Skill | What it uses
a11y-audit | ui.sh describe-all for accessibility tree capture
layout-check | capture.sh view + ui.sh describe-all for screenshot and element positions
diagnose | capture.sh view to read error text from the simulator screen
deeplink-test | Screenshot capture to verify the screen after firing a deep link

Prerequisites

  • macOS with Xcode installed
  • A booted iOS simulator (xcrun simctl list devices to check)
  • AXe (brew install cameroncooke/axe/axe) — required for all ui.sh commands (tap, tap-label, tap-id, type, swipe, describe-all, describe-point, list, back, scroll). capture.sh view works without AXe.
  • See ${CLAUDE_SKILL_DIR}/references/troubleshooting.md if anything is missing
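
One way to confirm the command-line prerequisites is to probe for them on PATH. The sketch below is illustrative only: missing_tools and check_ios_sim_prereqs are hypothetical helper names, not part of the skill's scripts, and it assumes the AXe formula installs a binary named axe. It does not check whether a simulator is booted.

```shell
# Print the name of every tool in "$@" that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical wrapper: reports on stderr and returns non-zero if
# xcrun or axe (assumed binary name) cannot be found.
check_ios_sim_prereqs() {
  missing=$(missing_tools xcrun axe)
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing >&2
    return 1
  fi
}

# Usage: check_ios_sim_prereqs || exit 1
```

If the check fails, the troubleshooting reference listed above is the place to look next.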

Scripts

All scripts live at ${CLAUDE_SKILL_DIR}/scripts/. Run them with Bash.

Routing Table

Intent | Workflow | Why
See what's on screen | view | Image stays in subagent
Find UI elements/coordinates | inspect | JSON tree stays in subagent
Multi-step UI interaction | interact | Entire loop stays in subagent
Simple one-shot command | direct | No image/tree involved

Direct Commands (safe for main context)

Intent | Script | Example
Get booted simulator ID | device.sh booted | device.sh booted
Open Simulator app | device.sh open | device.sh open
Save screenshot to file | capture.sh screenshot <path> | capture.sh screenshot /tmp/shot.png
Start video recording | capture.sh record | capture.sh record
Stop video recording | capture.sh stop | capture.sh stop
Install app bundle | app.sh install <path> | app.sh install /path/to/App.app
Launch app by bundle ID | app.sh launch <id> | app.sh launch com.example.app
Tap element by accessibility label | ui.sh tap-label <label> | ui.sh tap-label "Login"
Tap element by accessibility ID | ui.sh tap-id <id> | ui.sh tap-id "submit-button"
List on-screen elements (Controls/Content) | ui.sh list | ui.sh list
Tap the back/navigation button | ui.sh back | ui.sh back
Scroll to top or bottom of list | ui.sh scroll top|bottom | ui.sh scroll top
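
Direct commands can be chained in an ordinary shell function. The following is a minimal sketch, not part of the skill: run and install_and_launch are hypothetical helpers, and the DRY_RUN variable is an assumption added here so the commands can be previewed without a simulator.

```shell
# Run a command, or just print it when DRY_RUN is set (illustrative helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Install an app bundle, then launch it by bundle ID.
# $1 = path to the .app bundle, $2 = bundle identifier.
install_and_launch() {
  scripts="${CLAUDE_SKILL_DIR:-.}/scripts"
  run "$scripts/app.sh" install "$1" &&
    run "$scripts/app.sh" launch "$2"
}

# Usage: install_and_launch /path/to/App.app com.example.app
```

Because neither command loads an image or an accessibility tree, a chain like this stays safe for the main context.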

Workflows

1. View (screenshot analysis)

When: "What's on the simulator screen?", "How does it look?", "Is there an error?"

The main agent never loads the image; a Haiku subagent does the analysis.

Dispatch Agent:
  subagent_type: general-purpose
  model: haiku
  description: "ios-sim: analyze screenshot"
  prompt: |
    1. Run: ${CLAUDE_SKILL_DIR}/scripts/capture.sh view
       This outputs a file path to a compressed JPEG.
    2. Read that file path with the Read tool to see the image.
    3. Analyze and return:
       - Overview: What app/screen is visible (1-2 sentences)
       - Key elements: Buttons, text, inputs, navigation items
       - State: Errors, loading, forms filled, current tab
       - Coordinates: Notable interactive elements with approximate point positions
    4. If the user asked something specific, answer that directly.
    Return text only. Be concise.

    USER QUESTION: [insert user's question here]

2. Inspect (accessibility tree)

When: "What elements are on screen?", "Find the login button", "Where should I tap?"

The accessibility tree JSON can be massive. Parse it in a subagent.

Dispatch Agent:
  subagent_type: general-purpose
  model: haiku
  description: "ios-sim: inspect UI elements"
  prompt: |
    1. Run: ${CLAUDE_SKILL_DIR}/scripts/ui.sh describe-all
       This outputs the full accessibility tree as JSON.
    2. Parse the JSON and return a structured summary:
       - Screen dimensions (from root frame)
       - Interactive elements: buttons, text fields, switches, links
         Format each as: "Label" [type] at (x, y) — size WxH
       - Current focus/selection state
       - Navigation structure (tabs, headers, back buttons)
    3. If looking for a specific element, report its exact coordinates.
    Return text only. Be concise.

    LOOKING FOR: [insert what the user needs to find]
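
When the summary reports an element as at (x, y) with size WxH, a tap is usually aimed at the middle of that frame. The sketch below computes that point, assuming (x, y) is the frame's top-left origin in point coordinates; the skill does not state this, so treat it as an assumption. frame_center is a hypothetical helper, not one of the skill's scripts.

```shell
# Print the integer centre "X Y" of a frame given as x y width height.
# awk is used because accessibility frames may contain fractional values.
frame_center() {
  awk -v x="$1" -v y="$2" -v w="$3" -v h="$4" \
    'BEGIN { printf "%d %d\n", int(x + w / 2), int(y + h / 2) }'
}

# Usage (word splitting of the output is intended):
#   "${CLAUDE_SKILL_DIR}/scripts/ui.sh" tap $(frame_center 20 100 335 44)
```

For a frame at (20, 100) with size 335x44, this yields the point 187 122.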

3. Interact (multi-step UI automation)

When: "Tap the login button", "Fill in the form", "Navigate to settings"

Combines view, inspect, and actions in a subagent loop. The entire interaction stays isolated: the main context receives only the final result.

Dispatch Agent:
  subagent_type: general-purpose
  description: "ios-sim: UI interaction"
  prompt: |
    You are automating the iOS Simulator. Scripts are at:
    ${CLAUDE_SKILL_DIR}/scripts/

    Available commands:
    - capture.sh view               → compressed screenshot (read the output path to see it)
    - ui.sh describe-all            → full accessibility tree JSON
    - ui.sh describe-point X Y      → element at coordinates
    - ui.sh tap X Y                 → tap at point coordinates
    - ui.sh tap X Y --duration S    → long press
    - ui.sh type "text"             → type ASCII text (max 500 chars)
    - ui.sh swipe X1 Y1 X2 Y2      → swipe gesture
    - ui.sh tap-label "label"       → tap element by accessibility label (no coordinate lookup needed)
    - ui.sh tap-id "id"             → tap element by accessibility ID (no coordinate lookup needed)
    - ui.sh list                    → compact table of on-screen elements grouped by Controls/Content
    - ui.sh back                    → heuristic back-button finder and tap (scores by label/position)
    - ui.sh scroll top|bottom       → repeated swipes with stabilization detection (max 10 swipes)

    TASK: [insert what the user wants to do]

    WORKFLOW:
    1. Run capture.sh view first to see the current state
    2. Use ui.sh describe-all if you need exact coordinates
    3. Perform the requested actions
    4. Run capture.sh view again to verify the result
    5. Return a text summary of what you did and the final state

    RULES:
    - Use POINT coordinates from the accessibility tree, not pixel coordinates
    - After each action, verify the result before proceeding
    - If something fails, try describe-all to re-orient
    - Return text summary only — do not include base64 image data

Tips

  • All scripts auto-detect the booted simulator. Pass --udid <UUID> to target a specific device.
  • Screenshots are captured at 3x pixel resolution, while the accessibility tree reports point coordinates. Always use point coordinates for tap/swipe.
  • ui.sh type only accepts ASCII printable characters (max 500 chars).
  • Video recording runs in the background. Use capture.sh stop to finish.
  • For the interact workflow, omit model to use the user's current model (better reasoning for complex multi-step tasks).
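
Two of the tips above lend themselves to small guard helpers. This is a sketch under stated assumptions: px_to_pt and is_typable are hypothetical names, the default scale of 3 simply mirrors the 3x figure given here, and is_typable only approximates whatever validation ui.sh type performs internally.

```shell
# Convert a pixel position read off a screenshot to point coordinates.
# $1 = x in pixels, $2 = y in pixels, $3 = scale factor (default 3).
px_to_pt() {
  scale=${3:-3}
  echo "$(( $1 / $scale )) $(( $2 / $scale ))"
}

# Succeed only if $1 is at most 500 characters of printable ASCII.
# The body runs in a subshell so the LC_ALL change does not leak out.
is_typable() (
  LC_ALL=C   # byte-wise matching, so multi-byte characters are rejected
  [ "${#1}" -le 500 ] || exit 1
  case $1 in
    *[!\ -~]*) exit 1 ;;   # any byte outside the range space..tilde
  esac
  exit 0
)

# Usage:
#   is_typable "$text" && "${CLAUDE_SKILL_DIR}/scripts/ui.sh" type "$text"
```

For example, a position at pixel (600, 1200) in a 3x screenshot corresponds to the point (200, 400).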

Context Efficiency

Item | Size | In Main Context?
Screenshot JPEG | ~100-300 KB | NEVER (haiku subagent only)
Accessibility tree JSON | ~10-100 KB | NEVER (subagent only)
Subagent text summary | ~200-800 chars | YES
Direct commands (device, app) | ~50-200 chars | YES