ios-sim
Skill: ios-sim
On load: Read ../../.claude-plugin/plugin.json from this skill's base directory. Display ios-sim v{version} before proceeding.
Control the iOS Simulator through shell scripts wrapping xcrun simctl and AXe.
IMPORTANT: Never load screenshots or accessibility trees in the main context.
Always dispatch a subagent for visual/inspection tasks.
Used By
| Skill | What it uses |
|---|---|
a11y-audit |
ui.sh describe-all for accessibility tree capture |
layout-check |
capture.sh view + ui.sh describe-all for screenshot and element positions |
diagnose |
capture.sh view to read error text from the simulator screen |
deeplink-test |
Screenshot capture to verify the screen after firing a deep link |
Prerequisites
- macOS with Xcode installed
- A booted iOS simulator (
xcrun simctl list devicesto check) - AXe (
brew install cameroncooke/axe/axe) — required for allui.shcommands (tap, tap-label, tap-id, type, swipe, describe-all, describe-point, list, back, scroll).capture.sh viewworks without AXe. - See
${CLAUDE_SKILL_DIR}/references/troubleshooting.mdif anything is missing
Scripts
All scripts live at ${CLAUDE_SKILL_DIR}/scripts/. Run them with Bash.
Routing Table
| Intent | Workflow | Why |
|---|---|---|
| See what's on screen | view | Image stays in subagent |
| Find UI elements/coordinates | inspect | JSON tree stays in subagent |
| Multi-step UI interaction | interact | Entire loop stays in subagent |
| Simple one-shot command | direct | No image/tree involved |
Direct Commands (safe for main context)
| Intent | Script | Example |
|---|---|---|
| Get booted simulator ID | device.sh booted |
device.sh booted |
| Open Simulator app | device.sh open |
device.sh open |
| Save screenshot to file | capture.sh screenshot <path> |
capture.sh screenshot /tmp/shot.png |
| Start video recording | capture.sh record |
capture.sh record |
| Stop video recording | capture.sh stop |
capture.sh stop |
| Install app bundle | app.sh install <path> |
app.sh install /path/to/App.app |
| Launch app by bundle ID | app.sh launch <id> |
app.sh launch com.example.app |
| Tap element by accessibility label | ui.sh tap-label <label> |
ui.sh tap-label "Login" |
| Tap element by accessibility ID | ui.sh tap-id <id> |
ui.sh tap-id "submit-button" |
| List on-screen elements (Controls/Content) | ui.sh list |
ui.sh list |
| Tap the back/navigation button | ui.sh back |
ui.sh back |
| Scroll to top or bottom of list | ui.sh scroll top|bottom |
ui.sh scroll top |
Workflows
1. View (screenshot analysis)
When: "What's on the simulator screen?", "How does it look?", "Is there an error?"
Main agent never loads the image. Haiku does the analysis.
Dispatch Agent:
subagent_type: general-purpose
model: haiku
description: "ios-sim: analyze screenshot"
prompt: |
1. Run: ${CLAUDE_SKILL_DIR}/scripts/capture.sh view
This outputs a file path to a compressed JPEG.
2. Read that file path with the Read tool to see the image.
3. Analyze and return:
- Overview: What app/screen is visible (1-2 sentences)
- Key elements: Buttons, text, inputs, navigation items
- State: Errors, loading, forms filled, current tab
- Coordinates: Notable interactive elements with approximate point positions
4. If the user asked something specific, answer that directly.
Return text only. Be concise.
USER QUESTION: [insert user's question here]
2. Inspect (accessibility tree)
When: "What elements are on screen?", "Find the login button", "Where should I tap?"
The accessibility tree JSON can be massive. Parse it in a subagent.
Dispatch Agent:
subagent_type: general-purpose
model: haiku
description: "ios-sim: inspect UI elements"
prompt: |
1. Run: ${CLAUDE_SKILL_DIR}/scripts/ui.sh describe-all
This outputs the full accessibility tree as JSON.
2. Parse the JSON and return a structured summary:
- Screen dimensions (from root frame)
- Interactive elements: buttons, text fields, switches, links
Format each as: "Label" [type] at (x, y) — size WxH
- Current focus/selection state
- Navigation structure (tabs, headers, back buttons)
3. If looking for a specific element, report its exact coordinates.
Return text only. Be concise.
LOOKING FOR: [insert what the user needs to find]
3. Interact (multi-step UI automation)
When: "Tap the login button", "Fill in the form", "Navigate to settings"
Combines view + inspect + actions in a subagent loop. The entire interaction stays isolated — main context only gets the final result.
Dispatch Agent:
subagent_type: general-purpose
description: "ios-sim: UI interaction"
prompt: |
You are automating the iOS Simulator. Scripts are at:
${CLAUDE_SKILL_DIR}/scripts/
Available commands:
- capture.sh view → compressed screenshot (read the output path to see it)
- ui.sh describe-all → full accessibility tree JSON
- ui.sh describe-point X Y → element at coordinates
- ui.sh tap X Y → tap at point coordinates
- ui.sh tap X Y --duration S → long press
- ui.sh type "text" → type ASCII text (max 500 chars)
- ui.sh swipe X1 Y1 X2 Y2 → swipe gesture
- ui.sh tap-label "label" → tap element by accessibility label (no coordinate lookup needed)
- ui.sh tap-id "id" → tap element by accessibility ID (no coordinate lookup needed)
- ui.sh list → compact table of on-screen elements grouped by Controls/Content
- ui.sh back → heuristic back-button finder and tap (scores by label/position)
- ui.sh scroll top|bottom → repeated swipes with stabilization detection (max 10 swipes)
TASK: [insert what the user wants to do]
WORKFLOW:
1. First capture.sh view to see current state
2. Use ui.sh describe-all if you need exact coordinates
3. Perform the requested actions
4. capture.sh view again to verify the result
5. Return a text summary of what you did and the final state
RULES:
- Use POINT coordinates from the accessibility tree, not pixel coordinates
- After each action, verify the result before proceeding
- If something fails, try describe-all to re-orient
- Return text summary only — do not include base64 image data
Tips
- All scripts auto-detect the booted simulator. Pass
--udid <UUID>to target a specific device. - Screenshots are 3x pixel resolution. The accessibility tree reports point coordinates. Always use point coordinates for tap/swipe.
ui.sh typeonly accepts ASCII printable characters (max 500 chars).- Video recording runs in the background. Use
capture.sh stopto finish. - For the interact workflow, omit
modelto use the user's current model (better reasoning for complex multi-step tasks).
Context Efficiency
| Item | Size | In Main Context? |
|---|---|---|
| Screenshot JPEG | ~100-300 KB | NEVER — haiku subagent only |
| Accessibility tree JSON | ~10-100 KB | NEVER — subagent only |
| Subagent text summary | ~200-800 chars | YES |
| Direct commands (device, app) | ~50-200 chars | YES |