macro-agent
Macro Agent
Desktop automation and UI control skill with image recognition.
🚨 CRITICAL: How to Handle User Requests
BEFORE doing ANY action, ALWAYS check if a sequence exists for it:
- FIRST run `seq-list` to see the available sequences.
- LOOK for a sequence that matches the user's intent (e.g., `whatsapp_send_marco` for "send message to Marco").
- IF a sequence exists: run `seq-run <sequence_name>`, then add your custom actions (write the message, press Enter).
- IF NO sequence exists: use individual commands.
Common Workflow: Send Message to Contact
When the user says "send message to X" or "envía mensaje a X" (Spanish for "send a message to X"):

```
seq-list                          # 1. Check available sequences
seq-run whatsapp_send_<contact>   # 2. Run the messaging sequence
write "<message>"                 # 3. Type the message
press enter                       # 4. Send it
```

NEVER use `hotkey super` or manual navigation when a sequence exists!
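
The workflow above can be sketched in Python as a small planner. This is a hypothetical helper, not part of `macro_agent.py`: it assumes the names returned by `seq-list` have already been collected into a list, and that messaging sequences follow the `whatsapp_send_<contact>` naming seen in the examples.

```python
# Hypothetical planner for "send message to X". Assumes the sequence names
# from seq-list are already in a list and follow whatsapp_send_<contact>.
from typing import List, Optional


def find_message_sequence(contact: str, available: List[str]) -> Optional[str]:
    """Return the sequence name for this contact, or None if none exists."""
    wanted = f"whatsapp_send_{contact.strip().lower()}"
    return wanted if wanted in available else None


def plan_send_message(contact: str, message: str,
                      available: List[str]) -> List[List[str]]:
    """Build the macro_agent.py argument lists for the messaging workflow."""
    sequence = find_message_sequence(contact, available)
    if sequence is None:
        # No sequence configured: fall back to individual commands instead.
        raise LookupError(f"no sequence for {contact!r}; use individual commands")
    return [
        ["seq-run", sequence],  # opens WhatsApp and selects the contact
        ["write", message],     # types the message
        ["press", "enter"],     # sends it
    ]
```

For example, `plan_send_message("Marco", "hola", ["whatsapp_send_ross", "whatsapp_send_marco"])` yields the three steps `seq-run whatsapp_send_marco`, `write hola`, `press enter`.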
Available Sequences (check with `seq-list`)
The user has pre-configured sequences for common tasks. Always check them first!
- `whatsapp_send_ross` - Opens WhatsApp and selects the Ross contact
- `whatsapp_send_marco` - Opens WhatsApp and selects the Marco contact
- Other sequences may exist; always run `seq-list` first!
🎯 How Element Detection Works
When using `click-on` or `move-to`, the agent ALWAYS uses image recognition:
- It searches for the element's image on screen (template matching).
- If the image is not found, the command FAILS (there is no fallback to coordinates).
This ensures elements are found dynamically based on their actual position.
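
To illustrate what template matching means, here is a minimal pure-Python sketch using the sum of squared differences on small grayscale images stored as lists of rows. It shows the technique only and is not `macro_agent.py`'s actual implementation; the `max_mean_sq_diff` threshold is a hypothetical parameter standing in for whatever match tolerance the tool uses.

```python
# Illustrative template matching (sum of squared differences). Not the
# tool's real code; max_mean_sq_diff is a hypothetical tolerance.
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, one list per row


def match_template(screen: Image, template: Image,
                   max_mean_sq_diff: float = 0.0) -> Optional[Tuple[int, int]]:
    """Return the (x, y) top-left corner of the best match, or None."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best: Optional[Tuple[int, int]] = None
    best_score = float("inf")
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            ssd = sum(
                (screen[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(th)
                for dx in range(tw)
            )
            if ssd < best_score:
                best_score, best = ssd, (x, y)
    # No coordinate fallback: a poor (or impossible) match means "not found".
    if best is None or best_score / (th * tw) > max_mean_sq_diff:
        return None
    return best
```

With the default threshold of 0 only an exact match is accepted; real screenshots would need a tolerance, which is why the threshold is left as a parameter.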
The output includes a `method` field:
- `image` = found by template matching ✅
- `not_found` = image not visible on screen ❌

If the element is not found, capture it first with `region-capture`.
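
A caller can branch on `method` to decide what to do next. The sketch below is hypothetical: only the `image` and `not_found` values come from this document, and the returned hint strings are illustrative.

```python
# Hypothetical reaction to the method field of a click-on / move-to result.
import json


def next_step(raw_output: str) -> str:
    result = json.loads(raw_output)
    method = result.get("method")
    if method == "image":
        return "ok"  # element located by template matching
    if method == "not_found":
        # There is no coordinate fallback, so the element must be captured.
        return "run region-capture to capture the element, then retry"
    raise ValueError(f"unexpected method: {method!r}")
```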
⚠️ Important
There is NO `navigate` command. To navigate, use:
- `find <name>` - Search for element info
- `click-on <name>` - Click using image recognition (ALWAYS)
Usage

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py <command> [args]
```

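When driving the agent from Python rather than a shell, the same invocation can be built as an argument list. This is a minimal sketch: the script path is the one documented above, while using `sys.executable` as the interpreter (instead of a bare `python`) is an assumption.

```python
# Minimal wrapper around the documented invocation. Using sys.executable
# as the interpreter is an assumption; the docs just say "python".
import os
import subprocess
import sys
from typing import List

SCRIPT = os.path.expanduser("~/.copilot/skills/macro-agent/macro_agent.py")


def build_argv(command: str, *args: str) -> List[str]:
    """Return the argument list for one macro_agent.py call."""
    return [sys.executable, SCRIPT, command, *args]


def run(command: str, *args: str) -> str:
    """Run one command and return its stdout (expected to be JSON)."""
    completed = subprocess.run(build_argv(command, *args),
                               capture_output=True, text=True, check=True)
    return completed.stdout
```

Because the arguments are passed as a list, text such as `write "search query"` becomes `run("write", "search query")` with no shell quoting required.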
Commands Reference

| Action | Command | Example |
|---|---|---|
| Search element | `find <name>` | `find brave` |
| Search text | `search <text>` | `search save` |
| Click element | `click-on <name>` | `click-on brave` |
| Click coordinates | `click X Y` | `click 500 300` |
| Move to element | `move-to <name>` | `move-to button` |
| Move to coordinates | `move X Y` | `move 500 300` |
| Write text | `write <text>` | `write "hello"` |
| Press key | `press <key>` | `press enter` |
| Hotkey | `hotkey <keys>` | `hotkey ctrl c` |
| Scroll | `scroll N` | `scroll -3` |
| Screenshot | `screenshot <name>` | `screenshot test` |
| Region capture | `region-capture` | `region-capture` |

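A client could check arguments against the table before calling the script. The sketch below is hypothetical: the argument counts are read off the Command column above rather than from `macro_agent.py`, and it covers only the commands in that table (not the `seq-*` commands).

```python
# Hypothetical pre-flight check derived from the Commands Reference table.
from typing import Dict, List, Optional, Tuple

# command -> (minimum args, maximum args or None for "one or more")
ARITY: Dict[str, Tuple[int, Optional[int]]] = {
    "find": (1, 1), "search": (1, 1), "click-on": (1, 1),
    "click": (2, 2), "move-to": (1, 1), "move": (2, 2),
    "write": (1, 1), "press": (1, 1), "hotkey": (1, None),
    "scroll": (1, 1), "screenshot": (1, 1), "region-capture": (0, 0),
}


def validate(argv: List[str]) -> None:
    """Raise ValueError if argv does not fit the table's command shapes."""
    if not argv:
        raise ValueError("empty command")
    command, args = argv[0], argv[1:]
    if command not in ARITY:
        raise ValueError(f"unknown command: {command}")
    low, high = ARITY[command]
    if len(args) < low or (high is not None and len(args) > high):
        raise ValueError(f"{command}: wrong number of arguments ({len(args)})")
    if command in ("click", "move", "scroll"):
        for arg in args:
            int(arg)  # coordinates and scroll amounts must be integers
```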
Sequence Commands

| Command | Description |
|---|---|
| `seq-create <name>` | Create a new sequence |
| `seq-add <name> "<action>"` | Add an action to a sequence |
| `seq-show <name>` | View a sequence |
| `seq-run <name>` | Execute a sequence |
| `seq-list` | List all sequences |
| `seq-delete <name>` | Delete a sequence |

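Each action added with `seq-add` is a command string such as `click-on file_menu` or `wait 0.5`. The sketch below shows one way such strings could be split into a command and its arguments before execution; treating a sequence as a plain list of strings is an assumption, since the file format in `data/sequences/` is not documented here.

```python
# Sketch: split seq-add action strings into (command, args). The list-of-
# strings representation of a sequence is an assumption.
import shlex
from typing import List, Tuple


def parse_actions(actions: List[str]) -> List[Tuple[str, List[str]]]:
    steps = []
    for action in actions:
        parts = shlex.split(action)  # respects quotes, e.g. write "hello world"
        if not parts:
            continue  # skip blank entries
        steps.append((parts[0], parts[1:]))
    return steps
```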
Output
JSON with:
- `success`: `true`/`false`
- `action`: command executed
- `target`: element name (if applicable)
- `coordinates`: `{x, y}` position
- `message`: result description
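
A minimal sketch for turning one result into a typed object. The field names come from the list above; treating `target` and `coordinates` as optional follows "if applicable", and reading `coordinates` as an object with `x` and `y` keys is an assumption based on `{x, y}`.

```python
# Sketch: parse one JSON result. Optional target/coordinates and the
# {"x": ..., "y": ...} shape of coordinates are assumptions.
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Result:
    success: bool
    action: str
    message: str
    target: Optional[str] = None
    coordinates: Optional[Tuple[int, int]] = None


def parse_result(raw: str) -> Result:
    data = json.loads(raw)
    coords = data.get("coordinates")
    return Result(
        success=bool(data["success"]),
        action=data["action"],
        message=data.get("message", ""),
        target=data.get("target"),
        coordinates=(coords["x"], coords["y"]) if coords else None,
    )
```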
Data Locations
- Elements: `~/.copilot/skills/macro-agent/data/elements.json` (element definitions)
- Captures: `~/.copilot/skills/macro-agent/data/captures/` (template images)
- Sequences: `~/.copilot/skills/macro-agent/data/sequences/` (action sequences)
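
These locations can be resolved with `pathlib`, for example to check that the data directory has been set up. The layout is the one listed above; the `data_paths` and `missing` helpers are hypothetical.

```python
# Sketch: resolve the documented data locations (hypothetical helpers).
from pathlib import Path
from typing import Dict, List, Optional


def data_paths(home: Optional[Path] = None) -> Dict[str, Path]:
    base = (home or Path.home()) / ".copilot" / "skills" / "macro-agent" / "data"
    return {
        "elements": base / "elements.json",  # element definitions
        "captures": base / "captures",       # template images
        "sequences": base / "sequences",     # action sequences
    }


def missing(paths: Dict[str, Path]) -> List[str]:
    """Names of data locations that do not exist yet."""
    return [name for name, path in paths.items() if not path.exists()]
```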
Examples
Find and Click App

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py find chrome
python ~/.copilot/skills/macro-agent/macro_agent.py click-on chrome
```

Type and Submit

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py write "search query"
python ~/.copilot/skills/macro-agent/macro_agent.py press enter
```

Keyboard Shortcut

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py hotkey ctrl shift s
```

Create and Run Sequence

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py seq-create my_macro
python ~/.copilot/skills/macro-agent/macro_agent.py seq-add my_macro "click-on file_menu"
python ~/.copilot/skills/macro-agent/macro_agent.py seq-add my_macro "wait 0.5"
python ~/.copilot/skills/macro-agent/macro_agent.py seq-add my_macro "click-on save_option"
python ~/.copilot/skills/macro-agent/macro_agent.py seq-run my_macro
```

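The same create-and-run steps can be generated programmatically. This sketch only builds the argument lists (nothing is executed); `my_macro` and the action strings are the example values from above, and `sequence_commands` is a hypothetical helper.

```python
# Sketch: generate the seq-create / seq-add / seq-run calls for a sequence.
from typing import List


def sequence_commands(name: str, actions: List[str]) -> List[List[str]]:
    cmds = [["seq-create", name]]
    # Each action is passed as a single argument, like the quoted shell form.
    cmds += [["seq-add", name, action] for action in actions]
    cmds.append(["seq-run", name])
    return cmds
```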
Capture New Elements

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py region-capture
```

Keys: `f` = freeze, `c`/`Space` = capture, `+`/`-` = resize, `q`/`Esc` = quit
📱 Example: Send WhatsApp Message
User says: "Envía mensaje a Marco diciendo hola" ("Send a message to Marco saying hola")
CORRECT approach:

```
# 1. First check sequences
seq-list
# 2. Found whatsapp_send_marco! Run it
seq-run whatsapp_send_marco
# 3. Type and send
write "hola"
press enter
```

WRONG approach (NEVER do this):

```
# ❌ WRONG - Don't navigate manually!
hotkey super
wait 500
# Unnecessary: a sequence already does this, so use it!
```

🔄 Decision Flow

```
User Request
    ↓
Run seq-list
    ↓
Sequence exists? ──YES──→ seq-run <name> → Additional actions (write, press)
    ↓ NO
Use individual commands (click-on, write, press, etc.)
```