pyats-parallel-ops
Parallel Fleet Operations
Execute operations across multiple network devices concurrently using OpenClaw's pCall (parallel call) capability. This skill governs how to scale single-device operations to an entire fleet with failure isolation, result aggregation, and severity-sorted reporting.
When to Use
- Fleet-wide health checks across all devices in the testbed
- Mass configuration audits (collect running configs from every device)
- Network-wide routing table snapshots for baseline or comparison
- Pre/post change validation across all affected devices simultaneously
- Any operation where running sequentially on 10+ devices would be too slow
How pCall Works in OpenClaw
In OpenClaw, parallel execution (pCall) is achieved by listing multiple exec commands in a single response. The agent runtime dispatches them concurrently and collects all results before proceeding.
Key Principles
- Group by role or site -- Organize devices into logical groups (core, distribution, access, WAN, DC) before dispatching
- Run operations concurrently -- List one MCP call per device; they execute in parallel
- Failure isolation -- If one device times out or errors, the other results are still collected
- Result aggregation -- Collect all results, then produce a unified fleet report
- Severity sorting -- Sort findings from CRITICAL to HEALTHY so the worst problems surface first
pCall Pattern
To run the same command on multiple devices in parallel, list the calls together:
# Device 1
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show version"}'
# Device 2
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R2","command":"show version"}'
# Device 3
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"SW1","command":"show version"}'
All three commands execute concurrently. Results arrive independently and are aggregated by the agent.
Step 0: Discover the Fleet
Always start by listing all devices in the testbed so you know what to operate on:
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_list_devices '{}'
This returns every device with its name, platform, OS, and connection details. Use this to build the device list for parallel operations.
Example 1: Fleet-Wide Health Check
Run a health check on all devices in the testbed concurrently.
Phase 1: Parallel Data Collection
Issue these commands simultaneously -- one set per device:
# R1 - CPU and memory
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show processes cpu sorted"}'
# R2 - CPU and memory
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R2","command":"show processes cpu sorted"}'
# SW1 - CPU and memory
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"SW1","command":"show processes cpu sorted"}'
Then in a second parallel wave, collect interface and NTP status:
# R1 - Interfaces
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show ip interface brief"}'
# R2 - Interfaces
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R2","command":"show ip interface brief"}'
# SW1 - Interfaces
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"SW1","command":"show ip interface brief"}'
Phase 2: Parallel Log Collection
# R1 - Logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"R1"}'
# R2 - Logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"R2"}'
# SW1 - Logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"SW1"}'
Phase 3: Aggregate and Report
After all parallel results return, analyze each device individually and produce the fleet summary (see Fleet Report Format below).
Example 2: Fleet-Wide Config Audit
Collect the running configuration from every device in parallel for compliance analysis.
# R1 - Running config
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_running_config '{"device_name":"R1"}'
# R2 - Running config
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_running_config '{"device_name":"R2"}'
# SW1 - Running config
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_running_config '{"device_name":"SW1"}'
# SW2 - Running config
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_running_config '{"device_name":"SW2"}'
After collection, apply the pyats-security audit checks to each config and produce a fleet-wide security posture report.
Common config audit checks to apply in parallel:
- SSH version 2 only
- No telnet on VTY lines
service password-encryptionenabled- VTY access-class applied
- NTP configured
- Logging host configured
- No default SNMP community strings
Example 3: Fleet-Wide Routing Table Snapshot
Capture the routing table from every device simultaneously for baseline documentation or pre-change verification.
# R1 - Full routing table
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show ip route"}'
# R2 - Full routing table
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R2","command":"show ip route"}'
# R3 - Full routing table
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R3","command":"show ip route"}'
# R4 - Full routing table
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R4","command":"show ip route"}'
After collection, analyze per device:
- Total route count by protocol (connected, static, OSPF, BGP, EIGRP)
- Default route presence and source
- Expected prefix verification
- ECMP paths for critical prefixes
Produce a fleet routing summary:
Fleet Routing Snapshot - YYYY-MM-DD HH:MM UTC
┌──────────┬────────┬────────┬──────┬──────┬──────────────┬─────────┐
│ Device │ Total │ Conn. │ OSPF │ BGP │ Default Rte │ Status │
├──────────┼────────┼────────┼──────┼──────┼──────────────┼─────────┤
│ R1 │ 47 │ 5 │ 12 │ 28 │ via 10.1.1.2 │ HEALTHY │
│ R2 │ 45 │ 4 │ 12 │ 27 │ via 10.1.1.1 │ HEALTHY │
│ R3 │ 38 │ 3 │ 12 │ 21 │ via 10.2.1.1 │ WARNING │
│ R4 │ 0 │ 0 │ 0 │ 0 │ MISSING │ CRITICAL│
└──────────┴────────┴────────┴──────┴──────┴──────────────┴─────────┘
Example 4: Severity-Sorted Fleet Reporting
After collecting results from all devices, aggregate findings and sort by severity. This is the standard output format for all fleet operations.
Severity Levels
- CRITICAL -- Immediate action required. Device unreachable, process crash, zero routes, total connectivity loss.
- HIGH -- Fix within hours. CPU > 90%, memory > 95%, routing adjacency down, interface flapping.
- MEDIUM -- Fix within days. Missing NTP, elevated CPU (50-75%), log errors, config non-compliance.
- HEALTHY -- No issues. All checks passed.
Fleet Report Format
Fleet Health Report - YYYY-MM-DD HH:MM UTC
Testbed: production-network
Devices scanned: 8 | Duration: 12s (parallel)
=== CRITICAL (Immediate Action) ===
[C-001] R4 - UNREACHABLE
Connection timed out after 30s. Verify device is powered on and management IP is reachable.
Impact: No data collected for R4. Manual investigation required.
[C-002] SW2 - CPU 97% (5min avg)
Top process: OSPF-1 Hello (45%), IP Input (32%)
Impact: Risk of control plane failure. OSPF hellos may be missed.
=== HIGH (Fix Within Hours) ===
[H-001] R2 - GigabitEthernet3 down/down
Last state change: 2 hours ago. 47 resets in last 24h.
Impact: Backup WAN link unavailable. No redundancy for site B.
[H-002] SW1 - OSPF neighbor 3.3.3.3 in INIT state
Expected: FULL. Interface: Vlan100. Duration: 45 minutes.
Impact: Inter-VLAN routing for VLAN 100 may be impaired.
=== MEDIUM (Fix Within Days) ===
[M-001] R1 - NTP not synchronized
No peer with '*' in show ntp associations. Clock offset: unknown.
Impact: Log timestamps may be inaccurate for forensics.
[M-002] R3 - 3 OSPF adjacency flaps in last 24h
Neighbors affected: 2.2.2.2 on Gi1 (flapped 3 times).
Impact: Route convergence events. Brief traffic disruption during SPF.
=== HEALTHY ===
R1: All checks passed (CPU 12%, Mem 45%, 4/4 interfaces up, OSPF stable)
R3: All checks passed (CPU 8%, Mem 38%, 3/3 interfaces up, BGP stable)
SW3: All checks passed (CPU 5%, Mem 22%, 24/24 ports up, STP stable)
=== FLEET SUMMARY ===
┌──────────┬──────────┬──────────────────────────────────────────────┐
│ Device │ Status │ Key Finding │
├──────────┼──────────┼──────────────────────────────────────────────┤
│ R4 │ CRITICAL │ Unreachable - connection timeout │
│ SW2 │ CRITICAL │ CPU 97% - OSPF/IP Input │
│ R2 │ HIGH │ Gi3 down/down - 47 resets │
│ SW1 │ HIGH │ OSPF neighbor INIT - Vlan100 │
│ R1 │ MEDIUM │ NTP not synchronized │
│ R3 │ MEDIUM │ 3 OSPF flaps in 24h │
│ R1 │ HEALTHY │ All checks passed │
│ R3 │ HEALTHY │ All checks passed │
│ SW3 │ HEALTHY │ All checks passed │
└──────────┴──────────┴──────────────────────────────────────────────┘
Overall Fleet Status: CRITICAL (2 critical, 2 high, 2 medium, 3 healthy)
Failure Isolation
When one device fails during parallel execution, it does not block or cancel the other operations:
- Connection timeout -- Mark device as CRITICAL/UNREACHABLE, continue with others
- Command error -- Record the error for that device, continue collecting from others
- Parse failure -- Fall back to raw text output for that device, report as WARNING
Handling Unreachable Devices
# If R4 times out, you still get results from R1, R2, R3
# In the fleet report, R4 appears as:
# [C-001] R4 - UNREACHABLE
# Connection timed out. Device excluded from further checks.
The key principle: always produce a report for every device, even if the report says "unreachable."
Grouping Strategies
By Role
Group devices by their function in the network to prioritize operations:
Core routers: R1, R2 (check first - highest blast radius)
Distribution: SW1, SW2 (check second)
Access: SW3, SW4, SW5 (check third)
WAN: WAN1, WAN2 (check in parallel with core)
By Site
For multi-site networks, group by location:
Site A (HQ): R1, SW1, SW2
Site B (Branch): R2, SW3
Site C (DR): R3, SW4
By Change Scope
When validating a change, group by affected vs unaffected:
Affected devices: R1, R2 (check thoroughly - full health check)
Adjacent devices: SW1, R3 (check routing adjacencies and connectivity)
Unaffected devices: SW3, SW4 (spot check - verify no collateral damage)
Scaling Guidelines
| Fleet Size | Strategy |
|---|---|
| 1-5 devices | Single parallel wave, all commands at once |
| 6-20 devices | Two waves: critical devices first, then remaining |
| 20-50 devices | Group by role/site, run 10-15 devices per wave |
| 50+ devices | Group by site, sample 20% per wave, expand if issues found |
For large fleets, start with a sampling strategy: pick 2-3 devices per role per site, run full health checks, then expand to the full fleet only if anomalies are found.
Integration with Other Skills
- pyats-health-check -- The single-device health check procedure. pCall scales it to the fleet by issuing one health check per device in parallel.
- pyats-security -- Fleet-wide security audit. Collect all running configs in parallel, then apply security checks to each config.
- pyats-topology -- Fleet-wide topology discovery. Run CDP/LLDP neighbor collection on all devices in parallel to build the complete network map.
- pyats-dynamic-test -- Run the same aetest validation script against multiple devices in parallel for fleet-wide compliance testing.
- pyats-config-mgmt -- Pre/post change validation on all affected devices simultaneously.
- drawio-diagram -- After fleet discovery, generate a topology diagram showing device status (color-coded by health severity).
- markmap-viz -- Generate fleet health mind maps organized by severity or site.