orchestrator

Purpose

This skill handles multi-instance delegation in distributed systems: routing tasks to instances A-H, fanning out requests and aggregating responses, spawning sessions and sending data to them, and performing health checks. It is designed for coordinating complex workflows across distributed-comms clusters.

When to Use

Use this skill for scenarios involving multiple instances, such as load balancing requests across A-H, aggregating data from distributed nodes, managing long-running sessions, or ensuring instance health in fault-tolerant systems. Apply it when single-instance processing isn't sufficient, as in scalable web services or parallel computations.

Key Capabilities

  • Routing: Direct tasks to specific instances (e.g., A-H) using targeted delegation.
  • Fanout/Aggregation: Broadcast requests to multiple instances and collect/aggregate responses, supporting operations like parallel queries.
  • Session Management: Spawn new sessions with sessions_spawn and send data via sessions_send, enabling stateful interactions.
  • Health Checks: Periodically verify instance status with endpoints like /api/health/check, returning JSON with status codes (e.g., 200 for healthy).
  • Error Resilience: Automatically retry failed delegations up to 3 times based on configurable thresholds.
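
The fanout/aggregation capability can be pictured with a small, transport-agnostic sketch. Everything here is an assumption for illustration: the `fanout` helper and its `send` callback are hypothetical names, not part of the openclaw CLI or SDK, and timeout handling is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable


def fanout(
    targets: Iterable[str],
    payload: Dict[str, Any],
    send: Callable[[str, Dict[str, Any]], Any],
) -> Dict[str, Dict[str, Any]]:
    """Send the payload to every target in parallel and collect per-instance results.

    `send` is a hypothetical callback that delivers the payload to one instance
    and returns its response (or raises on failure).
    """
    names = list(targets)
    results: Dict[str, Dict[str, Any]] = {}
    if not names:
        return results
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = {pool.submit(send, name, payload): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"ok": True, "value": future.result()}
            except Exception as exc:
                # One failing instance does not abort the whole aggregation.
                results[name] = {"ok": False, "error": str(exc)}
    return results
```

Keeping failures as per-instance entries, rather than raising on the first error, lets the caller decide whether a partial aggregate is acceptable.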

Usage Patterns

  • Simple Routing: Route a task to instance A by specifying the target; use for directed workflows.
  • Fanout and Aggregate: Send a request to instances A-C, then aggregate results; ideal for distributed searches.
  • Session-Based Workflows: Spawn a session, send data, and monitor health; use for multi-step processes.
  • Health Monitoring Loop: Integrate into scripts to check instances every 60 seconds and reroute if needed.

Always set the environment variable $ORCHESTRATOR_API_KEY for authenticated operations.
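
The health monitoring loop pattern might look roughly like the sketch below. It assumes a status map shaped like {"A": "healthy", "B": "down"}; the function names, the `fetch_status` and `on_status` callbacks, and the `rounds` parameter are hypothetical and exist only to keep the example self-contained.

```python
import time
from typing import Callable, Dict, List, Optional


def healthy_instances(status: Dict[str, str]) -> List[str]:
    """Return the names of instances reported as healthy, in sorted order."""
    return sorted(name for name, state in status.items() if state == "healthy")


def choose_target(preferred: str, status: Dict[str, str]) -> Optional[str]:
    """Keep the preferred instance if healthy, else reroute to another healthy one."""
    if status.get(preferred) == "healthy":
        return preferred
    candidates = healthy_instances(status)
    return candidates[0] if candidates else None


def monitor(
    fetch_status: Callable[[], Dict[str, str]],
    on_status: Callable[[Dict[str, str]], None],
    interval: float = 60.0,
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll instance health every `interval` seconds (forever if rounds is None)."""
    count = 0
    while rounds is None or count < rounds:
        on_status(fetch_status())
        count += 1
        if rounds is None or count < rounds:
            sleep(interval)
```

In practice `fetch_status` would wrap whatever health call is available, and `on_status` would call `choose_target` to decide where the next task goes.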

Common Commands/API

  • CLI Commands:
    • Route to instances: openclaw orchestrator route --instances A B --payload '{"data": "task"}'
    • Fanout and aggregate: openclaw orchestrator fanout --targets A-H --aggregate true --timeout 10s
    • Spawn and send session: openclaw orchestrator sessions_spawn --id session1; openclaw orchestrator sessions_send --id session1 --data '{"key": "value"}'
    • Health check: openclaw orchestrator health --instances A-H --output json
  • API Endpoints:
    • POST /api/orchestrator/route: Body: {"instances": ["A", "B"], "payload": {"data": "task"}}, Headers: Authorization: Bearer $ORCHESTRATOR_API_KEY
    • POST /api/orchestrator/fanout: Body: {"targets": ["A", "C"], "aggregate": true}, Returns aggregated JSON array.
    • POST /api/orchestrator/sessions/spawn: Body: {"sessionId": "session1"}, Response: session token.
    • GET /api/orchestrator/health: Query: ?instances=A-H, Returns: {"A": "healthy", "B": "down"}.
  • Code Snippets:
    # Python: route a task to instance A
    import os
    import requests
    headers = {'Authorization': f'Bearer {os.environ["ORCHESTRATOR_API_KEY"]}'}
    response = requests.post('http://api.openclaw/orchestrator/route', json={'instances': ['A'], 'payload': {'data': 'task'}}, headers=headers)
    
    # Shell: fan out to all instances and save the aggregated output
    export ORCHESTRATOR_API_KEY=your_key_here
    openclaw orchestrator fanout --targets A-H --aggregate true > output.json
    
  • Config Formats: Use JSON for payloads, e.g., {"instances": ["A"], "timeout": 5}; store in files like config.json and pass via --config config.json.
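
The CLI examples above use range notation such as A-H, while the API bodies list instances explicitly. A small helper can bridge the two when building request bodies by hand. This is a sketch under stated assumptions: `expand_instances` and `build_fanout_body` are hypothetical names, and treating A-H as an inclusive single-letter range is a guess about the notation rather than documented behavior.

```python
from typing import Any, Dict, List


def expand_instances(spec: str) -> List[str]:
    """Expand 'A-C' into ['A', 'B', 'C']; a single name such as 'A' becomes ['A'].

    Assumes ranges are inclusive and both ends are single characters.
    """
    spec = spec.strip()
    if not spec:
        raise ValueError("empty instance spec")
    if "-" not in spec:
        return [spec]
    start, end = spec.split("-", 1)
    if len(start) != 1 or len(end) != 1 or start > end:
        raise ValueError(f"invalid instance range: {spec!r}")
    return [chr(code) for code in range(ord(start), ord(end) + 1)]


def build_fanout_body(spec: str, aggregate: bool = True) -> Dict[str, Any]:
    """Build a JSON-serializable body in the shape shown for the fanout endpoint."""
    return {"targets": expand_instances(spec), "aggregate": aggregate}
```

For example, `build_fanout_body("A-C")` yields a dictionary with targets A, B, and C and `aggregate` set to true, ready to pass as the `json=` argument of an HTTP client.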

Integration Notes

Integrate by setting $ORCHESTRATOR_API_KEY in your environment before running commands. For distributed-comms clusters, ensure the skill is registered via the cluster's API (e.g., POST /api/cluster/register with body {"skillId": "orchestrator"}). Use SDK wrappers for languages like Python; install via pip install openclaw-sdk. When embedding, include the hint string in metadata: orchestrate delegate multi-instance session spawn send fanout aggregate instances. Test integrations in a sandbox environment to verify routing and session persistence.
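
Because every authenticated call depends on $ORCHESTRATOR_API_KEY, it can help to fail fast when the variable is missing. The helper below is a minimal sketch; `auth_headers` is a hypothetical name and not part of any openclaw SDK.

```python
import os
from typing import Dict, Mapping, Optional


def auth_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the Authorization header, raising early if the API key is absent.

    `env` defaults to the process environment; passing a mapping makes testing easy.
    """
    source = os.environ if env is None else env
    key = source.get("ORCHESTRATOR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ORCHESTRATOR_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

Calling this once at startup surfaces a missing key before any request is sent, instead of as a 401 later on.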

Error Handling

  • Common Errors:
    • Authentication failures (e.g., 401 Unauthorized): Check if $ORCHESTRATOR_API_KEY is set and valid; retry once after verifying.
    • Instance unreachable (e.g., 503 Service Unavailable): Use --retry 3 in CLI or handle in code with exponential backoff.
    • Aggregation timeouts: Set --timeout 10s and catch errors with try/except in scripts.
  • Prescriptive Steps:
    • For routing errors: Parse response JSON for error codes (e.g., "instance_down") and fall back to healthy instances.
    • In code:
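      import requests

      # url and headers are assumed to be defined by the caller
      try:
          response = requests.post(url, headers=headers, timeout=10)
          response.raise_for_status()
      except requests.exceptions.RequestException as e:
          # RequestException also covers connection failures and timeouts
          print(f"Error: {e} - Retrying...")
          # Implement retry logic here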
      
    • Always log errors with timestamps and instance details for debugging.
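
The retry-with-exponential-backoff advice above could be implemented along these lines. This is a sketch, not the skill's own retry mechanism: `retry_with_backoff`, its parameters, and the 0.5-second base delay are assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying up to `retries` times with doubling delays.

    The delay before retry n (0-based) is base_delay * 2**n. The last
    exception is re-raised once all retries are used up.
    """
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            if attempt >= retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

A caller would wrap the request in a zero-argument function, for example `retry_with_backoff(lambda: do_request())`, where `do_request` is whatever function performs the delegation and raises on failure. Narrowing the caught exception type to transport errors is a reasonable refinement.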

Graph Relationships

  • Related to: distributed-comms cluster (parent), sessions skill (dependency for session management), multi-instance skills (peers for delegation).
  • Dependencies: Requires health-checks module for instance monitoring.
  • Connections: Integrates with delegation tags for routing; aggregates with fanout operations in distributed-comms.