SOP Runner Skill

Execute Standard Operating Procedures (SOPs) - predefined workflows for repeatable operational tasks. SOPs ensure consistent execution of complex procedures.

When to Use

USE this skill when:

  • A documented procedure already exists for the task
  • Compliance requirements apply to the task
  • You are executing a runbook
  • You are responding to an on-call incident

DON'T use this skill when:

  • The task is an ad-hoc one-off
  • The procedure is still being defined
  • The situation is highly dynamic

SOP Format

SOPs are defined in YAML:

# sops/deploy-to-prod.yaml
name: Deploy to Production
version: "1.0"
description: Deploy application to production environment

preconditions:
  - type: branch
    value: main
  - type: ci_status
    value: success

steps:
  - id: notify
    name: Notify team
    type: message
    channel: alerts
    message: "Starting production deployment"

  - id: backup
    name: Create database backup
    type: command
    command: "pg_dump > backup_$(date +%s).sql"
    timeout: 300
    onSuccess: continue
    onFailure: abort

  - id: deploy
    name: Deploy application
    type: command
    command: "kubectl apply -f k8s/"
    timeout: 600

  - id: verify
    name: Verify deployment
    type: http
    url: "https://api.example.com/health"
    expectStatus: 200
    retries: 3

  - id: complete
    name: Notify completion
    type: message
    channel: alerts
    message: "Deployment completed successfully"

postconditions:
  - type: rollback_on_failure

Usage

Execute SOP

const { executeSOP } = require('/job/.pi/skills/sop-runner/runner.js');

// CommonJS modules cannot use top-level await, so wrap the call in an async function.
async function main() {
  const result = await executeSOP('deploy-to-prod', {
    params: {
      version: 'v2.1.0',
      environment: 'production'
    },
    notifyChannel: 'alerts'
  });

  console.log(result.status); // 'success', 'failed', or 'aborted'
  console.log(result.steps); // Step-by-step results
}

main().catch(console.error);

List Available SOPs

const { listSOPs } = require('/job/.pi/skills/sop-runner/runner.js');

async function main() {
  const sops = await listSOPs();
  console.log(sops);
  // ['deploy-to-prod', 'backup-database', 'incident-response']
}

main().catch(console.error);

Get SOP Details

const { getSOP } = require('/job/.pi/skills/sop-runner/runner.js');

async function main() {
  const sop = await getSOP('deploy-to-prod');
  console.log(sop.name);
  console.log(sop.steps.length);
}

main().catch(console.error);

Resume from Checkpoint

const { resumeSOP } = require('/job/.pi/skills/sop-runner/runner.js');

async function main() {
  const result = await resumeSOP('backup-database', {
    runId: 'run_abc123',
    fromStep: 'verify'
  });
}

main().catch(console.error);
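One plausible reading of `fromStep` is that the named step and everything after it are re-run. The sketch below illustrates that reading only. `stepsFrom` is a hypothetical helper, not a runner.js export, and the step ids are those of the deploy-to-prod definition in the SOP Format section.

```javascript
// Hypothetical helper: returns the steps that remain when resuming at `fromStep`.
// Assumes resume re-runs the named step itself, then every later step in order.
function stepsFrom(steps, fromStep) {
  const start = steps.findIndex((step) => step.id === fromStep);
  if (start === -1) {
    throw new Error(`Unknown step id: ${fromStep}`);
  }
  return steps.slice(start);
}

const deploySteps = [
  { id: 'notify' },
  { id: 'backup' },
  { id: 'deploy' },
  { id: 'verify' },
  { id: 'complete' }
];

console.log(stepsFrom(deploySteps, 'verify').map((step) => step.id));
// [ 'verify', 'complete' ]
```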

Step Types

Command - Execute shell command

type: command
command: "npm run build"
timeout: 300
workingDir: "."

HTTP - Make HTTP request

type: http
method: POST
url: "https://api.example.com/deploy"
headers:
  Authorization: "Bearer ${env.DEPLOY_TOKEN}"
body:
  version: "${params.version}"
expectStatus: 200

Message - Send notification

type: message
channel: alerts
template: "SOP {{name}} step {{stepId}} completed"

Agent - Delegate to AI agent

type: agent
agent: reviewer
task: "Review the deployment logs"
context: "${previous_step.output}"

Approval - Require human approval

type: approval
message: "Proceed with production deployment?"
timeout: 600 # seconds
approvers:
  - oncall-lead
  - sre-team

Wait - Delay execution

type: wait
duration: 60 # seconds

Failure Handling

steps:
  - id: deploy
    type: command
    command: "kubectl apply -f k8s/"
    onFailure:
      action: rollback
      # other actions: abort, continue, or retry (with max: 3)

  - id: rollback
    type: command
    command: "kubectl rollout undo deployment/app"
    onlyIf: previous_step_failed
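How a runner might turn an `onFailure` setting into a decision can be sketched as follows. This is an illustration under assumptions, not the runner.js logic: `nextAction` is a hypothetical function, the `{ action, max }` policy shape is inferred from the YAML above, and defaulting to `abort` when no policy is given is a guess.

```javascript
// Hypothetical decision function; the policy shape { action, max } is an assumption.
// `retriesSoFar` is the number of retries already made for the failed step.
function nextAction(onFailure, retriesSoFar) {
  // A bare string such as "abort" is treated like { action: "abort" }.
  const policy = typeof onFailure === 'string'
    ? { action: onFailure }
    : (onFailure || { action: 'abort' }); // assumed default when nothing is set

  if (policy.action === 'retry') {
    const max = policy.max ?? 1;
    // Keep retrying until the limit is reached, then give up.
    return retriesSoFar < max ? 'retry' : 'abort';
  }
  return policy.action; // 'rollback', 'abort', or 'continue'
}

console.log(nextAction({ action: 'retry', max: 3 }, 0)); // retry
console.log(nextAction({ action: 'retry', max: 3 }, 3)); // abort
console.log(nextAction('continue', 0));                  // continue
console.log(nextAction({ action: 'rollback' }, 0));      // rollback
```

Keeping the decision in a pure function separates policy from execution, so it can be checked without running any commands.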

Execution Context

Each step has access to:

  • ${params.*} - SOP parameters
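  • ${previous_step.output} - Output of the immediately preceding step
  • ${<stepId>.output} - Output of an earlier step referenced by its id, e.g. ${analyze.output}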
  • ${env.VARIABLE} - Environment variables
  • ${runId} - Current run ID
  • ${startedAt} - Run start time
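The placeholders listed above could be resolved with a simple dotted-path lookup. The sketch below shows one way to do that. `interpolate` is a hypothetical function and runner.js may resolve placeholders differently. It handles plain paths only, not expressions such as comparisons.

```javascript
// Hypothetical sketch of ${...} substitution; not taken from runner.js.
function interpolate(template, context) {
  return template.replace(/\$\{([\w.]+)\}/g, (match, path) => {
    let value = context;
    for (const key of path.split('.')) {
      if (value === null || typeof value !== 'object' || !(key in value)) {
        return match; // leave unknown placeholders untouched
      }
      value = value[key];
    }
    return String(value);
  });
}

const context = {
  params: { version: 'v2.1.0' },
  runId: 'run_abc123',
  previous_step: { output: 'ok' }
};

console.log(interpolate('Deploying ${params.version} (run ${runId})', context));
// Deploying v2.1.0 (run run_abc123)
```

Leaving unresolved placeholders in place, instead of substituting an empty string, makes a misspelled variable visible in the rendered command or message.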

Example: Incident Response

# sops/incident-response.yaml
name: Incident Response
version: "2.0"

steps:
  - id: notify
    type: message
    channel: incidents
    message: "🚨 Incident detected! Starting response procedure"

  - id: gather_logs
    type: command
    command: "kubectl logs -l app=${params.service} --tail=200 | tee logs.txt"

  - id: analyze
    type: agent
    agent: analyst
    task: "Analyze logs for errors"
    context: "${previous_step.output}"

  - id: check_metrics
    type: http
    url: "https://prometheus/api/v1/query"
    params:
      query: "rate(http_requests_total{status='500'}[5m])"

  - id: escalation
    type: approval
    message: "Escalate to on-call engineer?"
    autoApprove:
      condition: "${check_metrics.value > 100}"

  - id: remediation
    type: agent
    agent: remediator
    task: "Suggest remediation steps"
    context: "${analyze.output}"

  - id: report
    type: message
    channel: incidents
    message: "Incident analysis complete"

Running SOPs

# Execute SOP
node /job/.pi/skills/sop-runner/runner.js --sop deploy-to-prod

# With parameters
node /job/.pi/skills/sop-runner/runner.js \
  --sop deploy-to-prod \
  --params '{"version":"v2.1.0","environment":"production"}'

# Dry run
node /job/.pi/skills/sop-runner/runner.js \
  --sop deploy-to-prod \
  --dry-run

# Resume failed run
node /job/.pi/skills/sop-runner/runner.js \
  --resume run_abc123

Output

{
  runId: "run_abc123",
  sopName: "deploy-to-prod",
  status: "success",
  startedAt: "2026-02-25T13:30:00Z",
  completedAt: "2026-02-25T13:35:00Z",
  steps: [
    { id: "notify", status: "success", output: "...", duration: 1200 },
    { id: "backup", status: "success", output: "...", duration: 45000 },
    { id: "deploy", status: "success", output: "...", duration: 120000 },
    { id: "verify", status: "success", output: "...", duration: 5000 }
  ]
}
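A result of this shape is easy to condense for a report or notification. The following is a minimal sketch: `summarizeRun` is a hypothetical helper, and it assumes that each `duration` is in milliseconds, which this document does not state.

```javascript
// Hypothetical helper: condenses a run result into a short summary.
// Assumes step durations are in milliseconds.
function summarizeRun(result) {
  const totalMs = result.steps.reduce((sum, step) => sum + step.duration, 0);
  const failed = result.steps.find((step) => step.status !== 'success');
  return {
    runId: result.runId,
    status: result.status,
    totalMs,
    failedStep: failed ? failed.id : null
  };
}

const sampleResult = {
  runId: 'run_abc123',
  sopName: 'deploy-to-prod',
  status: 'success',
  steps: [
    { id: 'notify', status: 'success', output: '...', duration: 1200 },
    { id: 'backup', status: 'success', output: '...', duration: 45000 },
    { id: 'deploy', status: 'success', output: '...', duration: 120000 },
    { id: 'verify', status: 'success', output: '...', duration: 5000 }
  ]
};

const summary = summarizeRun(sampleResult);
console.log(summary.totalMs);    // 171200
console.log(summary.failedStep); // null
```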