sop-runner
SKILL.md
SOP Runner Skill
Execute Standard Operating Procedures (SOPs) - predefined workflows for repeatable operational tasks. SOPs ensure consistent execution of complex procedures.
When to Use
✅ USE this skill when:
- A documented procedure already exists
- Compliance requirements call for a fixed, repeatable procedure
- You are executing a runbook
- You are responding to an on-call incident
❌ DON'T use this skill when:
- The task is ad hoc or one-off
- The procedure is still being defined
- The situation is highly dynamic
SOP Format
SOPs are defined in YAML:
# sops/deploy-to-prod.yaml
name: Deploy to Production
version: "1.0"
description: Deploy application to production environment
preconditions:
  - type: branch
    value: main
  - type: ci_status
    value: success
steps:
  - id: notify
    name: Notify team
    type: message
    channel: alerts
    message: "Starting production deployment"
  - id: backup
    name: Create database backup
    type: command
    command: "pg_dump > backup_$(date +%s).sql"
    timeout: 300
    onSuccess: continue
    onFailure: abort
  - id: deploy
    name: Deploy application
    type: command
    command: "kubectl apply -f k8s/"
    timeout: 600
  - id: verify
    name: Verify deployment
    type: http
    url: "https://api.example.com/health"
    expectStatus: 200
    retries: 3
  - id: complete
    name: Notify completion
    type: message
    channel: alerts
    message: "Deployment completed successfully"
postconditions:
  - type: rollback_on_failure
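Because an SOP is plain structured data, it can be sanity-checked before anything runs. The sketch below is a minimal, hypothetical validator operating on an already-parsed SOP object; `validateSOP` and its rules are assumptions for illustration, not part of the runner's documented API. The list of step types is taken from the "Step Types" section of this document.

```javascript
// Hypothetical validator: the real runner's checks are not documented here.
const KNOWN_STEP_TYPES = new Set([
  'command', 'http', 'message', 'agent', 'approval', 'wait',
]);

// Returns a list of human-readable problems; an empty list means "looks fine".
function validateSOP(sop) {
  const errors = [];
  if (!sop || typeof sop.name !== 'string' || sop.name.trim() === '') {
    errors.push('SOP needs a non-empty "name"');
  }
  const steps = sop && Array.isArray(sop.steps) ? sop.steps : [];
  if (steps.length === 0) {
    errors.push('SOP needs at least one step');
  }
  const seen = new Set();
  steps.forEach((step, index) => {
    const label = step.id || `#${index}`;
    if (!step.id) {
      errors.push(`step ${label} is missing "id"`);
    } else if (seen.has(step.id)) {
      errors.push(`duplicate step id "${step.id}"`);
    }
    seen.add(step.id);
    if (!KNOWN_STEP_TYPES.has(step.type)) {
      errors.push(`step ${label} has unknown type "${step.type}"`);
    }
  });
  return errors;
}

// Example: a deliberately broken SOP object.
const brokenSop = {
  name: 'Deploy to Production',
  steps: [
    { id: 'notify', type: 'message' },
    { id: 'notify', type: 'shell' },
  ],
};
console.log(validateSOP(brokenSop));
```

Returning a list of problems instead of throwing on the first one lets an author fix every issue in a single pass.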
Usage
Execute SOP
const { executeSOP } = require('/job/.pi/skills/sop-runner/runner.js');
const result = await executeSOP('deploy-to-prod', {
  params: {
    version: 'v2.1.0',
    environment: 'production'
  },
  notifyChannel: 'alerts'
});
console.log(result.status); // success|failed|aborted
console.log(result.steps); // Step-by-step results
List Available SOPs
const { listSOPs } = require('/job/.pi/skills/sop-runner/runner.js');
const sops = await listSOPs();
console.log(sops);
// ['deploy-to-prod', 'backup-database', 'incident-response']
Get SOP Details
const { getSOP } = require('/job/.pi/skills/sop-runner/runner.js');
const sop = await getSOP('deploy-to-prod');
console.log(sop.name);
console.log(sop.steps.length);
Resume from Checkpoint
const { resumeSOP } = require('/job/.pi/skills/sop-runner/runner.js');
const result = await resumeSOP('deploy-to-prod', {
  runId: 'run_abc123',
  fromStep: 'verify'
});
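Conceptually, resuming from a checkpoint means skipping every step before `fromStep` and running the rest in order. The helper below is a hypothetical sketch of that selection; `remainingSteps` is an illustrative name, and how `resumeSOP` actually works internally is not shown in this document.

```javascript
// Hypothetical helper: selects the steps a resumed run would still execute.
function remainingSteps(steps, fromStep) {
  const start = steps.findIndex((step) => step.id === fromStep);
  if (start === -1) {
    throw new Error(`Unknown step id: ${fromStep}`);
  }
  return steps.slice(start);
}

// Step ids from the deploy-to-prod SOP defined earlier.
const deploySteps = ['notify', 'backup', 'deploy', 'verify', 'complete']
  .map((id) => ({ id }));

console.log(remainingSteps(deploySteps, 'verify').map((step) => step.id));
// [ 'verify', 'complete' ]
```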
Step Types
Command - Execute shell command
type: command
command: "npm run build"
timeout: 300
workingDir: "."
HTTP - Make HTTP request
type: http
method: POST
url: "https://api.example.com/deploy"
headers:
  Authorization: "Bearer ${env.DEPLOY_TOKEN}"
body:
  version: "${params.version}"
expectStatus: 200
Message - Send notification
type: message
channel: alerts
template: "SOP {{name}} step {{stepId}} completed"
Agent - Delegate to AI agent
type: agent
agent: reviewer
task: "Review the deployment logs"
context: "${previous_step.output}"
Approval - Require human approval
type: approval
message: "Proceed with production deployment?"
timeout: 600 # seconds
approvers:
  - oncall-lead
  - sre-team
Wait - Delay execution
type: wait
duration: 60 # seconds
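One way to picture how a runner handles these step types is a registry that maps each `type` to a handler. The sketch below only describes what each step would do (similar in spirit to a dry run) and executes nothing; the handler bodies, `describeStep`, and the default `GET` method are assumptions for illustration, not the runner's actual implementation.

```javascript
// Hypothetical registry: one describer per step type listed above.
const describers = new Map([
  ['command', (step) => `would run: ${step.command}`],
  ['http', (step) => `would ${step.method || 'GET'} ${step.url}`],
  ['message', (step) => `would post to channel "${step.channel}"`],
  ['agent', (step) => `would ask agent "${step.agent}": ${step.task}`],
  ['approval', (step) =>
    `would wait for approval from: ${(step.approvers || []).join(', ')}`],
  ['wait', (step) => `would wait ${step.duration}s`],
]);

// Looks up the describer for a step and fails loudly on unknown types.
function describeStep(step) {
  const describe = describers.get(step.type);
  if (!describe) {
    throw new Error(`Unsupported step type: ${step.type}`);
  }
  return describe(step);
}

console.log(describeStep({ type: 'command', command: 'npm run build' }));
console.log(describeStep({ type: 'wait', duration: 60 }));
```

Using a `Map` instead of a plain object avoids accidental matches on inherited keys such as `constructor`.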
Failure Handling
steps:
  - id: deploy
    type: command
    command: "kubectl apply -f k8s/"
    onFailure:
      action: rollback
      # or: abort | retry (with max: 3) | continue
  - id: rollback
    type: command
    command: "kubectl rollout undo deployment/app"
    onlyIf: previous_step_failed
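The examples in this document write `onFailure` both as a bare string (`onFailure: abort`) and as an object (`action: rollback`). The sketch below shows one hypothetical way to normalize the two forms and decide what happens after a failed attempt. The default of `abort` when `onFailure` is omitted, and the reading of `max` as the total number of attempts, are assumptions rather than documented behavior.

```javascript
// Hypothetical: accept "abort" as shorthand for { action: "abort" }.
function normalizeOnFailure(onFailure) {
  if (onFailure === undefined) {
    return { action: 'abort' }; // assumed default, not documented
  }
  if (typeof onFailure === 'string') {
    return { action: onFailure };
  }
  return { action: onFailure.action, max: onFailure.max };
}

// attempt = number of attempts already made for this step (1 after the
// first failure). Returns the action the runner would take next.
function decideAfterFailure(step, attempt) {
  const policy = normalizeOnFailure(step.onFailure);
  if (policy.action === 'retry') {
    const maxAttempts = policy.max ?? 1; // assumed: max counts total attempts
    return attempt < maxAttempts ? 'retry' : 'abort';
  }
  return policy.action;
}

const flaky = { id: 'deploy', onFailure: { action: 'retry', max: 3 } };
console.log(decideAfterFailure(flaky, 1)); // retry
console.log(decideAfterFailure(flaky, 3)); // abort
```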
Execution Context
Each step has access to:
- ${params.*} - SOP parameters
- ${previous_step.output} - Previous step output
- ${env.VARIABLE} - Environment variables
- ${runId} - Current run ID
- ${startedAt} - Run start time
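To make the placeholder syntax concrete, here is a minimal sketch of how `${...}` references could be resolved against a context object. `interpolate` and the shape of `context` are hypothetical; the runner's real substitution rules (for example, how it treats expressions such as `${check_metrics.value > 100}`) are not specified in this document, so this sketch handles only plain dotted paths and leaves everything else untouched.

```javascript
// Hypothetical resolver for plain dotted paths such as ${params.version}.
// Unknown or non-path placeholders are left in the string unchanged.
function interpolate(template, context) {
  return template.replace(/\$\{([A-Za-z_][\w.]*)\}/g, (match, path) => {
    let value = context;
    for (const key of path.split('.')) {
      const isObject = value !== null && typeof value === 'object';
      if (!isObject || !Object.prototype.hasOwnProperty.call(value, key)) {
        return match;
      }
      value = value[key];
    }
    return String(value);
  });
}

// Assumed context shape, mirroring the variables listed above.
const context = {
  params: { version: 'v2.1.0', service: 'api' },
  previous_step: { output: 'backup complete' },
  env: { DEPLOY_TOKEN: 'example-token' },
  runId: 'run_abc123',
  startedAt: '2026-02-25T13:30:00Z',
};

console.log(interpolate('Deploying ${params.version} in ${runId}', context));
// Deploying v2.1.0 in run_abc123
```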
Example: Incident Response
# sops/incident-response.yaml
name: Incident Response
version: "2.0"
steps:
  - id: notify
    type: message
    channel: incidents
    message: "🚨 Incident detected! Starting response procedure"
  - id: gather_logs
    type: command
    # tee keeps a copy in logs.txt while leaving the logs on stdout,
    # so the next step receives them as ${previous_step.output}
    command: "kubectl logs -l app=${params.service} --tail=200 | tee logs.txt"
  - id: analyze
    type: agent
    agent: analyst
    task: "Analyze logs for errors"
    context: "${previous_step.output}"
  - id: check_metrics
    type: http
    url: "https://prometheus/api/v1/query"
    params:
      query: "rate(http_requests_total{status='500'}[5m])"
  - id: escalation
    type: approval
    message: "Escalate to on-call engineer?"
    autoApprove:
      condition: "${check_metrics.value > 100}"
  - id: remediation
    type: agent
    agent: remediator
    task: "Suggest remediation steps"
    context: "${analyze.output}"
  - id: report
    type: message
    channel: incidents
    message: "Incident analysis complete"
Running SOPs
# Execute SOP
node /job/.pi/skills/sop-runner/runner.js --sop deploy-to-prod
# With parameters
node /job/.pi/skills/sop-runner/runner.js \
  --sop deploy-to-prod \
  --params '{"version":"v2.1.0","environment":"production"}'
# Dry run
node /job/.pi/skills/sop-runner/runner.js \
  --sop deploy-to-prod \
  --dry-run
# Resume failed run
node /job/.pi/skills/sop-runner/runner.js \
  --resume run_abc123
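For readers wiring up a similar command line, the flags above can be parsed with Node's built-in `util.parseArgs` (available in recent Node.js releases). This is a sketch under assumptions: `parseRunnerArgs` is a hypothetical function, and nothing here claims to match how `runner.js` actually parses its arguments.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical parser for the four flags shown above.
function parseRunnerArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      sop: { type: 'string' },
      params: { type: 'string' },
      'dry-run': { type: 'boolean' },
      resume: { type: 'string' },
    },
  });
  if (!values.sop && !values.resume) {
    throw new Error('Pass --sop <name> or --resume <runId>');
  }
  return {
    sop: values.sop,
    resume: values.resume,
    dryRun: values['dry-run'] ?? false,
    // --params carries a JSON object; default to no parameters.
    params: JSON.parse(values.params ?? '{}'),
  };
}

const parsed = parseRunnerArgs([
  '--sop', 'deploy-to-prod',
  '--params', '{"version":"v2.1.0","environment":"production"}',
  '--dry-run',
]);
console.log(parsed.sop, parsed.dryRun, parsed.params.version);
```

By default `parseArgs` runs in strict mode, so an unknown flag raises an error instead of being silently ignored.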
Output
{
  runId: "run_abc123",
  sopName: "deploy-to-prod",
  status: "success",
  startedAt: "2026-02-25T13:30:00Z",
  completedAt: "2026-02-25T13:35:00Z",
  steps: [
    { id: "notify", status: "success", output: "...", duration: 1200 },
    { id: "backup", status: "success", output: "...", duration: 45000 },
    { id: "deploy", status: "success", output: "...", duration: 120000 },
    { id: "verify", status: "success", output: "...", duration: 5000 }
  ]
}
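A result object of this shape is easy to condense into a short report. The sketch below is a hypothetical helper, not part of the runner; it assumes each step's `duration` is in milliseconds, which the document does not state explicitly.

```javascript
// Hypothetical summary helper; assumes step durations are milliseconds.
function summarizeRun(run) {
  const totalStepMs = run.steps.reduce((sum, step) => sum + step.duration, 0);
  const failed = run.steps.find((step) => step.status !== 'success');
  const wallClockMs = Date.parse(run.completedAt) - Date.parse(run.startedAt);
  return {
    runId: run.runId,
    status: run.status,
    totalStepMs,
    wallClockMs,
    failedStep: failed ? failed.id : null,
  };
}

// The sample run shown above.
const sampleRun = {
  runId: 'run_abc123',
  sopName: 'deploy-to-prod',
  status: 'success',
  startedAt: '2026-02-25T13:30:00Z',
  completedAt: '2026-02-25T13:35:00Z',
  steps: [
    { id: 'notify', status: 'success', output: '...', duration: 1200 },
    { id: 'backup', status: 'success', output: '...', duration: 45000 },
    { id: 'deploy', status: 'success', output: '...', duration: 120000 },
    { id: 'verify', status: 'success', output: '...', duration: 5000 },
  ],
};

console.log(summarizeRun(sampleRun));
```

For the sample run, the step durations add up to 171200 ms against a wall-clock time of 300000 ms; the gap would be time spent outside the recorded steps.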
Repository
winsorllc/upgra…carnival