pagerduty-oncall
PagerDuty On-Call Incident Investigator
Authenticate, list escalation policies, fetch all incidents and their details, then analyse relevance across Envato on-call teams.
Inputs
Raw arguments: $ARGUMENTS
Infer from the arguments:
- QUERY: what to investigate. Defaults to "incidents today". Interpret the time range to derive --since and --until dates (YYYY-MM-DD) in the agent's local timezone (detect via system clock), not UTC. See the date derivation table in Step 5.
Target Escalation Policies
The list of escalation policies to investigate is resolved in order:
config.json—escalation_policiesarray in config.jsonPD_ESCALATION_POLICIES— comma-separated env var (e.g."On Call, Platform Engineering On-Call")- If both are empty, all escalation policies are included
System Requirements
pdCLI installed (https://github.com/martindstone/pagerduty-cli)- Environment variable
PAGERDUTY_API_TOKENset with a valid PagerDuty REST API token. Important: When checking this variable, verify at least 2 times before concluding it is not set. Environment variables can appear unset due to shell context differences. Never expose the value — use existence checks only (e.g.test -n "$PAGERDUTY_API_TOKEN").
Scripts
All parsing and filtering is handled by scripts in ${CLAUDE_SKILL_DIR}/scripts/:
| Script | Purpose |
|---|---|
extract-json.js |
Extracts JSON from pd CLI output that may contain non-JSON text |
filter-eps.js |
Filters EPs by target names, extracts relevant fields |
filter-incidents.js |
Filters incidents by service IDs from ep-list.json, extracts fields |
parse-log.js |
Extracts relevant fields from incident log entries |
parse-notes.js |
Extracts relevant fields from incident notes |
parse-analytics.js |
Extracts relevant fields from incident analytics |
All scripts read from stdin and write to stdout. Pipe pd CLI output through extract-json.js first, then through the appropriate filter/parse script.
Output Directory
All intermediate JSON and the final report are saved to:
.pagerduty-oncall-tmp/
├── ep-list.json # Filtered escalation policies
├── incidents.json # Incidents filtered by target EPs
├── logs/<INCIDENT_ID>.json # Log entries per incident
├── notes/<INCIDENT_ID>.json # Notes per incident
├── analytics/<INCIDENT_ID>.json # Analytics per incident
└── report.md # Final analysis report
Execution
1. Verify Connection
Check that PAGERDUTY_API_TOKEN is set (verify at least 2 times before concluding it is missing):
test -n "$PAGERDUTY_API_TOKEN" && echo "TOKEN_SET" || echo "TOKEN_MISSING"
If token is set, authenticate:
pd auth add --token "$PAGERDUTY_API_TOKEN"
If authentication fails, use AskUserQuestion to inform the user and link to the PagerDuty CLI User Guide for setup instructions. Do NOT continue until authentication succeeds.
2. Prepare Output Directory
Create the output directory .pagerduty-oncall-tmp/ and subdirectories logs/, notes/, analytics/ before saving any files.
3. Load Escalation Policy Configuration
Read config.json using the Read tool. Extract the escalation_policies array.
If the array is empty, check for the PD_ESCALATION_POLICIES env var:
test -n "$PD_ESCALATION_POLICIES" && echo "$PD_ESCALATION_POLICIES" || echo "EMPTY"
Build the list of target EP names for the next step. If both sources are empty, no names will be passed (all EPs included).
4. List and Filter Escalation Policies
Fetch, extract JSON, filter by target names, and save in one pipeline:
pd ep list --json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/filter-eps.js "EP Name 1" "EP Name 2" > .pagerduty-oncall-tmp/ep-list.json
Pass each target EP name as a separate argument to filter-eps.js. If no target names, omit the arguments (all EPs pass through).
5. Fetch and Filter Incidents
Derive SINCE_DATE and UNTIL_DATE (YYYY-MM-DD) from QUERY in the agent's local timezone. Important: --until is exclusive in the pd CLI — it does NOT include that day. Use this table (assuming today is 2026-03-06):
| Input | SINCE_DATE | UNTIL_DATE | Why |
|---|---|---|---|
| "last 24h" | 2026-03-05 | 2026-03-07 | 24h back from now, until must be tomorrow to include today |
| "today" | 2026-03-06 | 2026-03-07 | until=tomorrow to include today |
| "yesterday" | 2026-03-05 | 2026-03-06 | until=today to include yesterday |
| "last 3 days" | 2026-03-03 | 2026-03-07 | 3 days back, until=tomorrow |
| "2026-03-01 to 2026-03-05" | 2026-03-01 | 2026-03-06 | until=day after end date |
Then fetch, extract JSON, filter by service IDs from ep-list.json, and save:
pd incident list --json --statuses=open --statuses=closed --statuses=triggered --statuses=acknowledged --statuses=resolved --since=SINCE_DATE --until=UNTIL_DATE 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/filter-incidents.js .pagerduty-oncall-tmp/ep-list.json > .pagerduty-oncall-tmp/incidents.json
Read .pagerduty-oncall-tmp/incidents.json to check the result. If the array is empty, write a report noting zero incidents and stop.
6. Gather Incident Details
For each incident in .pagerduty-oncall-tmp/incidents.json, fetch details sequentially (to avoid PagerDuty API rate limits). Use the id field (e.g. Q1V3O5Q3JX39LJ), NOT incident_number — the pd CLI only accepts internal IDs.
Log entries:
pd incident log -i <INCIDENT_ID> --json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/parse-log.js > .pagerduty-oncall-tmp/logs/<INCIDENT_ID>.json
Notes:
pd incident notes -i <INCIDENT_ID> --output=json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/parse-notes.js > .pagerduty-oncall-tmp/notes/<INCIDENT_ID>.json
Analytics:
pd incident analytics -i <INCIDENT_ID> --json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/parse-analytics.js > .pagerduty-oncall-tmp/analytics/<INCIDENT_ID>.json
7. Analyse and Report
Read all saved JSON files from .pagerduty-oncall-tmp/ using the Read tool. Produce a structured analysis and save it using Write to .pagerduty-oncall-tmp/report.md:
- Incident Summary Table — For each incident: ID, title, service, escalation policy, status, urgency, created/resolved timestamps (current agent's local timezone via system clock, not UTC), duration
- Cross-Team Correlation — Identify incidents that overlap in time across different escalation policies. Flag potential cascading failures or shared root causes
- Timeline — Chronological view of all incidents across all teams in current agent's local timezone, highlighting clusters of activity
- Key Findings — Patterns, recurring services, repeated triggers, or escalation policy gaps
- Recommendations — Actionable suggestions based on the analysis
After writing the report, inform the user of the report location: .pagerduty-oncall-tmp/report.md