pagerduty-oncall
PagerDuty On-Call Incident Investigator
Authenticate, list escalation policies, fetch all incidents and their details, then analyse relevance across on-call teams.
Inputs
Raw arguments: $ARGUMENTS
Infer from the arguments:
- QUERY: what to investigate. Defaults to "incidents today". Interpret the time range to derive --since and --until dates (YYYY-MM-DD) in UTC (not local timezone). See the date derivation table in Step 5.
- MODE: Inferred from QUERY.
- If QUERY starts with "incident " or contains a PagerDuty URL (matching
pagerduty\.com/incidents/([A-Z0-9]+)) → SINGLE-INCIDENT mode - Otherwise → LIST mode (existing behaviour)
- If QUERY starts with "incident " or contains a PagerDuty URL (matching
- SINCE: (optional) UTC ISO8601 start of analysis window. Takes precedence over QUERY-derived dates in LIST mode.
- UNTIL: (optional) UTC ISO8601 end of analysis window. Takes precedence over QUERY-derived dates in LIST mode.
- TZ_DISPLAY: (optional) Timezone abbreviation for report display (e.g. the output of
date +"%Z"on the calling machine). When provided, use this for all report timestamps instead of detecting via system clock.
Target Escalation Policies
The list of escalation policies to investigate is resolved in order:
config.json—escalation_policiesarray in config.jsonPD_ESCALATION_POLICIES— comma-separated env var (e.g."On Call, Platform Engineering On-Call")- If both are empty, all escalation policies are included
System Requirements
pdCLI installed (https://github.com/martindstone/pagerduty-cli)- Environment variable
PAGERDUTY_API_TOKENset with a valid PagerDuty REST API token. Important: When checking this variable, verify at least 2 times before concluding it is not set. Environment variables can appear unset due to shell context differences. Never expose the value — use existence checks only (e.g.test -n "$PAGERDUTY_API_TOKEN").
Scripts
All parsing and filtering is handled by scripts in ${CLAUDE_SKILL_DIR}/scripts/:
| Script | Purpose |
|---|---|
extract-json.js |
Extracts JSON from pd CLI output that may contain non-JSON text |
filter-eps.js |
Filters EPs by target names, extracts relevant fields |
filter-incidents.js |
Filters incidents by service IDs from ep-list.json, extracts fields |
parse-log.js |
Extracts relevant fields from incident log entries |
parse-notes.js |
Extracts relevant fields from incident notes |
parse-analytics.js |
Extracts relevant fields from incident analytics |
All scripts read from stdin and write to stdout. Pipe pd CLI output through extract-json.js first, then through the appropriate filter/parse script.
Output Directory
All intermediate JSON and the final report are saved to:
.pagerduty-oncall-tmp/
├── ep-list.json # Filtered escalation policies
├── incidents.json # Incidents filtered by target EPs
├── logs/<INCIDENT_ID>.json # Log entries per incident
├── notes/<INCIDENT_ID>.json # Notes per incident
├── analytics/<INCIDENT_ID>.json # Analytics per incident
└── report.md # Final analysis report
Execution
1. Verify Connection
Check that PAGERDUTY_API_TOKEN is set (verify at least 2 times before concluding it is missing):
test -n "$PAGERDUTY_API_TOKEN" && echo "TOKEN_SET" || echo "TOKEN_MISSING"
If token is set, authenticate:
pd auth add --token "$PAGERDUTY_API_TOKEN"
If authentication fails, use AskUserQuestion to inform the user and link to the PagerDuty CLI User Guide for setup instructions. Do NOT continue until authentication succeeds.
2. Detect Local Timezone
If TZ_DISPLAY was not provided, run the following to get the local timezone from the current machine:
date +"%Z"
Record the output as the display timezone for all report timestamps.
3. Prepare Output Directory
Create the output directory .pagerduty-oncall-tmp/ and subdirectories logs/, notes/, analytics/ before saving any files.
4. Load Escalation Policy Configuration
Read config.json using the Read tool. Extract the escalation_policies array.
If the array is empty, check for the PD_ESCALATION_POLICIES env var:
test -n "$PD_ESCALATION_POLICIES" && echo "$PD_ESCALATION_POLICIES" || echo "EMPTY"
Build the list of target EP names for the next step. If both sources are empty, no names will be passed (all EPs included).
5. List and Filter Escalation Policies
Fetch, extract JSON, filter by target names, and save in one pipeline:
pd ep list --json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/filter-eps.js "EP Name 1" "EP Name 2" > .pagerduty-oncall-tmp/ep-list.json
Pass each target EP name as a separate argument to filter-eps.js. If no target names, omit the arguments (all EPs pass through).
6-ALT. Single-Incident Mode (when MODE = SINGLE-INCIDENT)
Extract the incident ID from QUERY:
- From URL: match
pagerduty\.com/incidents/([A-Z0-9]+)→ capture group is the ID - From shorthand: match
incident ([A-Z0-9]+)→ capture group is the ID
Fetch the specific incident:
pd rest get --endpoint "/incidents/<INCIDENT_ID>" 2>&1 \
| node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js \
> .pagerduty-oncall-tmp/incidents.json
Extract from the incident JSON:
id,incident_number,title,status,urgencycreated_at,resolved_at(null if still open)service.summary→ SERVICEescalation_policy.summary→ EP
Also fetch related incidents on the same service within ±2h of created_at:
pd incident list --json --statuses=open --statuses=closed --statuses=triggered --statuses=acknowledged --statuses=resolved \
--since=<created_at minus 2h, YYYY-MM-DD UTC> --until=<created_at plus 2h, YYYY-MM-DD UTC> 2>&1 \
| node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js \
| node ${CLAUDE_SKILL_DIR}/scripts/filter-incidents.js .pagerduty-oncall-tmp/ep-list.json \
> .pagerduty-oncall-tmp/related-incidents.json
Then continue to Step 6 (gather details) and Step 7 (report) as normal.
The report MUST include a structured metadata section at the top (UTC ISO8601 for machine parsing):
## Incident Metadata
- created_at: <ISO8601 UTC>
- resolved_at: <ISO8601 UTC or "ongoing">
- service: <name>
- title: <title>
Followed by the human-readable timeline in local timezone (see Step 7).
6. Fetch and Filter Incidents (LIST mode)
Derive SINCE_DATE and UNTIL_DATE (YYYY-MM-DD, in UTC) from QUERY. Important: --until is exclusive in the pd CLI — it does NOT include that day.
If SINCE and UNTIL are provided as UTC ISO8601 timestamps, pass them directly to pd incident list — the pd CLI accepts ISO8601 for --since / --until:
pd incident list --json ... --since=<SINCE> --until=<UNTIL> ...
Otherwise, derive SINCE and UNTIL as UTC ISO8601 timestamps from QUERY using the agent's local timezone. "Today" means the local calendar day — an incident at 11am local is in "today" even if it falls on a different UTC date. Compute local day boundaries and convert to UTC ISO8601 for the pd API.
For relative durations, use pd CLI natural language directly:
pd incident list --json ... --since "24 hours ago" --until "now" ...
For named days and calendar ranges, compute local midnight as UTC ISO8601:
# Start of local today as UTC ISO8601 (Linux / macOS)
date -d "$(date +%Y-%m-%d) 00:00:00" -u +%Y-%m-%dT%H:%M:%SZ # Linux
date -jf "%Y-%m-%d %H:%M:%S" "$(date +%Y-%m-%d) 00:00:00" -u +%Y-%m-%dT%H:%M:%SZ # macOS
# Start of local yesterday
date -d "$(date -d yesterday +%Y-%m-%d) 00:00:00" -u +%Y-%m-%dT%H:%M:%SZ # Linux
date -jf "%Y-%m-%d %H:%M:%S" "$(date -v-1d +%Y-%m-%d) 00:00:00" -u +%Y-%m-%dT%H:%M:%SZ # macOS
Use these ISO8601 values directly as --since / --until on pd incident list. For "today": SINCE = start of local today as UTC ISO8601, UNTIL = omit (defaults to now). For "yesterday": SINCE = start of local yesterday, UNTIL = start of local today. For explicit date ranges: SINCE = start of first local day, UNTIL = start of day after last local day.
Then fetch, extract JSON, filter by service IDs from ep-list.json, and save. Run this as two separate commands — do NOT chain Step 4 and Step 5 into a single pipeline, as filter-incidents.js reads ep-list.json from disk and the file must be fully written before Step 5 begins:
# Step 4 must be complete before running this
pd incident list --json --statuses=open --statuses=closed --statuses=triggered --statuses=acknowledged --statuses=resolved --since=SINCE_DATE --until=UNTIL_DATE 2>&1 > /tmp/pd-incidents-raw.json
node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js < /tmp/pd-incidents-raw.json | node ${CLAUDE_SKILL_DIR}/scripts/filter-incidents.js .pagerduty-oncall-tmp/ep-list.json > .pagerduty-oncall-tmp/incidents.json
Read .pagerduty-oncall-tmp/incidents.json to check the result. If the array is empty, write a report noting zero incidents and stop.
7. Gather Incident Details
For each incident in .pagerduty-oncall-tmp/incidents.json, fetch details sequentially (to avoid PagerDuty API rate limits). Use the id field (e.g. Q1V3O5Q3JX39LJ), NOT incident_number — the pd CLI only accepts internal IDs.
Log entries:
pd incident log -i <INCIDENT_ID> --json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/parse-log.js > .pagerduty-oncall-tmp/logs/<INCIDENT_ID>.json
Notes:
pd incident notes -i <INCIDENT_ID> --output=json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/parse-notes.js > .pagerduty-oncall-tmp/notes/<INCIDENT_ID>.json
Analytics:
pd incident analytics -i <INCIDENT_ID> --json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/parse-analytics.js > .pagerduty-oncall-tmp/analytics/<INCIDENT_ID>.json
8. Analyse and Report
Read all saved JSON files from .pagerduty-oncall-tmp/ using the Read tool. Produce a structured analysis and save it using Write to .pagerduty-oncall-tmp/report.md:
When in SINGLE-INCIDENT mode, begin the report with the structured metadata block (UTC ISO8601, for machine parsing by the PIR orchestrator):
## Incident Metadata
- created_at: 2026-03-20T02:30:00Z
- resolved_at: 2026-03-20T04:15:00Z
- service: storefront
- title: High error rate on checkout
Then include:
- Incident Summary Table — For each incident: ID, title, service, escalation policy, status, urgency, created/resolved timestamps in local timezone (use TZ_DISPLAY if provided, otherwise detect via system clock; format:
2026-03-20 15:30 <TZ>), duration - Cross-Team Correlation — Identify incidents that overlap in time across different escalation policies. Flag potential cascading failures or shared root causes
- Timeline — Chronological view of all incidents across all teams in local timezone, highlighting clusters of activity
- Key Findings — Patterns, recurring services, repeated triggers, or escalation policy gaps
- Recommendations — Actionable suggestions based on the analysis
All timestamps displayed to users must use local timezone with TZ abbreviation. Do NOT use UTC in human-readable sections. Use TZ_DISPLAY if provided, otherwise detect via system clock (date +"%Z").
After writing the report, explicitly tell the user its location as the final line of your response:
Report saved to
.pagerduty-oncall-tmp/report.md