Course Builder Incident Forensics
Purpose
Use this workflow to move from "something is broken in a course-builder app" to a concrete root cause with evidence: deployment, endpoint, error signature, throw site, and DB/config precondition status.
Inputs You Need
- <app>: Vercel project slug (example: ai-hero)
- <dataset>: Axiom dataset (often the same as the app slug)
- <host>: production domain (example: www.aihero.dev)
Preconditions
- Repo path: /Users/joel/Code/skillrecordings/support
- Vercel access to scope skillrecordings
- Axiom token configured for the skill CLI
- Local code path exists at: /Users/joel/Code/badass-courses/course-builder/apps/<app>
Fast Triage Workflow
1) Confirm deployments and scope/account
vercel ls <app> --scope skillrecordings --non-interactive
Capture:
- current production deployment URL(s)
- whether failures belong to current or previous production deploy
2) Pull API failure distribution from Axiom
cd /Users/joel/Code/skillrecordings/support/packages/cli
bun src/index.ts axiom query "['<dataset>'] | where host == '<host>' and isnotnull(status) and status >= 400 | summarize count=count() by status, path, deploymentUrl | sort by count desc" --since 2h --json
Then pull concrete failing samples:
bun src/index.ts axiom query "['<dataset>'] | where host == '<host>' and isnotnull(status) and status >= 400 | sort by _time desc | limit 20 | project _time, status, method, path, deploymentUrl, requestId" --since 2h --json
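The two queries above share the same filter and differ only in their tail, so they can be generated from <dataset> and <host>. A minimal sketch, assuming a hypothetical helper named buildFailureQueries (not part of the skill CLI) that rejects inputs containing single quotes instead of escaping them:

```javascript
// Hypothetical helper: builds the two APL strings used in step 2.
// Inputs containing a single quote are rejected to keep the sketch simple.
function buildFailureQueries(dataset, host, sampleLimit = 20) {
  for (const value of [dataset, host]) {
    if (typeof value !== "string" || value === "" || value.includes("'")) {
      throw new Error(`unsafe or empty query input: ${JSON.stringify(value)}`);
    }
  }
  if (!Number.isInteger(sampleLimit) || sampleLimit <= 0) {
    throw new Error("sampleLimit must be a positive integer");
  }

  // Shared filter: failing responses for one production host.
  const base = `['${dataset}'] | where host == '${host}' and isnotnull(status) and status >= 400`;

  return {
    distribution: `${base} | summarize count=count() by status, path, deploymentUrl | sort by count desc`,
    samples: `${base} | sort by _time desc | limit ${sampleLimit} | project _time, status, method, path, deploymentUrl, requestId`,
  };
}

const queries = buildFailureQueries("ai-hero", "www.aihero.dev");
console.log(queries.distribution);
console.log(queries.samples);
```

Each printed string can be passed as the quoted argument to the axiom query command shown above.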
3) Pull structured runtime logs for the failing deployment
vercel logs https://<deployment>.vercel.app --project <app> --scope skillrecordings --non-interactive --no-follow --since 2h --limit 1200 --json 2>&1 \
| rg '^\{' \
| jq -c 'select((.level != "info") or ((.responseStatusCode // 0) >= 400) or (.requestPath == "/api/inngest") or (.requestPath == "/api/shortlinks")) | {ts:.timestamp, level, method:.requestMethod, path:.requestPath, status:.responseStatusCode, message, traceId}'
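The rg and jq stages above keep only JSON lines, select entries that are non-info, failing, or on a watched path, and project a compact record. The same selection logic can be sketched in JavaScript for readability. The function names and the sample entries are hypothetical; the field names are taken from the jq filter. Unlike jq, this sketch skips malformed JSON lines and treats a non-numeric status as 0.

```javascript
// Paths that are always kept, mirroring the jq filter.
const WATCHED_PATHS = new Set(["/api/inngest", "/api/shortlinks"]);

// Same predicate as the jq select(...) expression.
function isInteresting(entry) {
  const status = typeof entry.responseStatusCode === "number" ? entry.responseStatusCode : 0;
  return entry.level !== "info" || status >= 400 || WATCHED_PATHS.has(entry.requestPath);
}

// Hypothetical helper: takes raw CLI output and returns the compact records.
function triageLogLines(rawText) {
  const records = [];
  for (const line of rawText.split("\n")) {
    if (!line.startsWith("{")) continue; // same role as rg '^\{'
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines (jq would fail here instead)
    }
    if (!isInteresting(entry)) continue;
    records.push({
      ts: entry.timestamp ?? null,
      level: entry.level ?? null,
      method: entry.requestMethod ?? null,
      path: entry.requestPath ?? null,
      status: entry.responseStatusCode ?? null,
      message: entry.message ?? null,
      traceId: entry.traceId ?? null,
    });
  }
  return records;
}

// Made-up sample input, for illustration only.
const sampleLogText = [
  "some non-JSON line",
  JSON.stringify({ timestamp: 1, level: "info", requestMethod: "GET", requestPath: "/", responseStatusCode: 200, message: "ok" }),
  JSON.stringify({ timestamp: 2, level: "error", requestMethod: "POST", requestPath: "/api/inngest", responseStatusCode: 400, message: "NonRetriableError: Preference type or channel not found" }),
].join("\n");

const triaged = triageLogLines(sampleLogText);
console.log(JSON.stringify(triaged, null, 2));
```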
4) Map error signature to throw site in code
rg -n "NonRetriableError|/api/inngest|/api/shortlinks|Preference type or channel not found" /Users/joel/Code/badass-courses/course-builder/apps/<app>/src -g'*.ts'
5) Verify DB preconditions directly (no guessing)
cd /Users/joel/Code/badass-courses/course-builder/apps/<app>
set -a; source .env.production.local; set +a
node --input-type=module <<'EOF'
// The quoted heredoc keeps the shell from expanding backticks, ${...}, or "!" inside the script.
import { Client } from '@planetscale/database';

const conn = new Client({ url: process.env.DATABASE_URL }).connection();

// Match the communication tables by suffix, since table names may be prefixed.
const tables = (await conn.execute('show tables')).rows.map((r) => String(Object.values(r)[0]));
const typeTable = tables.find((t) => t.endsWith('CommunicationPreferenceType'));
const channelTable = tables.find((t) => t.endsWith('CommunicationChannel'));

if (!typeTable || !channelTable) {
  console.log(JSON.stringify({ error: 'missing communication tables', typeTable, channelTable }, null, 2));
  process.exit(1);
}

// Dump the seed rows so the report can state present/missing from evidence.
const types = await conn.execute(`select id,name from ${typeTable}`);
const channels = await conn.execute(`select id,name from ${channelTable}`);
console.log(JSON.stringify({ typeTable, channelTable, types: types.rows, channels: channels.rows }, null, 2));
EOF
Known Signature: Inngest 400 NonRetriableError
If logs show:
- POST /api/inngest
- status 400
- NonRetriableError: Preference type or channel not found
Then check:
- src/inngest/functions/user-created.ts
- src/inngest/functions/email-send-broadcast.ts
These functions usually require communication seed rows:
- preference type named Newsletter
- channel named Email
If communication tables are empty, that is the root cause.
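The seed-row check can be applied mechanically to the JSON printed by the DB verification step. A minimal sketch, assuming that output has types and channels arrays of { id, name } rows and that names must match Newsletter and Email exactly (case-sensitive matching is an assumption); checkCommunicationSeeds is a hypothetical helper:

```javascript
// Hypothetical helper: reports which required seed rows are absent.
// Expects the { types, channels } shape printed by the DB check in step 5.
function checkCommunicationSeeds({ types = [], channels = [] }) {
  const missing = [];
  if (!types.some((row) => row.name === "Newsletter")) {
    missing.push("preference type 'Newsletter'");
  }
  if (!channels.some((row) => row.name === "Email")) {
    missing.push("channel 'Email'");
  }
  return { ok: missing.length === 0, missing };
}

// Empty tables: both seeds are reported missing.
const emptyResult = checkCommunicationSeeds({ types: [], channels: [] });
console.log(emptyResult);

// Illustrative rows (ids are made up): everything required is present.
const seededResult = checkCommunicationSeeds({
  types: [{ id: "example-type-id", name: "Newsletter" }],
  channels: [{ id: "example-channel-id", name: "Email" }],
});
console.log(seededResult);
```

A result with ok set to false and both entries in missing corresponds to the empty-table root cause described above.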
Reporting Template
Always report with this structure:
- Account and project (skillrecordings/<app>)
- Active production deployment URL and timestamp window
- Axiom failure breakdown by status + path + deploymentUrl
- Representative failing request IDs/timestamps
- Exact throw site path + line reference
- DB precondition check result (present/missing)
- Root cause statement in one sentence
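To keep reports uniform, the template above can be rendered from a plain object. This is only a sketch: formatIncidentReport and its field names are hypothetical, and the example values are placeholders, not real findings.

```javascript
// Hypothetical helper: renders the reporting template in a fixed order.
function formatIncidentReport(r) {
  const failures = r.failures
    .map((f) => `  ${f.status} ${f.path} ${f.deploymentUrl}: ${f.count}`)
    .join("\n");
  const samples = r.samples
    .map((s) => `  ${s.requestId} at ${s.time}`)
    .join("\n");
  return [
    `Account and project: skillrecordings/${r.app}`,
    `Production deployment: ${r.deploymentUrl} (${r.windowStart} to ${r.windowEnd})`,
    "Axiom failure breakdown (status path deploymentUrl: count):",
    failures,
    "Representative failing requests:",
    samples,
    `Throw site: ${r.throwSite}`,
    `DB precondition check: ${r.dbPreconditionsPresent ? "present" : "missing"}`,
    `Root cause: ${r.rootCause}`,
  ].join("\n");
}

// Placeholder values only; substitute the evidence gathered in steps 1 to 5.
const report = formatIncidentReport({
  app: "ai-hero",
  deploymentUrl: "https://<deployment>.vercel.app",
  windowStart: "<start ISO UTC>",
  windowEnd: "<end ISO UTC>",
  failures: [{ status: 400, path: "/api/inngest", deploymentUrl: "<deployment>.vercel.app", count: 1 }],
  samples: [{ requestId: "<requestId>", time: "<ISO UTC>" }],
  throwSite: "src/inngest/functions/user-created.ts:<line>",
  dbPreconditionsPresent: false,
  rootCause: "<one sentence>",
});
console.log(report);
```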
Guardrails
- Do not print secret values from env files.
- Keep all timestamps in ISO UTC when reporting.
- Separate "current production deploy failures" from "older deploy failures."
- If Axiom event names are sparse/null, rely on status/path/deploymentUrl fields and Vercel JSON logs.
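Two of these guardrails are easy to enforce in code: normalizing timestamps to ISO UTC and separating failures by deployment. A minimal sketch with hypothetical helper names, assuming each event carries the deploymentUrl field projected by the Axiom queries:

```javascript
// Normalizes anything Date can parse to an ISO 8601 UTC string.
function toIsoUtc(value) {
  const date = new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`unparseable timestamp: ${String(value)}`);
  }
  return date.toISOString();
}

// Splits events into those from the current production deploy and all others.
function splitByDeployment(events, currentDeploymentUrl) {
  const current = [];
  const older = [];
  for (const event of events) {
    (event.deploymentUrl === currentDeploymentUrl ? current : older).push(event);
  }
  return { current, older };
}

// A local-offset timestamp is reported in UTC.
const normalized = toIsoUtc("2026-02-28T10:00:00+02:00");
console.log(normalized); // 2026-02-28T08:00:00.000Z

// Illustrative events with made-up deployment names.
const split = splitByDeployment(
  [
    { status: 400, deploymentUrl: "current.example" },
    { status: 500, deploymentUrl: "previous.example" },
  ],
  "current.example",
);
console.log(split.current.length, split.older.length); // 1 1
```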
AI Hero Example
- <app>=ai-hero
- <dataset>=ai-hero
- <host>=www.aihero.dev
Repository: skillrecordings/support
First Seen: Feb 28, 2026