Course Builder Incident Forensics

Purpose

Use this workflow to move from "something is broken in a course-builder app" to a concrete root cause with evidence: deployment, endpoint, error signature, throw site, and DB/config precondition status.

Inputs You Need

  • <app>: Vercel project slug (example: ai-hero)
  • <dataset>: Axiom dataset (often the same as the app slug)
  • <host>: production domain (example: www.aihero.dev)

Preconditions

  • Repo path: /Users/joel/Code/skillrecordings/support
  • Vercel access to the skillrecordings scope
  • Axiom token configured for the skill CLI
  • Local code path exists at: /Users/joel/Code/badass-courses/course-builder/apps/<app>

Fast Triage Workflow

1) Confirm deployments and scope/account

vercel ls <app> --scope skillrecordings --non-interactive

Capture:

  • current production deployment URL(s)
  • whether the failures belong to the current or a previous production deployment

2) Pull API failure distribution from Axiom

cd /Users/joel/Code/skillrecordings/support/packages/cli
bun src/index.ts axiom query "['<dataset>'] | where host == '<host>' and isnotnull(status) and status >= 400 | summarize count=count() by status, path, deploymentUrl | sort by count desc" --since 2h --json

Then pull concrete failing samples:

bun src/index.ts axiom query "['<dataset>'] | where host == '<host>' and isnotnull(status) and status >= 400 | sort by _time desc | limit 20 | project _time, status, method, path, deploymentUrl, requestId" --since 2h --json
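
Once the samples are in hand, splitting them by deployment makes it obvious whether the current production deployment is the one failing. The following is a minimal sketch, not part of the skill CLI: it assumes the rows arrive as an array of objects carrying the fields projected in the query above, and that deploymentUrl uses the same format as the URL captured in step 1. The function name and sample rows are hypothetical.

```javascript
// Assumption: `rows` is an array of objects with the projected fields
// (_time, status, method, path, deploymentUrl, requestId). The real --json
// output of the CLI may wrap or name them differently.
function splitByDeployment(rows, currentDeploymentUrl) {
  const current = [];
  const older = [];
  for (const row of rows) {
    if (row.deploymentUrl === currentDeploymentUrl) {
      current.push(row);
    } else {
      older.push(row);
    }
  }
  return { current, older };
}

// Illustrative rows only, not real log output.
const sampleRows = [
  { _time: "2026-02-28T10:00:00Z", status: 400, method: "POST", path: "/api/inngest", deploymentUrl: "example-new.vercel.app", requestId: "req-1" },
  { _time: "2026-02-28T09:30:00Z", status: 500, method: "GET", path: "/api/shortlinks", deploymentUrl: "example-old.vercel.app", requestId: "req-2" },
  { _time: "2026-02-28T09:45:00Z", status: 400, method: "POST", path: "/api/inngest", deploymentUrl: "example-new.vercel.app", requestId: "req-3" },
];

const split = splitByDeployment(sampleRows, "example-new.vercel.app");
console.log(`current: ${split.current.length}, older: ${split.older.length}`);
```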

3) Pull structured runtime logs for the failing deployment

vercel logs https://<deployment>.vercel.app --project <app> --scope skillrecordings --non-interactive --no-follow --since 2h --limit 1200 --json 2>&1 \
  | rg '^\{' \
  | jq -c 'select((.level != "info") or ((.responseStatusCode // 0) >= 400) or (.requestPath == "/api/inngest") or (.requestPath == "/api/shortlinks")) | {ts:.timestamp, level, method:.requestMethod, path:.requestPath, status:.responseStatusCode, message, traceId}'
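
For readers less fluent in jq, the same selection can be sketched in JavaScript. It keeps only lines that start with { (the rg step), then keeps entries that are not info-level, have a status of 400 or above, or hit one of the two watched paths. The field names come from the jq filter above; the function names and sample lines are hypothetical.

```javascript
const WATCHED_PATHS = new Set(["/api/inngest", "/api/shortlinks"]);

// Mirrors the jq select(): non-info level, OR status >= 400, OR a watched path.
function isWorthReading(entry) {
  const status = entry.responseStatusCode ?? 0;
  return entry.level !== "info" || status >= 400 || WATCHED_PATHS.has(entry.requestPath);
}

// Takes the raw text of the log output and returns the trimmed-down entries.
function selectEntries(rawOutput) {
  return rawOutput
    .split("\n")
    .filter((line) => line.startsWith("{")) // drop non-JSON noise, like rg '^\{'
    .map((line) => JSON.parse(line))
    .filter(isWorthReading)
    .map((e) => ({
      ts: e.timestamp,
      level: e.level,
      method: e.requestMethod,
      path: e.requestPath,
      status: e.responseStatusCode,
      message: e.message,
      traceId: e.traceId,
    }));
}

// Illustrative lines only, not real Vercel output.
const sampleOutput = [
  "some non-JSON banner line",
  JSON.stringify({ level: "info", requestMethod: "GET", requestPath: "/", responseStatusCode: 200, message: "ok" }),
  JSON.stringify({ level: "error", requestMethod: "POST", requestPath: "/api/inngest", responseStatusCode: 400, message: "NonRetriableError: Preference type or channel not found" }),
  JSON.stringify({ level: "info", requestMethod: "GET", requestPath: "/api/shortlinks", responseStatusCode: 200, message: "ok" }),
].join("\n");

console.log(selectEntries(sampleOutput).map((e) => `${e.status} ${e.path}`));
```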

4) Map error signature to throw site in code

rg -n "NonRetriableError|/api/inngest|/api/shortlinks|Preference type or channel not found" /Users/joel/Code/badass-courses/course-builder/apps/<app>/src -g'*.ts'

5) Verify DB preconditions directly (no guessing)

cd /Users/joel/Code/badass-courses/course-builder/apps/<app>
set -a; source .env.production.local; set +a
node --input-type=module <<'EOF'
// Quoted heredoc: the shell leaves backticks, ${...}, and ! untouched.
import { Client } from '@planetscale/database';

const conn = new Client({ url: process.env.DATABASE_URL }).connection();

// Match on the suffix so prefixed table names are found too.
const tables = (await conn.execute('show tables')).rows.map((r) => Object.values(r)[0]);
const typeTable = tables.find((t) => String(t).endsWith('CommunicationPreferenceType'));
const channelTable = tables.find((t) => String(t).endsWith('CommunicationChannel'));

if (!typeTable || !channelTable) {
  console.log(JSON.stringify({ error: 'missing communication tables', typeTable, channelTable }, null, 2));
  process.exit(1);
}

// List the seed rows so their presence can be checked by name.
const types = await conn.execute(`select id,name from ${typeTable}`);
const channels = await conn.execute(`select id,name from ${channelTable}`);
console.log(JSON.stringify({ typeTable, channelTable, types: types.rows, channels: channels.rows }, null, 2));
EOF

Known Signature: Inngest 400 NonRetriableError

If logs show:

  • POST /api/inngest status 400
  • NonRetriableError: Preference type or channel not found

Then check:

  • src/inngest/functions/user-created.ts
  • src/inngest/functions/email-send-broadcast.ts

These functions usually require communication seed rows:

  • preference type named Newsletter
  • channel named Email

If either seed row is missing, for example because the communication tables are empty, that is the root cause.
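
The seed-row check can be sketched as a small function over the JSON printed in step 5. This is a hedged illustration: it assumes the input has the types and channels arrays from that output and that names match exactly, including case. The function name is hypothetical.

```javascript
// Assumption: `result` has the shape printed by the step 5 script:
// { types: [{ id, name }], channels: [{ id, name }] }.
function checkCommunicationSeeds(result) {
  const missing = [];
  if (!result.types.some((row) => row.name === "Newsletter")) {
    missing.push("preference type Newsletter");
  }
  if (!result.channels.some((row) => row.name === "Email")) {
    missing.push("channel Email");
  }
  return { ok: missing.length === 0, missing };
}

// Empty tables: both seed rows are reported missing.
console.log(checkCommunicationSeeds({ types: [], channels: [] }));
```

The missing list can go straight into item 6 of the report as the DB precondition result.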

Reporting Template

Always report with this structure:

  1. Account and project (skillrecordings/<app>)
  2. Active production deployment URL and timestamp window
  3. Axiom failure breakdown by status + path + deploymentUrl
  4. Representative failing request IDs/timestamps
  5. Exact throw site path + line reference
  6. DB precondition check result (present/missing)
  7. Root cause statement in one sentence
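
To keep reports uniform, the seven items can be rendered from a single object. The sketch below is one possible shape, not an existing tool; every field name on the input object is hypothetical.

```javascript
// Assumption: all field names on `r` are made up for this sketch.
function renderReport(r) {
  return [
    `1. Account and project: skillrecordings/${r.app}`,
    `2. Production deployment: ${r.deploymentUrl} (window ${r.windowStart} to ${r.windowEnd})`,
    `3. Failure breakdown: ${r.breakdown.map((b) => `${b.status} ${b.path} on ${b.deploymentUrl} x${b.count}`).join("; ")}`,
    `4. Representative requests: ${r.samples.map((s) => `${s.requestId} at ${s.time}`).join(", ")}`,
    `5. Throw site: ${r.throwSite}`,
    `6. DB precondition check: ${r.dbPreconditionsPresent ? "present" : "missing"}`,
    `7. Root cause: ${r.rootCause}`,
  ].join("\n");
}

// Placeholder values only; fill in from the evidence gathered in steps 1 to 5.
const exampleReport = renderReport({
  app: "<app>",
  deploymentUrl: "https://<deployment>.vercel.app",
  windowStart: "2026-02-28T08:00:00Z",
  windowEnd: "2026-02-28T10:00:00Z",
  breakdown: [{ status: 400, path: "/api/inngest", deploymentUrl: "<deployment>.vercel.app", count: 12 }],
  samples: [{ requestId: "req-1", time: "2026-02-28T09:45:00Z" }],
  throwSite: "src/inngest/functions/user-created.ts:<line>",
  dbPreconditionsPresent: false,
  rootCause: "Communication seed rows are missing.",
});
console.log(exampleReport);
```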

Guardrails

  • Do not print secret values from env files.
  • Keep all timestamps in ISO UTC when reporting.
  • Separate "current production deploy failures" from "older deploy failures."
  • If Axiom event names are sparse/null, rely on status/path/deploymentUrl fields and Vercel JSON logs.
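
The first two guardrails are easy to enforce mechanically. A minimal sketch, with hypothetical helper names: one function normalizes any parseable timestamp to ISO UTC, and the other lists only the variable names from env file text so values never reach the terminal or the report.

```javascript
// Normalizes a timestamp (string, number, or Date) to ISO 8601 in UTC.
function toIsoUtc(value) {
  const date = new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Unparseable timestamp: ${value}`);
  }
  return date.toISOString();
}

// Returns only the variable names from env file text, never the values.
function envKeyNames(envFileText) {
  return envFileText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#") && line.includes("="))
    .map((line) => line.slice(0, line.indexOf("=")).replace(/^export\s+/, "").trim());
}

console.log(toIsoUtc("2026-02-28T02:00:00-08:00")); // 2026-02-28T10:00:00.000Z
console.log(envKeyNames('DATABASE_URL="placeholder"\n# comment\nexport OTHER_KEY=1'));
```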

AI Hero Example

  • <app> = ai-hero
  • <dataset> = ai-hero
  • <host> = www.aihero.dev