deco-incident-debugging
Deco Incident Debugging
PRIORITY: SPEED - This skill is designed for real-time incident response. Every second counts. Execute the triage workflow immediately and propose solutions as fast as possible.
When to Use This Skill
- Production incident in progress
- Site is down, slow, or showing errors
- Customer escalation requiring immediate response
- On-call engineer needs AI assistance during war room
- Post-incident to document root cause and learnings
Quick Start - 60 Second Triage
1. GET SYMPTOMS → What error/behavior is the user seeing?
2. MATCH LEARNINGS → Search learnings/ for similar patterns
3. PROPOSE SOLUTIONS → If match found, apply known fix
4. DEEP DIVE → If no match, run diagnostic workflow
5. DOCUMENT → Capture new learning if novel issue
Files in This Skill
| File | Purpose |
|---|---|
SKILL.md |
This overview and quick reference |
triage-workflow.md |
Step-by-step fast triage process (interactive) |
headless-mode.md |
Autonomous investigation without human interaction |
learnings-index.md |
Categorized index of all past learnings |
Operating Modes
Interactive Mode (default)
Use when working alongside an engineer during an incident. The agent asks clarifying questions and collaborates on diagnosis.
Headless Mode
Use when triggered automatically by incident management systems (PagerDuty, Opsgenie, etc.). The agent receives minimal context and autonomously:
- Extracts keywords from alert message
- Searches learnings for matching patterns
- Collects live data (logs, metrics)
- Correlates findings to known issues
- Outputs structured diagnosis with proposed fix
See headless-mode.md for full autonomous workflow, input/output formats, and integration examples.
Learnings Database
We have documented learnings from past incidents in learnings/. These contain:
- Problem: What went wrong
- Symptoms: Observable indicators
- Root Cause: Why it happened (with code examples)
- Solution: How to fix it (with code examples)
- How to Debug: Commands and techniques to diagnose
- Impact: What happens if unfixed
Current Learnings Categories
| Category | Learnings | Key Issues |
|---|---|---|
cache-strategy |
2 | Missing cache, cookie blocking edge cache |
loader-optimization |
2 | N+1 overfetching, lazy section issues |
external-css |
1 | Lazy sections missing styles |
block-config |
1 | Dangling block references |
rich-text |
1 | Hardcoded domain URLs |
ui-bug |
1 | Invisible clickable areas |
responsive |
1 | Breakpoint inconsistencies |
retry-logic |
1 | Off-by-one retry attempts |
safari-bug |
1 | Image flash on navigation |
vtex-routing |
1 | myvtex vs vtexcommercestable domain |
migration |
1 | Deno 1 to Deno 2 migration |
Incident Response Workflow
Phase 1: Rapid Assessment (< 2 minutes)
Ask the engineer these questions:
-
What is the error message or behavior?
- Copy exact error text if available
- Describe what user sees
-
When did it start?
- Sudden vs gradual
- After deployment?
- Traffic spike?
-
What is the scope?
- All users or specific segment?
- All pages or specific routes?
- One site or multiple?
-
What changed recently?
- Code deployments
- Config changes
- Third-party updates
Phase 2: Pattern Matching (< 3 minutes)
Search learnings for known patterns:
# Search by symptom keywords
grep -ri "SYMPTOM_KEYWORD" learnings/
# Examples:
grep -ri "429" learnings/ # Rate limiting
grep -ri "cache" learnings/ # Cache issues
grep -ri "slow" learnings/ # Performance
grep -ri "not found" learnings/ # Missing resources
grep -ri "cookie" learnings/ # Cookie/session issues
grep -ri "vtex" learnings/ # VTEX integration
grep -ri "lazy" learnings/ # Lazy loading issues
Phase 3: Known Issue? Apply Fix Immediately
If symptom matches a learning:
- Read the full learning file
- Verify root cause matches the current symptoms
- Apply the documented solution
- Verify the fix works
Phase 4: Unknown Issue? Deep Diagnostic
If no learning matches, run full diagnostics:
# Check error logs (HyperDX)
SEARCH_LOGS({ query: "level:error site:SITENAME", limit: 50 })
# Check CDN metrics (if slow)
MONITOR_SUMMARY({ sitename: "SITE", hostname: "HOSTNAME" })
MONITOR_TOP_PATHS({ ... })
MONITOR_STATUS_CODES({ ... })
# Check for TypeScript errors
deno check --unstable-tsgo --allow-import main.ts
# Check for recent changes
git log --oneline -20
git diff HEAD~5
# Check block validation
deno run -A https://deco.cx/validate -report validation-report.json
Phase 5: Document New Learning
If this is a novel issue, create a new learning:
# [Title]
## Category
[category-name]
## Problem
[What went wrong]
## Symptoms
- [Observable indicator 1]
- [Observable indicator 2]
## Root Cause
[Why it happened with code examples]
## Solution
[How to fix with code examples]
## How to Debug
[Commands and techniques]
## Files Affected
[List of file patterns]
## Pattern Name
[Short memorable name]
## Checklist Item
[One-line check for future audits]
## Impact
[What happens if unfixed]
Common Incident Patterns
Rate Limiting (429 Errors)
Symptoms: "Too Many Requests" errors, spiky error rates
Quick Check:
grep -ri "rate limit\|429\|overfetch" learnings/
Likely Learnings:
loader-overfetching-n-plus-problem.md- Fetching too much datacache-strategy-standardization-loaders.md- Missing cache causing repeated calls
Immediate Actions:
- Check if loaders have
export const cache - Check for N+1 query patterns
- Add stale-while-revalidate caching
Slow Page Load
Symptoms: High TTFB, slow LCP, user complaints about speed
Quick Check:
grep -ri "slow\|cache\|lazy\|performance" learnings/
Likely Learnings:
cache-strategy-standardization-loaders.md- Missing cachelazy-sections-external-css-loading.md- CSS not loading for lazy sectionsvtex-cookies-prevent-edge-caching.md- Edge cache blocked
Immediate Actions:
- Check cache hit rates with CDN metrics
- Verify lazy sections have proper cache headers
- Check for sync loaders blocking render
Missing Content / Blank Sections
Symptoms: Sections not rendering, blank areas on page
Quick Check:
grep -ri "dangling\|missing\|not found\|blank" learnings/
Likely Learnings:
dangling-block-references.md- Block config points to deleted componentduplicate-sections-masked-by-broken-loaders.md- Loader errors hidden by duplicates
Immediate Actions:
- Run block validation:
deno run -A https://deco.cx/validate - Check browser console for loader errors
- Verify component files exist
VTEX Integration Issues
Symptoms: Products not loading, cart errors, checkout problems
Quick Check:
grep -ri "vtex" learnings/
Likely Learnings:
vtex-domain-routing-myvtex-vs-vtexcommercestable.md- Wrong domainvtex-cookies-prevent-edge-caching.md- Cookie issues
Immediate Actions:
- Check VTEX domain configuration
- Verify API credentials
- Check for VTEX service status
Visual Bugs / UI Issues
Symptoms: Broken layouts, invisible elements, style issues
Quick Check:
grep -ri "invisible\|css\|style\|responsive\|safari" learnings/
Likely Learnings:
invisible-clickable-areas-from-empty-links.md- Empty links covering contentresponsive-breakpoint-consistency.md- Mobile/desktop inconsistenciessafari-image-flash-fix.md- Safari-specific image issueslazy-sections-external-css-loading.md- Missing styles
Immediate Actions:
- Check browser dev tools for overlapping elements
- Inspect CSS loading order
- Test on affected browser/device
Debugging Commands Reference
Error Investigation
# Search error logs
SEARCH_LOGS({ query: "level:error site:SITENAME", limit: 50 })
# Group errors by message
GET_LOG_DETAILS({
query: "level:error site:SITENAME",
groupBy: ["body", "service"]
})
# Timeline of errors
QUERY_CHART_DATA({
series: [{ dataSource: "events", aggFn: "count", where: "level:error" }],
granularity: "1 hour"
})
Performance Investigation
# CDN summary
MONITOR_SUMMARY({ sitename: "SITE", hostname: "HOST", startDate: "YYYY-MM-DD", endDate: "YYYY-MM-DD" })
# Top paths by requests
MONITOR_TOP_PATHS({ ... })
# Cache effectiveness
MONITOR_CACHE_STATUS({ ... })
# Error rates
MONITOR_STATUS_CODES({ ... })
Code Investigation
# Type errors
deno check --unstable-tsgo --allow-import main.ts
# Block validation
deno run -A https://deco.cx/validate
# Find missing cache
grep -L "export const cache" loaders/**/*.ts
# Recent changes
git log --oneline -20
git diff HEAD~5
# Find files with errors
deno check --unstable-tsgo --allow-import main.ts 2>&1 | grep "error:" | sed 's/:.*//g' | sort | uniq -c | sort -rn
Escalation Criteria
Escalate Immediately If:
- Site completely down (no pages loading)
- Checkout broken (revenue impact)
- Data breach suspected
- Issue affecting multiple customers
- Fix requires platform changes (not site code)
Can Handle with This Skill:
- Single site performance issues
- Configuration problems
- Code bugs in site repository
- Cache/loader issues
- Visual/UI bugs
Post-Incident Checklist
After resolving the incident:
- Document root cause in learnings/ if novel
- Create PR with fix
- Update affected checklists if pattern is common
- Share learning with team
- Consider if monitoring should be added
Related Skills
deco-full-analysis- For comprehensive site audits (non-urgent)deco-performance-audit- For deep performance analysisdeco-typescript-fixes- For systematic type error fixes