Full Stack Debugger

Overview

The Full Stack Debugger enables systematic debugging of issues across the entire application stack (UI/Frontend, Backend/API, Database/State). It combines browser testing, log analysis, code examination, and automated server restart/verification to iteratively identify and fix issues one at a time until the system is fully operational.

This skill follows a six-phase workflow (Detection → Analysis → Fix → Restart → Verification → Iteration) to resolve issues that developers encounter during development and testing.

When to Use This Skill

Trigger this skill when observing:

Error states in the UI (dashboard, buttons failing, status showing errors)
Repeated failures in backend logs (task execution failures, import errors, database errors)
Unexpected database state (rows showing failed status when they should succeed)
API endpoints returning errors or unexpected responses
Services failing to initialize or process tasks
Cascading failures across multiple components

Debugging Workflow

Phase 1: Detection

Detect errors from multiple sources:

Browser UI Detection:

Navigate to the affected page/feature in the browser
Check for error messages, red warning states, or disabled functionality
Read console error messages using DevTools
Note the specific UI state and what action triggered the error

Backend Log Detection:

Query recent error logs using tail -n 200 /path/to/logs/errors.log
Search for error patterns related to the issue using grep
Note error timestamps, error messages, and stack traces
Look for repeated errors (these indicate a systemic issue)
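The log checks above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the log line format (`date time LEVEL message`) and the sample lines are assumptions, so adjust the regular expression to the real log layout.

```python
import re
from collections import Counter, deque

# Assumed (hypothetical) log format: "2026-01-25 10:15:02 ERROR message text"
LINE_RE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def repeated_errors(lines, last_n=200, min_count=2):
    """Return (message, count) pairs for ERROR messages seen at least min_count times."""
    tail = deque(lines, maxlen=last_n)  # keep only the last_n lines, like tail -n
    counts = Counter()
    for line in tail:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            counts[match.group("msg")] += 1
    return [(msg, n) for msg, n in counts.most_common() if n >= min_count]


# Made-up sample lines for illustration only
sample = [
    "2026-01-25 10:15:02 ERROR name 'Optional' is not defined",
    "2026-01-25 10:15:03 INFO scheduler tick",
    "2026-01-25 10:16:02 ERROR name 'Optional' is not defined",
    "2026-01-25 10:16:30 ERROR database is locked",
]
print(repeated_errors(sample))
```

Any iterable of lines works as input, so an open file object can be passed in place of the sample list.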

Database State Detection:

Query the database directly using sqlite3
Check status of recent tasks, transactions, or records
Look for failed, incomplete, or error states
Note which records are affected and what their states are

Example: When debugging a scheduler failure:

Navigate to System Health dashboard
Observe scheduler showing "0 done" or "X failed"
Check /logs/errors.log for error messages
Query queue_tasks table to see failed task records
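The database step of this example might look like the sketch below. Only the table name queue_tasks comes from this document; the column names (`id`, `task_type`, `status`, `error`) and the status values are assumptions, and an in-memory database stands in for the real file.

```python
import sqlite3


def failed_tasks(conn, limit=20):
    """Return the most recent rows whose status marks a failure (columns are assumed)."""
    cur = conn.execute(
        "SELECT id, task_type, status, error FROM queue_tasks "
        "WHERE status IN ('failed', 'error') ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


# In-memory stand-in with a hypothetical schema and made-up rows
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue_tasks ("
    "id INTEGER PRIMARY KEY, task_type TEXT, status TEXT, error TEXT)"
)
conn.executemany(
    "INSERT INTO queue_tasks (task_type, status, error) VALUES (?, ?, ?)",
    [
        ("AI_ANALYSIS", "failed", "name 'Optional' is not defined"),
        ("PORTFOLIO_SYNC", "completed", None),
        ("AI_ANALYSIS", "failed", "name 'Optional' is not defined"),
    ],
)
for row in failed_tasks(conn):
    print(row)
```

Against a real database, replace `":memory:"` with the path to the database file and check the actual schema first (for example with `.schema queue_tasks` in the sqlite3 shell).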

Phase 2: Analysis

Analyze root causes by reading code and logs:

Code Analysis:

Read the error file/module indicated in error stack traces
Check imports - look for missing from X import Y statements
Check class names - verify instantiation matches actual class names
Look for syntax errors - unmatched quotes, unclosed parentheses
Check function signatures - ensure payloads match expected parameters
Read reference documentation (references/common_errors.md) for error patterns

Log Analysis:

Extract error messages from logs
Look for patterns like name 'Optional' is not defined (missing import), unterminated string literal (syntax error), and has no attribute (wrong class or attribute name)
Trace error propagation backward to find the originating issue
Check timestamps - multiple errors at the same time indicate a batch failure
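One way to apply the log patterns above is a small classifier that maps an error message to a likely cause. The sketch below is illustrative: the regular expressions match the wording of standard Python error messages, while the cause labels are this example's own.

```python
import re

# Pattern -> likely cause, mirroring the hints listed above
PATTERNS = [
    (re.compile(r"name '(\w+)' is not defined"), "missing import"),
    (re.compile(r"unterminated (triple-quoted )?string literal"), "syntax error"),
    (re.compile(r"has no attribute '(\w+)'"), "wrong class or attribute name"),
]


def classify(message):
    """Return the first matching cause label, or 'unclassified' if nothing matches."""
    for pattern, cause in PATTERNS:
        if pattern.search(message):
            return cause
    return "unclassified"


print(classify("NameError: name 'Optional' is not defined"))
print(classify("AttributeError: 'dict' object has no attribute 'symbol'"))
```

Messages that fall through to "unclassified" are the ones worth reading by hand.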

API/Payload Analysis:

Check what payload the API is sending to task handlers
Read the task handler code to see what fields it expects
Compare actual payload vs expected payload
Look for missing required fields
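The payload comparison can be partly automated when the task handler is a plain Python function. The sketch below assumes that shape: `handle_analysis` and its parameters are hypothetical, and required fields are read from the function signature with the standard `inspect` module.

```python
import inspect


def required_params(handler):
    """Names of named parameters that have no default value."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [
        name
        for name, param in inspect.signature(handler).parameters.items()
        if param.default is inspect.Parameter.empty and param.kind in named_kinds
    ]


def missing_fields(handler, payload):
    """Required handler parameters that the payload does not supply."""
    return [name for name in required_params(handler) if name not in payload]


# Hypothetical task handler, for illustration only
def handle_analysis(symbol, agent_name, analysis_type="daily"):
    return f"{agent_name}:{symbol}:{analysis_type}"


payload = {"symbol": "AAPL"}
print(missing_fields(handle_analysis, payload))  # ['agent_name']
```

Handlers that take a single dictionary argument and read keys from it cannot be checked this way. For those, reading the handler body remains necessary.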

Example: When debugging "name 'Optional' is not defined":

Find the file mentioned in error (analysis_executor.py)
Read the imports section
Notice Optional is used but not imported
Check line 14: from typing import Dict, List, Any - missing Optional
Fix: Add Optional to the import statement
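The same check can be sketched with Python's `ast` module, which finds typing names that are used but never imported. This is a rough illustration: `TYPING_NAMES` is a small hand-picked subset, the source string is a made-up stand-in for analysis_executor.py, and files that use `import typing` or `from typing import *` are not handled.

```python
import ast

# Illustrative subset of names commonly imported from typing
TYPING_NAMES = {"Any", "Dict", "List", "Optional", "Tuple", "Union"}


def missing_typing_imports(source):
    """Return typing names referenced in source but not imported from typing."""
    tree = ast.parse(source)
    imported, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module == "typing":
            imported.update(alias.asname or alias.name for alias in node.names)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted((used & TYPING_NAMES) - imported)


# Made-up stand-in for the file named in the error
source = '''
from typing import Dict, List, Any

def run(config: Optional[Dict[str, Any]] = None) -> List[str]:
    return []
'''
print(missing_typing_imports(source))  # ['Optional']
```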

Phase 3: Fix (One Issue at a Time)

Apply fixes one issue per iteration:

Before Fixing:

Verify this is the first/next issue to fix
Read the relevant code section carefully
Use the fix patterns from references/fix_templates.md

Common Fix Patterns:

Missing imports: Add to import statement (e.g., from typing import Optional)
Wrong class name: Update import and instantiation to match actual class
Missing docstring quotes: Add opening """ to docstring
Wrong payload fields: Add missing required fields to payload dictionary
Syntax errors: Fix unmatched quotes, parentheses, brackets

After Fixing:

Read back the changed code to verify syntax
Check the edit was correct (line numbers, indentation)
Only fix ONE issue, even if multiple exist - don't cascade fixes
Document what was changed in a clear comment
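Reading back the change can be complemented by a quick compile check before restarting. The sketch below uses the built-in `compile()` on source text. The helper name and the two sample strings are illustrative, and the exact error wording varies between Python versions.

```python
def syntax_error_in(source, filename="<edited>"):
    """Return None if source compiles, else a short 'line N: message' string."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


# A correct function, and the same function with the opening docstring quotes missing
good = 'def f():\n    """Docstring."""\n    return 1\n'
bad = 'def f():\n    Docstring."""\n    return 1\n'

print(syntax_error_in(good))
print(syntax_error_in(bad))
```

For a file on disk, pass `open(path).read()` as `source`. A clean result only shows that the file parses, so missing imports and wrong class names still surface at runtime.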

Example Fix:

# BEFORE
from typing import Dict, List, Any

# AFTER
from typing import Dict, List, Any, Optional

Phase 4: Restart (Automated)

Restart the backend server after each fix:

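# Kill any process still listening on port 8000 (ignore the error when none is running)
lsof -ti:8000 | xargs kill -9 2>/dev/null || true
sleep 3  # Give the OS time to release the port

# Clear Python bytecode cache
find . -type d -name "__pycache__" -prune -exec rm -rf {} + 2>/dev/null
find . -type f -name "*.pyc" -delete 2>/dev/null

# Restart backend in the background, capturing output for later inspection
python -m src.main --command web > /tmp/backend_restart.log 2>&1 &
sleep 10  # Wait for startup

# Verify health (-f makes curl exit non-zero on an HTTP error status)
curl -f -m 5 http://localhost:8000/api/health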

Phase 5: Verification

Verify the fix worked through multiple checks:

Health Check:

Call /api/health endpoint
Verify "status": "healthy"
If still failing, check logs for new errors
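As an alternative to a fixed wait, the health check can be polled until it reports healthy. The sketch below uses only the standard library. The URL and the `"status": "healthy"` field come from this document, while the retry counts and function names are assumptions, and a response body that is not a JSON object is not handled.

```python
import http.client
import json
import time
import urllib.request


def fetch_health(url="http://localhost:8000/api/health", timeout=5):
    """Fetch the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(fetch=fetch_health, attempts=10, delay=1.0, sleep=time.sleep):
    """Return the attempt number on which status became 'healthy', or None if it never did."""
    for attempt in range(1, attempts + 1):
        try:
            if fetch().get("status") == "healthy":
                return attempt
        except (OSError, ValueError, http.client.HTTPException):
            pass  # server not accepting connections yet, or body was not valid JSON
        if attempt < attempts:
            sleep(delay)
    return None


# Typical use after a restart (commented out so the sketch has no side effects):
# if wait_until_healthy() is None:
#     print("backend did not become healthy; check /tmp/backend_restart.log")
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.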

Browser Verification:

Navigate to the affected UI page
Trigger the action that previously failed
Verify the error is gone
Check for new errors in console

Database Verification:

Query the affected records/tasks
Verify status changed from failed/error to success/completed
Check that metrics updated (e.g., scheduler shows "1 done" instead of "0 done")

Log Verification:

Check recent logs for the same error
Verify no new errors appeared
Look for success messages or "completed" status
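To separate errors logged before the fix from errors logged after the restart, the log can be filtered by timestamp. The sketch below assumes each log line starts with a 19-character `YYYY-MM-DD HH:MM:SS` timestamp followed by the level. Both the format and the sample lines are hypothetical.

```python
from datetime import datetime


def errors_since(lines, since, fmt="%Y-%m-%d %H:%M:%S"):
    """Return ERROR lines whose leading timestamp is at or after `since`."""
    hits = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:19], fmt)
        except ValueError:
            continue  # traceback continuation or a line without a timestamp
        if ts >= since and " ERROR " in line:
            hits.append(line.rstrip("\n"))
    return hits


# Made-up log lines around a restart at 10:20:00
restart_time = datetime(2026, 1, 25, 10, 20, 0)
sample = [
    "2026-01-25 10:16:02 ERROR name 'Optional' is not defined",
    "2026-01-25 10:20:05 INFO Task completed",
    "2026-01-25 10:21:00 ERROR database is locked",
]
print(errors_since(sample, restart_time))
```

An empty result means no errors were logged after the restart. A non-empty one shows whether the original error recurred or a different error surfaced.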

Example:

Scheduler should show "1 done" instead of "0 done"
Task record should show status="completed" instead of "failed"
No error messages in logs
WebSocket shows healthy status in UI

Phase 6: Iteration

If issues remain, repeat the cycle:

Continue if more issues exist:
- Check logs for remaining errors
- If any remain, return to Phase 2 (Analysis)
- Fix the next issue (Phase 3)
- Restart (Phase 4)
- Verify (Phase 5)

Stop when all issues are fixed:
- All schedulers show completed execution counts
- UI shows no error states
- Logs show no error patterns
- Tasks/records show success status
- Full verification complete
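The cycle can be summarized as a loop that fixes one issue per pass. The sketch below is schematic: `detect`, `fix`, `restart`, and `verify` are hypothetical callables standing in for Phases 1-5, and the iteration cap is an added safeguard rather than something this skill prescribes.

```python
def debug_loop(detect, fix, restart, verify, max_iterations=10):
    """Fix one issue per iteration until detect() reports none; return the fixed issues."""
    fixed = []
    for _ in range(max_iterations):
        issues = detect()
        if not issues:
            return fixed
        issue = issues[0]  # one issue at a time, even if several are present
        fix(issue)
        restart()
        if not verify(issue):
            raise RuntimeError(f"fix for {issue!r} did not verify")
        fixed.append(issue)
    if detect():
        raise RuntimeError("iteration limit reached with issues remaining")
    return fixed


# Simulated run: "fixing" an issue simply removes it from the pending list
pending = ["missing import", "wrong class name"]
restarts = []
result = debug_loop(
    detect=lambda: list(pending),
    fix=pending.remove,
    restart=lambda: restarts.append("restart"),
    verify=lambda issue: issue not in pending,
)
print(result, len(restarts))
```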

Common Error Patterns

See references/common_errors.md for patterns to recognize:

Python syntax errors (unterminated strings, missing quotes)
Import errors (name 'X' is not defined, cannot import name 'Y')
Class/attribute errors ('dict' object has no attribute 'symbol')
Type errors (passing wrong data type)
Payload/configuration errors (missing required fields)

Fix Templates

See references/fix_templates.md for ready-to-use fix patterns:

How to add missing imports
How to fix class name mismatches
How to fix docstring syntax
How to add missing payload fields
How to fix type errors

Tools Used

Playwright Browser Tools: Navigate UI, verify changes
Read/Grep Tools: Examine code and logs
Bash: Server restart, cache clearing, health checks
Edit Tool: Apply code fixes
Database Queries: Verify task/record state

MCP Tools Integration

Use robo-trader-dev MCP tools for token-efficient debugging (estimated savings of 85-98% per task, as listed below):

| Task | MCP Tool | Token Savings | Usage |
|---|---|---|---|
| Analyze error logs | `mcp__robo-trader-dev__analyze_logs` | 98% | Pattern detection with time windows |
| System health check | `mcp__robo-trader-dev__check_system_health` | 97% | Database, queues, API, disk status |
| Diagnose DB locks | `mcp__robo-trader-dev__diagnose_database_locks` | 95% | Correlate logs with code patterns |
| Queue monitoring | `mcp__robo-trader-dev__queue_status` | 96% | Real-time queue backlog analysis |
| Coordinator status | `mcp__robo-trader-dev__coordinator_status` | 94% | Init status, error details |
| Error pattern fix | `mcp__robo-trader-dev__suggest_fix` | 90% | Known pattern matching with examples |
| Read code files | `mcp__robo-trader-dev__smart_file_read` | 85% | Progressive context (summary/targeted/full) |
| Find related files | `mcp__robo-trader-dev__find_related_files` | 88% | Import/git/similarity analysis |

Example debugging workflow:

# 1. Detect errors (MCP instead of tail/grep)
mcp__robo-trader-dev__analyze_logs(patterns=["ERROR", "TIMEOUT"], time_window="1h")

# 2. Check system health (MCP instead of curl loops)
mcp__robo-trader-dev__check_system_health(components=["database", "queues", "api_endpoints"])

# 3. Diagnose specific issue (MCP instead of sqlite3 + code reading)
mcp__robo-trader-dev__diagnose_database_locks(time_window="24h", include_code_references=True)

# 4. Get fix suggestions (MCP instead of manual pattern matching)
mcp__robo-trader-dev__suggest_fix(error_message="name 'Optional' is not defined", context_file="src/services/analyzer.py")

Integration with robo-trader architecture:

Queue operations: Use queue_status to monitor PORTFOLIO_SYNC, DATA_FETCHER, AI_ANALYSIS
Coordinator debugging: Use coordinator_status for BroadcastCoordinator, AIChatCoordinator init issues
Database access: Use query_portfolio or diagnose_database_locks instead of direct sqlite3 connections

Key Principles

One issue at a time - Fix one problem per iteration to prevent cascading failures
Verify immediately - Always restart and verify after each fix
Multi-layer detection - Check UI, logs, and database for clues
Iterative refinement - Continue until all issues resolved
Automated restart - Always use clean restart (kill + cache clear + restart)
Browser verification - Always test in actual UI, not just logs

Repository

ingpoc/skills

First Seen

Jan 25, 2026
