Log Analysis

Overview

Logs are a primary source of evidence for debugging and monitoring. Effective log analysis surfaces issues quickly and supports root cause analysis.

When to Use

  • Troubleshooting errors
  • Performance investigation
  • Security incident analysis
  • Auditing user actions
  • Monitoring application health

Instructions

1. Structured Logging

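// Good: Structured logs (machine-readable)
// Many logging libraries can add level and timestamp automatically; they are shown explicitly here for clarity.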
logger.info({
  level: 'INFO',
  timestamp: '2024-01-15T10:30:00Z',
  service: 'auth-service',
  user_id: '12345',
  action: 'user_login',
  status: 'success',
  duration_ms: 150,
  ip_address: '192.168.1.1'
});

// Bad: Unstructured logs (hard to parse)
console.log('User 12345 logged in successfully in 150ms from 192.168.1.1');

// JSON Format (Elasticsearch friendly)
{
  "@timestamp": "2024-01-15T10:30:00Z",
  "level": "ERROR",
  "service": "api-gateway",
  "trace_id": "abc123",
  "message": "Database connection failed",
  "error": {
    "type": "ConnectionError",
    "code": "ECONNREFUSED"
  },
  "context": {
    "database": "users",
    "operation": "SELECT"
  }
}
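
To make the structured format above concrete, here is a minimal sketch of a logger that writes one JSON object per line. The `createLogger` name, its `write` parameter, and the field layout are assumptions for illustration, not the API of any particular logging library; in practice an established library would usually be a better choice.

```javascript
// Minimal structured logger sketch: emits one JSON object per line.
// createLogger and its `write` parameter are hypothetical names.
function createLogger(service, write = (line) => console.log(line)) {
  const emit = (level, message, fields = {}) => {
    const entry = {
      // Caller-supplied fields go first so they cannot overwrite core fields.
      ...fields,
      '@timestamp': new Date().toISOString(),
      level,
      service,
      message,
    };
    write(JSON.stringify(entry));
    return entry;
  };
  return {
    info: (message, fields) => emit('INFO', message, fields),
    error: (message, fields) => emit('ERROR', message, fields),
  };
}

const logger = createLogger('auth-service');
logger.info('user_login', { user_id: '12345', status: 'success', duration_ms: 150 });
```

Passing a custom `write` function makes it possible to send lines to a file or a log shipper instead of standard output.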

2. Log Levels & Patterns

Log Levels:

DEBUG: Detailed diagnostic info
  - Variable values
  - Function entry/exit
  - Intermediate calculations
  - Use: Development (enable temporarily in production only when diagnosing an issue)

INFO: General informational messages
  - Startup/shutdown
  - User actions
  - Configuration changes
  - Use: Production (normal operations)

WARN: Warning messages (potential issues)
  - Deprecated API usage
  - Performance degradation
  - Resource limits approaching
  - Use: Production (investigate soon)

ERROR: Error conditions
  - Failed operations
  - Exceptions
  - Failed requests
  - Use: Production (action required)

FATAL/CRITICAL: System unusable
  - Critical failures
  - Out of memory
  - Data corruption
  - Use: Production (immediate action)
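
The level hierarchy above is typically enforced with a minimum-level threshold. The sketch below assumes numeric severities and a `shouldLog` helper; both are illustrative choices, as is the convention of keying the threshold off `NODE_ENV`.

```javascript
// Numeric severity per level; higher means more severe.
// The specific numbers are arbitrary, only their order matters.
const LEVELS = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40, FATAL: 50 };

const isKnownLevel = (name) => Object.prototype.hasOwnProperty.call(LEVELS, name);

// Returns true when an entry at `level` should be emitted given the
// configured minimum level. Unknown level names throw instead of passing silently.
function shouldLog(level, minLevel) {
  if (!isKnownLevel(level) || !isKnownLevel(minLevel)) {
    throw new Error(`Unknown log level: ${level} / ${minLevel}`);
  }
  return LEVELS[level] >= LEVELS[minLevel];
}

// Assumed convention: DEBUG in development, INFO and above in production.
const minLevel = process.env.NODE_ENV === 'production' ? 'INFO' : 'DEBUG';
console.log(shouldLog('DEBUG', minLevel), shouldLog('ERROR', minLevel));
```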

---

Log Patterns:

Request Logging:
  - Request ID (trace_id)
  - Method + Path
  - Status code
  - Duration
  - Request size / response size

Error Logging:
  - Error type/code
  - Error message
  - Stack trace
  - Context (user_id, session_id)
  - Timestamp

Business Events:
  - Event type
  - User involved
  - Impact/importance
  - Timestamp
  - Relevant context
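
The request and error patterns above can be sketched as two small builder functions. The function names, parameter names, and the rule that 5xx responses are logged at ERROR are assumptions made for this example.

```javascript
// Builds a request log entry with the fields listed under "Request Logging".
// Timestamps are epoch milliseconds; all parameter names are illustrative.
function requestLogEntry({ traceId, method, path, status, startedAt, finishedAt, requestBytes, responseBytes }) {
  return {
    '@timestamp': new Date(finishedAt).toISOString(),
    // Assumed rule: server errors (5xx) are logged at ERROR, everything else at INFO.
    level: status >= 500 ? 'ERROR' : 'INFO',
    trace_id: traceId,
    method,
    path,
    status,
    duration_ms: finishedAt - startedAt,
    request_bytes: requestBytes,
    response_bytes: responseBytes,
  };
}

// Builds an error log entry with type, message, stack trace, and caller context.
function errorLogEntry(err, context = {}) {
  return {
    '@timestamp': new Date().toISOString(),
    level: 'ERROR',
    message: err.message,
    error: { type: err.name, code: err.code, stack: err.stack },
    context,
  };
}

try {
  throw new TypeError('Unexpected payload');
} catch (err) {
  console.log(JSON.stringify(errorLogEntry(err, { user_id: '12345', session_id: 's-1' })));
}
```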

3. Log Analysis Tools

Log Aggregation:

ELK Stack (Elasticsearch, Logstash, Kibana):
  - Logstash: Parse and process logs
  - Elasticsearch: Search and analyze
  - Kibana: Visualization and dashboards
  - Use: Large scale, complex queries

Splunk:
  - Comprehensive log management
  - Real-time search and analysis
  - Dashboards and alerts
  - Use: Enterprise (expensive)

CloudWatch (AWS):
  - Integrated with AWS services
  - Logs Insights for querying
  - Dashboards
  - Use: AWS-based systems

Datadog:
  - Application performance monitoring
  - Log management
  - Real-time alerts
  - Use: SaaS monitoring

---

Log Analysis Techniques:

Grep/Awk:
  grep "ERROR" app.log
  awk '{print $1, $4}' app.log

Filtering:
  Filter by timestamp
  Filter by service
  Filter by error type
  Filter by user

Searching:
  Search for error patterns
  Search for user actions
  Search trace IDs
  Search IP addresses

Aggregation:
  Count occurrences
  Group by error type
  Calculate duration percentiles
  Rate of errors over time
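
When no aggregation tool is at hand, the filtering and aggregation techniques above can be approximated in a short script. This sketch assumes newline-delimited JSON logs; the helper names and the nearest-rank percentile method are choices made for the example.

```javascript
// Parses newline-delimited JSON, skipping blank lines and lines that are
// not valid JSON objects (e.g. stray unstructured output).
function parseJsonLines(text) {
  const entries = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    try {
      const parsed = JSON.parse(line);
      if (parsed !== null && typeof parsed === 'object') entries.push(parsed);
    } catch (err) {
      // Not JSON: ignore this line.
    }
  }
  return entries;
}

// Counts entries per value of `field`.
function countBy(entries, field) {
  const counts = {};
  for (const e of entries) {
    const key = String(e[field]);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Nearest-rank percentile (p between 0 and 100) of a numeric array.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

const sample = [
  '{"level":"ERROR","service":"api-gateway","duration_ms":1200}',
  '{"level":"INFO","service":"auth-service","duration_ms":150}',
  'not json',
  '{"level":"ERROR","service":"auth-service","duration_ms":300}',
].join('\n');

const entries = parseJsonLines(sample);
const errors = entries.filter((e) => e.level === 'ERROR');
console.log(countBy(errors, 'service'));
console.log(percentile(entries.map((e) => e.duration_ms), 95));
```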

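4. Common Log Analysis Queries

The queries below use a generic pseudo-syntax; translate them into your tool's query language (for example Kibana, Splunk, or CloudWatch Logs Insights).
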
Find errors in past hour:
  timestamp: last_1h AND level: ERROR

Track user activity:
  user_id: 12345 AND action: *

Find slow requests:
  duration_ms: >1000

Analyze error rate by service:
  level: ERROR | stats count by service

Find failed database operations:
  error.type: "DatabaseError" | stats count

Trace request flow:
  trace_id: "abc123" | sort by timestamp
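
For readers who want to see what these queries compute, the following sketch expresses three of them over an in-memory array of parsed entries. It assumes entries shaped like the JSON example earlier (`@timestamp`, `level`, `service`, `trace_id`); the function names are hypothetical.

```javascript
// Entries at `level` whose timestamp falls within the last `windowMs`
// milliseconds before `now` (epoch ms). Entries with unparseable
// timestamps are excluded because comparisons with NaN are false.
function recentAtLevel(entries, level, now, windowMs) {
  return entries.filter((e) => {
    const t = Date.parse(e['@timestamp']);
    return e.level === level && t > now - windowMs && t <= now;
  });
}

// All entries for one trace, ordered by timestamp (oldest first).
function traceFlow(entries, traceId) {
  return entries
    .filter((e) => e.trace_id === traceId)
    .sort((a, b) => Date.parse(a['@timestamp']) - Date.parse(b['@timestamp']));
}

// Fraction of entries at ERROR level, per service.
function errorRateByService(entries) {
  const totals = {};
  for (const e of entries) {
    if (!totals[e.service]) totals[e.service] = { total: 0, errors: 0 };
    totals[e.service].total += 1;
    if (e.level === 'ERROR') totals[e.service].errors += 1;
  }
  const rates = {};
  for (const [service, t] of Object.entries(totals)) {
    rates[service] = t.errors / t.total;
  }
  return rates;
}

const logs = [
  { '@timestamp': '2024-01-15T10:30:00Z', level: 'ERROR', service: 'api-gateway', trace_id: 'abc123' },
  { '@timestamp': '2024-01-15T10:29:59Z', level: 'INFO', service: 'api-gateway', trace_id: 'abc123' },
  { '@timestamp': '2024-01-15T08:00:00Z', level: 'ERROR', service: 'auth-service', trace_id: 'def456' },
  { '@timestamp': '2024-01-15T10:10:00Z', level: 'INFO', service: 'auth-service', trace_id: 'def456' },
];
const now = Date.parse('2024-01-15T10:45:00Z');
console.log(recentAtLevel(logs, 'ERROR', now, 60 * 60 * 1000).length);
console.log(traceFlow(logs, 'abc123').map((e) => e.level));
console.log(errorRateByService(logs));
```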

---

Checklist:

[ ] Structured logging implemented
[ ] All errors logged with context
[ ] Request IDs/trace IDs used
[ ] Sensitive data not logged (passwords, tokens)
[ ] Log levels used appropriately
[ ] Log retention policy set
[ ] Log sampling for high-volume events
[ ] Alerts configured for errors
[ ] Dashboards created
[ ] Regular log review scheduled
[ ] Log analysis tools accessible
[ ] Team trained on querying logs
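
One way to enforce the "sensitive data not logged" item is to redact known fields before an entry is written. The sketch below is an assumption-laden starting point: the key list and the `[REDACTED]` marker are illustrative, it matches on key names only, and it expects plain JSON-like data (objects, arrays, primitives).

```javascript
// Keys treated as sensitive. An assumed starting list; extend it for your data.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'api_key', 'secret']);

// Returns a deep copy of `value` with sensitive fields replaced by a marker.
// Key matching is case-insensitive; the input is not modified.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(v);
    }
    return out;
  }
  return value;
}

console.log(JSON.stringify(redact({
  user_id: '12345',
  password: 'hunter2',
  context: { Token: 'abc', attempts: 3 },
})));
```

Name-based redaction does not catch secrets embedded in free-text messages, so it complements rather than replaces care at the call site.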

Key Points

  • Use structured JSON logging
  • Include trace IDs for request tracking
  • Use log levels appropriately (DEBUG/INFO/WARN/ERROR/FATAL)
  • Never log sensitive data (passwords, tokens)
  • Aggregate logs centrally
  • Create dashboards for key metrics
  • Alert on error rates and critical issues
  • Retain logs appropriately
  • Search logs by trace ID for troubleshooting
  • Review logs regularly for patterns