Observability Stack Setup

Automated deployment of the complete LGTM (Loki, Grafana, Tempo, Mimir/Prometheus) + Alloy observability stack for Claude Code monitoring.

When to Use

Setting up Claude Code observability for the first time
Deploying local development observability infrastructure
Monitoring Claude Code operations (tool calls, costs, errors, performance)
Getting pre-configured dashboards for Claude Code analysis

What This Skill Does

Automatically deploys and configures:

Grafana Alloy: OpenTelemetry collector (receives OTLP telemetry from Claude Code)
Loki: Log aggregation (stores all Claude Code logs)
Tempo: Distributed tracing (tracks tool calls, API requests)
Prometheus: Metrics storage (token usage, costs, performance)
Grafana: Visualization with pre-built Claude Code dashboards

Quick Start

Prerequisites

# Verify Docker is installed
docker --version  # Requires ≥ 20.10

# Verify Docker Compose is installed
docker compose version  # Requires ≥ 2.0
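
The two checks above can also be scripted so that an outdated install fails early. The following is a minimal sketch, not the skill's actual implementation: the helper names `version_ge`, `parse_docker_version`, and `check_prereqs` are hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils or BusyBox).

```shell
# True when dotted version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort): the smaller version sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract "24.0.7" from a line like "Docker version 24.0.7, build afdd53b".
parse_docker_version() {
  sed 's/^Docker version \([0-9][0-9.]*\).*/\1/'
}

# Hypothetical helper: compare installed versions with the minimums above.
check_prereqs() {
  docker_v=$(docker --version | parse_docker_version)
  compose_v=$(docker compose version --short)
  version_ge "$docker_v" 20.10 || { echo "Docker '$docker_v' is older than 20.10" >&2; return 1; }
  # Some Compose releases print a leading "v"; strip it before comparing.
  version_ge "${compose_v#v}" 2.0 || { echo "Compose '$compose_v' is older than 2.0" >&2; return 1; }
}
```

Calling `check_prereqs` before deployment returns non-zero with a message on stderr when either tool is missing or too old.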

Deploy Stack

Invoke this skill and it will:

Create .observability/ directory structure
Generate all configuration files
Start the stack with docker compose up -d
Import Claude Code dashboards
Verify that all services are healthy
Output access URLs and next steps

Estimated time: 5-10 minutes

What Gets Deployed

Services

Service	Port	Purpose
Grafana	3000	Dashboards and visualization
Grafana Alloy	4317 (gRPC), 4318 (HTTP), 12345 (metrics)	OTLP receiver
Loki	3100	Log storage and querying
Tempo	3200	Trace storage and querying
Prometheus	9090	Metrics storage and querying

Volumes

All data persisted in .observability/volumes/:

alloy/ - Alloy configuration and state
loki/ - Log storage
tempo/ - Trace storage
prometheus/ - Metrics storage
grafana/ - Dashboards, datasources, settings

Pre-built Dashboards

Claude Code Overview
- Session count, duration, active time
- Token usage and cost trends
- Error rates by tool
- Top operations
Tool Performance Matrix
- Call counts per tool
- Average/P95/P99 latency
- Success/failure rates
- Most common errors
Cost Analysis
- Daily/weekly/monthly costs
- Token usage breakdown
- Budget tracking
- Cost projections
Error Tracking
- Error timeline
- Error types distribution
- Affected tools
- Recent error details
Session Analysis
- Session duration distribution
- Sessions per day/week
- Conversation depth
- Active vs idle time

Workflow

Step 1: Verify Prerequisites

Checks that Docker (≥ 20.10) and Docker Compose (≥ 2.0) are installed.

Step 2: Create Directory Structure

.observability/
├── docker-compose.yml          # Main stack definition
├── alloy/
│   └── config.yaml            # OTLP receiver + exporters config
├── grafana/
│   ├── datasources/
│   │   ├── loki.yml           # Loki datasource
│   │   ├── prometheus.yml     # Prometheus datasource
│   │   └── tempo.yml          # Tempo datasource
│   └── dashboards/
│       ├── claude-code-overview.json
│       ├── tool-performance.json
│       ├── cost-analysis.json
│       ├── error-tracking.json
│       └── session-analysis.json
└── volumes/                   # Persistent data
    ├── alloy/
    ├── loki/
    ├── tempo/
    ├── prometheus/
    └── grafana/
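
As a rough illustration, the directory skeleton above could be created with a few `mkdir -p` calls. This is a sketch under the assumption that the layout matches the tree exactly; the function name `create_layout` is hypothetical, and the configuration files themselves are generated in the next step.

```shell
# Create the directory skeleton shown above under the given root
# (defaults to .observability). mkdir -p makes the call safe to repeat.
create_layout() {
  root=${1:-.observability}
  mkdir -p "$root/alloy" "$root/grafana/datasources" "$root/grafana/dashboards"
  for svc in alloy loki tempo prometheus grafana; do
    mkdir -p "$root/volumes/$svc"
  done
}
```

Running `create_layout` with no argument builds the tree in the current directory; passing a path builds it elsewhere, which is convenient for dry runs.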

Step 3: Generate Configurations

Creates all configuration files from templates (see references/ for details).

Step 4: Start Stack

docker compose -f .observability/docker-compose.yml up -d

Step 5: Health Checks

Verifies each service:

Alloy: http://localhost:12345/metrics
Loki: http://localhost:3100/ready
Tempo: http://localhost:3200/ready
Prometheus: http://localhost:9090/-/healthy
Grafana: http://localhost:3000/api/health
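
Services can take a while to report ready after `docker compose up -d`, so a single probe may fail spuriously. The sketch below polls the endpoints listed above with a bounded retry. It is an illustration rather than the contents of scripts/verify-health.sh; `retry` and `check_stack` are hypothetical names, and the 30 attempts with a 2-second delay are assumed values.

```shell
# Run a command until it succeeds, at most $1 times,
# sleeping $2 seconds after each failed attempt.
retry() {
  attempts=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Probe every health endpoint; curl -f turns HTTP errors into a non-zero exit.
check_stack() {
  for url in \
    http://localhost:12345/metrics \
    http://localhost:3100/ready \
    http://localhost:3200/ready \
    http://localhost:9090/-/healthy \
    http://localhost:3000/api/health
  do
    retry 30 2 curl -fsS -o /dev/null "$url" || { echo "unhealthy: $url" >&2; return 1; }
  done
  echo "all services healthy"
}
```

Calling `check_stack` stops at the first endpoint that never becomes healthy and names it on stderr.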

Step 6: Import Dashboards

Uses the Grafana HTTP API to import all pre-built dashboards.
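
Grafana's `POST /api/dashboards/db` endpoint expects the dashboard model wrapped in a `{"dashboard": ..., "overwrite": ...}` envelope. The sketch below shows one way the import loop might look. It assumes each JSON file holds a bare dashboard model with a null or absent `id`, and it uses the default admin/admin credentials; `build_payload` and `import_dashboards` are hypothetical names, not the contents of scripts/import-dashboards.sh.

```shell
# Wrap a dashboard JSON file in the payload POST /api/dashboards/db expects.
build_payload() {
  printf '{"dashboard": %s, "overwrite": true}\n' "$(cat "$1")"
}

# Import every dashboard file found in a directory.
import_dashboards() {
  dir=${1:-.observability/grafana/dashboards}
  for f in "$dir"/*.json; do
    build_payload "$f" | curl -fsS -X POST http://localhost:3000/api/dashboards/db \
      -H "Content-Type: application/json" -u admin:admin --data-binary @- >/dev/null \
      || { echo "import failed: $f" >&2; return 1; }
  done
}
```

Setting `overwrite` to true makes repeated imports replace the existing dashboard instead of failing on a name conflict.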

Step 7: Output Success

Displays:

Access URLs for all services
Default credentials (admin/admin)
OTLP endpoint for Claude Code configuration
Next step: Enable Claude Code telemetry

Configuration Details

Grafana Alloy (OTLP Collector)

Receives telemetry from Claude Code via OTLP protocol:

gRPC endpoint: localhost:4317
HTTP endpoint: localhost:4318

Routes telemetry to backends:

Logs → Loki
Traces → Tempo
Metrics → Prometheus
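
For orientation, pointing Claude Code at the gRPC endpoint typically comes down to a handful of environment variables. The names below follow Claude Code's OpenTelemetry settings as commonly documented, but treat them as assumptions and confirm them against the current Claude Code monitoring documentation; the claude-code-telemetry-enable skill remains the supported way to apply them.

```shell
# Assumed settings; verify the variable names before relying on them.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
# Send over gRPC to the Alloy receiver started by this stack.
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```

To use the HTTP receiver instead, the protocol would be `http/protobuf` and the endpoint `http://localhost:4318`.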

Retention Policies

Default: 365 days (configurable in docker-compose.yml)

Loki: 365 days (limits_config.retention_period: 8760h, enforced by the compactor with retention_enabled: true)
Tempo: 365 days (compactor.compaction.block_retention: 8760h)
Prometheus: 365 days (--storage.tsdb.retention.time=365d)

Privacy Settings

Full logging enabled (no redactions):

User prompts: Full content logged
File paths: Complete paths visible
Tool execution: Full command details
API requests: All parameters visible

This configuration is intended for personal use, where the operator already has full access to the data being logged.

Troubleshooting

Port Already in Use

If ports 3000, 3100, 3200, 4317, 4318, 9090, or 12345 are in use:

Option 1: Stop conflicting services

# Find process using port
sudo lsof -i :3000
# Stop the process
sudo kill <PID>

Option 2: Modify ports in docker-compose.yml
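
Before picking either option, it can help to see which of the stack's ports are actually taken. The following is a small sketch with hypothetical helper names (`busy_ports`, `lsof_probe`); the probe is passed in as an argument so a different tool can be swapped in where `lsof` is unavailable.

```shell
# Ports used by the stack, as listed above.
STACK_PORTS="3000 3100 3200 4317 4318 9090 12345"

# Print each stack port for which the probe command ($1) exits 0.
busy_ports() {
  probe=$1
  for p in $STACK_PORTS; do
    if "$probe" "$p"; then printf '%s\n' "$p"; fi
  done
}

# Probe based on lsof: succeeds when something listens on TCP port $1.
# Run as root (as the sudo lsof command above does) to see other users' processes.
lsof_probe() {
  lsof -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}
```

Running `busy_ports lsof_probe` prints one occupied port per line and prints nothing when all seven ports are free.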

Services Not Starting

Check logs:

docker compose -f .observability/docker-compose.yml logs [service_name]

Common issues:

Insufficient disk space (check with df -h)
Insufficient memory (Alloy needs ~512MB, others ~256MB each)
Permission issues on volume directories

Dashboards Not Appearing

Manually import:

# Wrap the dashboard JSON in the payload the API expects and import it
# (assumes the file holds a bare dashboard model with a null or absent "id")
printf '{"dashboard": %s, "overwrite": true}' \
  "$(cat .observability/grafana/dashboards/claude-code-overview.json)" |
curl -X POST http://localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -u admin:admin \
  --data-binary @-

Next Steps

After the stack is running:

Enable Claude Code telemetry: Use claude-code-telemetry-enable skill
Use Claude Code: Run tools, read files, execute commands
View dashboards: Open http://localhost:3000, explore pre-built dashboards
Verify data flowing: Check Grafana → Explore → Loki/Prometheus/Tempo

Stopping the Stack

Graceful shutdown (preserves data):

docker compose -f .observability/docker-compose.yml down

Complete removal (deletes data in named volumes; data bind-mounted under .observability/volumes/ must be deleted separately):

docker compose -f .observability/docker-compose.yml down -v

References

references/docker-compose-full.yml - Complete Docker Compose configuration
references/alloy-config.yaml - Grafana Alloy OTLP receiver configuration
references/grafana-datasources/ - Datasource YAML configurations
references/dashboards/ - Pre-built dashboard JSON files
references/troubleshooting.md - Common issues and solutions

Scripts

scripts/setup-stack.sh - Main setup script (automated deployment)
scripts/verify-health.sh - Health check all services
scripts/import-dashboards.sh - Import Grafana dashboards

Version Information

Component versions (pinned as of 2025-11-22):

Grafana: 11.5.2
Grafana Alloy: 1.5.0
Loki: 3.4.2
Tempo: 2.7.1
Prometheus: 2.55.0

All versions pinned in docker-compose.yml for reproducibility.
