gamma-observability
Gamma Observability
Overview
Implement the three pillars of observability (metrics, logging, tracing) for Gamma integrations with alerting and dashboards.
Prerequisites
- Observability stack (Prometheus, Grafana, or cloud equivalent)
- Log aggregation (ELK, CloudWatch, or similar)
- APM tool (Datadog, New Relic, or OpenTelemetry)
Instructions
Step 1: Instrument Metrics
Add Prometheus counters (total requests, presentations created), a histogram (request duration), and a gauge (rate limit remaining) to the Gamma client via interceptors.
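As a rough illustration of this step, the sketch below shows an interceptor-style wrapper that updates a request counter, a duration histogram, and a rate-limit gauge around each call. It is dependency-free so it can run on its own: `InMemoryMetrics` is a hypothetical stand-in for a Prometheus client registry, and the bucket bounds, the `GammaResponse` shape, and the `x-ratelimit-remaining` header name are assumptions rather than anything documented by Gamma.

```typescript
// Hypothetical in-memory stand-in for a Prometheus registry; a real setup
// would back these fields with a Prometheus client library.
class InMemoryMetrics {
  // Counter: total requests keyed by "operation|status".
  readonly requestsTotal = new Map<string, number>();
  // Histogram (simplified, no +Inf bucket): cumulative counts per bound, in seconds.
  readonly bucketBounds: number[] = [0.1, 0.5, 1, 2.5, 5]; // assumed bounds
  readonly bucketCounts: number[] = [0, 0, 0, 0, 0];
  // Gauge: last observed rate limit remaining (undefined until first seen).
  rateLimitRemaining: number | undefined = undefined;

  incRequest(operation: string, status: "success" | "error"): void {
    const key = `${operation}|${status}`;
    this.requestsTotal.set(key, (this.requestsTotal.get(key) ?? 0) + 1);
  }

  observeDuration(seconds: number): void {
    // Buckets are cumulative: increment every bucket whose upper bound
    // is at least the observed value.
    this.bucketBounds.forEach((bound, i) => {
      if (seconds <= bound) this.bucketCounts[i] = (this.bucketCounts[i] ?? 0) + 1;
    });
  }
}

// Assumed response shape; the rate-limit header name is an assumption.
interface GammaResponse {
  status: number;
  headers: Record<string, string>;
}

// Record the outcome of one completed request.
function recordOutcome(
  metrics: InMemoryMetrics,
  operation: string,
  res: GammaResponse,
  seconds: number,
): void {
  metrics.incRequest(operation, res.status < 400 ? "success" : "error");
  metrics.observeDuration(seconds);
  const raw = res.headers["x-ratelimit-remaining"];
  if (raw !== undefined && raw !== "" && Number.isFinite(Number(raw))) {
    metrics.rateLimitRemaining = Number(raw);
  }
}

// Interceptor-style wrapper: time the call, then record success or failure.
async function instrument(
  metrics: InMemoryMetrics,
  operation: string,
  call: () => Promise<GammaResponse>,
): Promise<GammaResponse> {
  const start = Date.now();
  let res: GammaResponse;
  try {
    res = await call();
  } catch (err) {
    // Thrown errors (for example network failures) count as failed requests.
    metrics.incRequest(operation, "error");
    metrics.observeDuration((Date.now() - start) / 1000);
    throw err;
  }
  recordOutcome(metrics, operation, res, (Date.now() - start) / 1000);
  return res;
}
```

A call site would look like `await instrument(metrics, "createPresentation", () => client.post(...))`, where `client` is whatever HTTP client the integration already uses. A "presentations created" counter could be added the same way, incremented only when the create call succeeds.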
Step 2: Add Structured Logging
Configure Winston with JSON format, timestamps, and service metadata. Log requests, responses, and errors with sanitized parameters (no API keys).
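The sanitizing part of this step is easy to get wrong, so here is a minimal sketch of it. It does not import Winston; `sanitize` and `formatLogEntry` are hypothetical helpers, and the sensitive key list and the `gamma-integration` service name are assumptions. In a Winston setup the sanitized object would be passed as log metadata and the JSON formatting left to the logger.

```typescript
// Keys treated as sensitive. This list is an assumption; adapt it to the
// parameters your integration actually sends.
const SENSITIVE_KEYS = ["apikey", "api_key", "x-api-key", "authorization", "token"];

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively replace the values of sensitive keys with a placeholder.
function sanitize(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => sanitize(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.includes(key.toLowerCase()) ? "[REDACTED]" : sanitize(inner);
    }
    return out;
  }
  return value;
}

interface LogEntry {
  level: "info" | "warn" | "error";
  message: string;
  params?: Json;
}

// Build one JSON log line with a timestamp and service metadata.
function formatLogEntry(entry: LogEntry, now: Date = new Date()): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level: entry.level,
    service: "gamma-integration", // assumed service name
    message: entry.message,
    // JSON.stringify drops the key entirely when params is undefined.
    params: entry.params === undefined ? undefined : sanitize(entry.params),
  });
}
```

Matching keys case-insensitively catches both `apiKey` and `Authorization`, including when they are nested inside request options.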
Step 3: Enable Distributed Tracing
Use OpenTelemetry to create spans for Gamma API calls with operation name, success status, and error recording.
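The span lifecycle described here can be sketched without the SDK. In the example below, `SpanLike` and `RecordingSpan` are hypothetical stand-ins so the code runs on its own; they are not the OpenTelemetry API. With OpenTelemetry in place, the span would come from the tracer initialized at startup. The `gamma.*` span and attribute names are assumptions.

```typescript
// Hypothetical minimal span interface so the sketch runs without the
// OpenTelemetry SDK; a span from a real tracer would take its place.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): void;
  recordError(error: Error): void;
  end(): void;
}

// Stand-in span that simply remembers what was done to it.
class RecordingSpan implements SpanLike {
  readonly name: string;
  readonly attributes: Record<string, string | number | boolean> = {};
  readonly errors: Error[] = [];
  ended = false;

  constructor(name: string) {
    this.name = name;
  }
  setAttribute(key: string, value: string | number | boolean): void {
    this.attributes[key] = value;
  }
  recordError(error: Error): void {
    this.errors.push(error);
  }
  end(): void {
    this.ended = true;
  }
}

// Wrap one Gamma API call in a span: name it after the operation, record
// success or failure, and always end the span.
async function withSpan<T>(
  startSpan: (name: string) => SpanLike,
  operation: string,
  call: () => Promise<T>,
): Promise<T> {
  const span = startSpan(`gamma.${operation}`);
  span.setAttribute("gamma.operation", operation);
  try {
    const result = await call();
    span.setAttribute("gamma.success", true);
    return result;
  } catch (err) {
    span.setAttribute("gamma.success", false);
    span.recordError(err instanceof Error ? err : new Error(String(err)));
    throw err; // rethrow so callers still see the failure
  } finally {
    span.end(); // runs on both the success and the error path
  }
}
```

Ending the span in `finally` matters: a span that is never ended is typically not exported, which is one way traces go missing even when the tracer is initialized.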
Step 4: Build Dashboard
Create a Grafana dashboard with panels for request rate, p95 latency, error rate, and rate limit remaining.
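To make the four panels concrete, the sketch below pairs each panel title with a PromQL query. It is not a complete Grafana dashboard JSON: `PanelSpec` and `buildPanels` are hypothetical helpers, and the metric names (`gamma_requests_total` with a `status` label, `gamma_request_duration_seconds`, `gamma_rate_limit_remaining`) are assumed and must match whatever names the instrumentation actually exports.

```typescript
// Hypothetical panel description: a title plus the PromQL query behind it.
interface PanelSpec {
  title: string;
  expr: string;
}

// Build the four panel queries for a given metric name prefix (assumed naming).
function buildPanels(prefix: string = "gamma"): PanelSpec[] {
  const requests = `${prefix}_requests_total`;
  const durationBuckets = `${prefix}_request_duration_seconds_bucket`;
  return [
    // Requests per second, averaged over the last 5 minutes.
    { title: "Request rate", expr: `sum(rate(${requests}[5m]))` },
    // 95th percentile latency estimated from histogram buckets.
    {
      title: "Latency p95",
      expr: `histogram_quantile(0.95, sum by (le) (rate(${durationBuckets}[5m])))`,
    },
    // Fraction of requests that failed.
    {
      title: "Error rate",
      expr: `sum(rate(${requests}{status="error"}[5m])) / sum(rate(${requests}[5m]))`,
    },
    // Latest value of the rate limit gauge.
    { title: "Rate limit remaining", expr: `${prefix}_rate_limit_remaining` },
  ];
}
```

Each `expr` can be pasted into a Grafana panel's query editor with a Prometheus data source selected.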
Step 5: Configure Alerts
- High error rate (>5% for 5 min)
- Rate limit critically low (<10 remaining)
- High latency (p95 > 5s for 5 min)
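The three thresholds above can be expressed as a small pure function, which is handy for unit-testing the boundaries before encoding them as alert rules. This is a sketch under assumptions: `WindowStats` and the alert names are hypothetical, and the stats are assumed to be aggregated over the last five minutes already, which simplifies the "sustained for 5 min" condition a real alerting system would enforce.

```typescript
// Hypothetical snapshot of the last five minutes of traffic.
interface WindowStats {
  totalRequests: number;
  failedRequests: number;
  p95LatencySeconds: number;
  rateLimitRemaining: number;
}

type Alert = "HighErrorRate" | "RateLimitCriticallyLow" | "HighLatency";

// Return every alert whose threshold is crossed by the given snapshot.
function evaluateAlerts(stats: WindowStats): Alert[] {
  const alerts: Alert[] = [];
  // Avoid dividing by zero when there was no traffic in the window.
  const errorRate =
    stats.totalRequests === 0 ? 0 : stats.failedRequests / stats.totalRequests;
  if (errorRate > 0.05) alerts.push("HighErrorRate"); // >5% errors
  if (stats.rateLimitRemaining < 10) alerts.push("RateLimitCriticallyLow"); // <10 remaining
  if (stats.p95LatencySeconds > 5) alerts.push("HighLatency"); // p95 > 5s
  return alerts;
}
```

All three comparisons are strict, so a window sitting exactly on a threshold (5% errors, 10 remaining, 5 s p95) does not fire.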
Step 6: Add Health Check
Expose an endpoint that pings the Gamma API and reports its status (healthy/degraded/unhealthy) along with latency and rate limit info.
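The status decision behind such an endpoint might look like the sketch below. The report shape follows the sample response in the Examples section; the `PingResult` type and the cut-offs for "degraded" (latency above 1000 ms or fewer than 10 requests remaining) are assumptions, since the source does not define them.

```typescript
type HealthStatus = "healthy" | "degraded" | "unhealthy";

// Hypothetical result of one ping against the Gamma API.
interface PingResult {
  ok: boolean; // whether the ping got a successful response
  latencyMs: number;
  remaining: number; // rate limit remaining
  limit: number; // rate limit ceiling
}

interface HealthReport {
  status: HealthStatus;
  latency: number;
  rateLimit: { remaining: number; limit: number };
}

// Assumed cut-offs for "degraded"; tune them to your own traffic.
const DEGRADED_LATENCY_MS = 1000;
const DEGRADED_REMAINING = 10;

// Map a ping result to the health report returned by the endpoint.
function classify(ping: PingResult): HealthReport {
  let status: HealthStatus = "healthy";
  if (!ping.ok) {
    status = "unhealthy";
  } else if (ping.latencyMs > DEGRADED_LATENCY_MS || ping.remaining < DEGRADED_REMAINING) {
    status = "degraded";
  }
  return {
    status,
    latency: ping.latencyMs,
    rateLimit: { remaining: ping.remaining, limit: ping.limit },
  };
}
```

An HTTP handler for `/health/gamma` would perform the ping, pass the result to `classify`, and serialize the report as JSON.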
See detailed implementation for Prometheus metrics, Winston logger, OpenTelemetry tracing, Grafana dashboard JSON, alert rules, and health check endpoint.
Output
- Prometheus metrics on all API calls
- Structured JSON logging with sanitized params
- Distributed tracing with OpenTelemetry spans
- Grafana dashboard with key metrics
- Alert rules for errors, rate limits, and latency
- Health check endpoint reporting Gamma API status, latency, and rate limit info
Error Handling
| Error | Cause | Solution |
|---|---|---|
| Metrics not appearing | Registry not exposed | Add /metrics endpoint serving registry |
| Missing traces | Tracer not initialized | Initialize OpenTelemetry SDK at startup |
| Log volume too high | Debug level in prod | Set LOG_LEVEL to 'info' or 'warn' |
| False alerts | Thresholds too sensitive | Tune alert thresholds to traffic patterns |
Examples
Quick Health Check
set -euo pipefail
# 3000 is the local port the service is assumed to listen on
curl -s http://localhost:3000/health/gamma | jq .
# { "status": "healthy", "latency": 150, "rateLimit": { "remaining": 95, "limit": 100 } }
Next Steps
Proceed to gamma-incident-runbook for incident response.