grafana-dashboards
Grafana Dashboards
Build powerful monitoring and observability dashboards.
Instructions
- Start with key metrics - CPU, memory, latency, error rates
- Use consistent time ranges - Let every panel follow the dashboard time picker so graphs line up
- Add context with variables - Filter by environment, service, host
- Set up alerts - Catch problems proactively instead of reacting to outages
- Use templates - Keep dashboard styling consistent
Dashboard JSON Structure
{
  "dashboard": {
    "id": null,
    "uid": "my-dashboard",
    "title": "Service Overview",
    "tags": ["production", "service-name"],
    "timezone": "browser",
    "refresh": "30s",
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "templating": { "list": [] },
    "panels": []
  }
}
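The structure above can also be generated from code, which helps when many dashboards share the same skeleton. The following is a minimal sketch, assuming a hypothetical helper named build_dashboard (not part of Grafana or any library); it only assembles the same fields shown above and serializes them.

```python
import json


def build_dashboard(title, uid, tags, panels=None):
    """Return a payload with the same fields as the JSON structure above.

    build_dashboard is a hypothetical helper for illustration only.
    """
    return {
        "dashboard": {
            "id": None,  # serialized as null, as in the structure above
            "uid": uid,
            "title": title,
            "tags": list(tags),
            "timezone": "browser",
            "refresh": "30s",
            "time": {"from": "now-1h", "to": "now"},
            "templating": {"list": []},
            "panels": list(panels or []),
        }
    }


if __name__ == "__main__":
    payload = build_dashboard(
        "Service Overview", "my-dashboard", ["production", "service-name"]
    )
    print(json.dumps(payload, indent=2))
```

Keeping the skeleton in one function means a change such as a new default refresh interval is made in one place.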
Panel Types
Time Series
{
  "type": "timeseries",
  "title": "Request Rate",
  "fieldConfig": {
    "defaults": {
      "unit": "reqps",
      "custom": {
        "lineWidth": 2,
        "fillOpacity": 10
      }
    }
  },
  "targets": [
    {
      "expr": "rate(http_requests_total{job=\"$job\"}[5m])",
      "legendFormat": "{{method}} {{status}}"
    }
  ]
}
Stat Panel
{
  "type": "stat",
  "title": "Total Requests",
  "options": {
    "colorMode": "value",
    "graphMode": "area",
    "reduceOptions": {
      "calcs": ["lastNotNull"]
    }
  }
}
Gauge
{
  "type": "gauge",
  "title": "CPU Usage",
  "fieldConfig": {
    "defaults": {
      "unit": "percent",
      "min": 0,
      "max": 100,
      "thresholds": {
        "steps": [
          { "color": "green", "value": null },
          { "color": "yellow", "value": 70 },
          { "color": "red", "value": 90 }
        ]
      }
    }
  }
}
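To make the threshold steps concrete, here is a small sketch of how a reading could be mapped to a color. It assumes the steps are listed in ascending order and that a step with a null value acts as the base color; threshold_color is a hypothetical name, and this is an illustration of the idea rather than Grafana's own code.

```python
def threshold_color(value, steps):
    """Return the color of the last step whose bound is <= value.

    Assumes steps are sorted ascending; a bound of None matches everything
    (the base step). Hypothetical helper for illustration.
    """
    color = None
    for step in steps:
        bound = step["value"]
        if bound is None or value >= bound:
            color = step["color"]
    return color


# Same steps as the gauge panel above (null becomes None in Python).
STEPS = [
    {"color": "green", "value": None},
    {"color": "yellow", "value": 70},
    {"color": "red", "value": 90},
]

if __name__ == "__main__":
    for reading in (42, 70, 95):
        print(reading, threshold_color(reading, STEPS))
```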
Prometheus Queries (PromQL)
Basic Queries
# Per-second request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])
# Sum by label
sum by (status) (rate(http_requests_total[5m]))
# 95th percentile latency (a percentile, not an average)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m])) * 100
# CPU usage percentage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) /
node_memory_MemTotal_bytes * 100
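The arithmetic behind the rate and error-percentage queries can be sketched in a few lines. This is a simplified illustration under stated assumptions: simple_rate and error_percentage are hypothetical names, and unlike Prometheus the sketch does not extrapolate to the edges of the window.

```python
def simple_rate(samples):
    """Approximate a per-second rate from (timestamp_seconds, counter) pairs.

    A value lower than its predecessor is treated as a counter reset, so the
    new value counts as the increase since the restart. No extrapolation is
    done, which is a simplification compared with Prometheus.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


def error_percentage(error_rate, total_rate):
    """Share of failing requests as a percentage; None when there is no traffic."""
    if not total_rate:
        return None
    return error_rate / total_rate * 100


if __name__ == "__main__":
    total = simple_rate([(0, 100), (60, 160), (120, 220)])
    errors = simple_rate([(0, 5), (60, 8), (120, 11)])
    print(total, errors, error_percentage(errors, total))
```

Returning None for zero traffic mirrors the fact that the PromQL division yields no usable value when the denominator is zero.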
Aggregation & Filtering
# Filter by label
http_requests_total{job="api", environment="production"}
# Regex match
http_requests_total{path=~"/api/v[0-9]+/.*"}
# Aggregations
sum(metric) # Total
avg(metric) # Average
max(metric) # Maximum
topk(5, metric) # Top 5 series
# Group by label
sum by (instance) (metric)
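As a rough mental model for the aggregations above, each series can be thought of as a set of labels plus a value. The sketch below assumes that representation and uses hypothetical helpers sum_by and topk; it shows the grouping idea only and is not how Prometheus is implemented.

```python
from collections import defaultdict


def sum_by(series, label):
    """Group (labels, value) pairs by one label and add up the values.

    Series without the label are grouped under the empty string.
    """
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


def topk(k, series):
    """Return the k (labels, value) pairs with the largest values."""
    return sorted(series, key=lambda item: item[1], reverse=True)[:k]


# Made-up sample data for illustration.
SERIES = [
    ({"instance": "a", "job": "api"}, 3.0),
    ({"instance": "a", "job": "web"}, 1.5),
    ({"instance": "b", "job": "api"}, 2.0),
]

if __name__ == "__main__":
    print(sum_by(SERIES, "instance"))
    print(topk(2, SERIES))
```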
Variables (Templating)
{
  "templating": {
    "list": [
      {
        "name": "datasource",
        "type": "datasource",
        "query": "prometheus"
      },
      {
        "name": "environment",
        "type": "query",
        "datasource": "${datasource}",
        "query": "label_values(up, environment)",
        "refresh": 1,
        "multi": false,
        "includeAll": true
      }
    ]
  }
}
Usage in queries:
rate(http_requests_total{environment=~"$environment"}[$__rate_interval])
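The reason the query uses the regex matcher =~ is that a variable with several selected values (or All) has to expand into something that matches any of them. The sketch below is a simplified illustration of that expansion, assuming a hypothetical interpolate function; Grafana's real interpolation has more formats and escaping rules than shown here.

```python
import re


def interpolate(expr, variables):
    """Replace $name and ${name} in expr with values from variables.

    A list of values becomes a regex alternation such as (a|b), so it can be
    used with the =~ matcher. Unknown variables are left untouched.
    Hypothetical helper for illustration.
    """

    def render(value):
        if isinstance(value, (list, tuple)):
            return "(" + "|".join(re.escape(v) for v in value) + ")"
        return str(value)

    def replace(match):
        name = match.group(1) or match.group(2)
        if name in variables:
            return render(variables[name])
        return match.group(0)

    return re.sub(r"\$\{(\w+)\}|\$(\w+)", replace, expr)


if __name__ == "__main__":
    query = 'sum(rate(http_requests_total{environment=~"$environment"}[5m]))'
    print(interpolate(query, {"environment": ["production", "staging"]}))
```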
Alerting
The rule below uses the fields of a Prometheus alerting rule (alert, expr, for, labels, annotations), written here as JSON; Prometheus rule files express the same fields in YAML.
{
  "alert": "HighErrorRate",
  "expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) > 0.05",
  "for": "5m",
  "labels": {
    "severity": "critical"
  },
  "annotations": {
    "summary": "High error rate detected"
  }
}
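The "for" field means the condition has to hold continuously for that long before the alert fires, which filters out short spikes. A minimal sketch of that behavior, assuming evenly spaced evaluations and a hypothetical alert_states function (the state names mirror the usual inactive, pending, firing progression):

```python
def alert_states(samples, threshold=0.05, hold_seconds=300):
    """Return one state per (timestamp_seconds, error_ratio) sample.

    The alert is "pending" while the ratio exceeds the threshold and becomes
    "firing" once it has done so for at least hold_seconds without a gap.
    Hypothetical helper for illustration.
    """
    states = []
    breach_start = None
    for ts, ratio in samples:
        if ratio > threshold:
            if breach_start is None:
                breach_start = ts
            held = ts - breach_start
            states.append("firing" if held >= hold_seconds else "pending")
        else:
            breach_start = None  # any recovery resets the timer
            states.append("inactive")
    return states


if __name__ == "__main__":
    samples = [(0, 0.01), (60, 0.06), (120, 0.07), (180, 0.08),
               (240, 0.09), (300, 0.10), (360, 0.10), (420, 0.02)]
    for (ts, _), state in zip(samples, alert_states(samples)):
        print(ts, state)
```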
Dashboard Provisioning
File Structure
grafana/
+-- provisioning/
|   +-- dashboards/
|   |   +-- dashboards.yaml
|   +-- datasources/
|       +-- datasources.yaml
+-- dashboards/
    +-- overview.json
Datasources Config
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
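Before dashboards in the dashboards/ folder are provisioned, a quick consistency check can catch missing titles or duplicate uids. The sketch below assumes each file holds either the dashboard object itself or the {"dashboard": ...} wrapper shown earlier; check_dashboards and load_dashboard are hypothetical names, and the directory path in the usage line is only an example.

```python
import json
from pathlib import Path


def load_dashboard(path):
    """Read a dashboard JSON file and unwrap a top-level "dashboard" key if present."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data.get("dashboard", data)


def check_dashboards(directory):
    """Return a list of human-readable problems found in *.json files."""
    problems = []
    seen = {}  # uid -> file name that first used it
    for path in sorted(Path(directory).glob("*.json")):
        dash = load_dashboard(path)
        uid = dash.get("uid")
        if not dash.get("title"):
            problems.append(f"{path.name}: missing title")
        if not uid:
            problems.append(f"{path.name}: missing uid")
        elif uid in seen:
            problems.append(f"{path.name}: uid {uid!r} already used by {seen[uid]}")
        else:
            seen[uid] = path.name
    return problems


if __name__ == "__main__":
    # "grafana/dashboards" follows the file structure above.
    for problem in check_dashboards("grafana/dashboards"):
        print(problem)
```

A check like this fits well in a pre-commit hook or CI job when the JSON files are stored in git.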
Common Dashboard Patterns
RED Method (Rate, Errors, Duration)
# Request Rate
sum(rate(http_requests_total[5m]))
# Error Rate
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m]))
# Duration (95th percentile)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
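The percentile query works on cumulative histogram buckets, and the result is an estimate obtained by interpolating inside the bucket that contains the requested rank. The sketch below illustrates that idea under simplifying assumptions (buckets sorted by upper bound, last bucket is +Inf, lowest bucket starts at 0); the function is a hypothetical re-implementation for illustration, not Prometheus code.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Assumes buckets are sorted by upper bound and end with math.inf.
    Uses linear interpolation inside the bucket that holds the target rank.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # No finite upper edge (or an empty bucket) to interpolate in.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Made-up bucket counts for illustration.
    buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
    print(histogram_quantile(0.95, buckets))
```

Because the estimate depends on bucket edges, the accuracy of a p95 panel is limited by how finely the buckets are spaced around the values of interest.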
USE Method (Utilization, Saturation, Errors)
# CPU Utilization
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory Saturation (fraction of swap in use)
(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes
# Network Errors
rate(node_network_receive_errs_total[5m])
Best Practices
- Use consistent colors - Red for errors, green for success
- Add descriptions - Panel descriptions explain what's shown
- Set meaningful thresholds - Color changes at important values
- Link related dashboards - Drill-down from overview to details
- Version control dashboards - Store JSON in git
- Use dashboard folders - Organize by team or service
When to Use This Skill
- Infrastructure monitoring
- Application performance monitoring
- Business metrics dashboards
- Real-time operational dashboards
- SLA/SLO tracking
- Building observability platforms