Groq Observability
Overview
Set up metrics, distributed tracing, structured logging, and alerting for Groq API integrations.
Prerequisites
- Prometheus or compatible metrics backend
- OpenTelemetry SDK installed
- Grafana or similar dashboarding tool
- Alertmanager configured for alert routing
Metrics Collection
Key Metrics
| Metric | Type | Description |
|---|---|---|
| groq_requests_total | Counter | Total API requests |
| groq_request_duration_seconds | Histogram | Request latency |
| groq_errors_total | Counter | Error count by type |
| groq_rate_limit_remaining | Gauge | Rate limit headroom |
Prometheus Metrics
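import { Registry, Counter, Histogram, Gauge } from 'prom-client';

const registry = new Registry();

// Counts every request, labelled by logical method and outcome.
const requestCounter = new Counter({
  name: 'groq_requests_total',
  help: 'Total Groq API requests',
  labelNames: ['method', 'status'],
  registers: [registry],
});

// Latency histogram in seconds; buckets span 50 ms to 5 s.
const requestDuration = new Histogram({
  name: 'groq_request_duration_seconds',
  help: 'Groq request duration',
  labelNames: ['method'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [registry],
});

// Counts failures by error type; keep the set of label values small.
const errorCounter = new Counter({
  name: 'groq_errors_total',
  help: 'Groq errors by type',
  labelNames: ['error_type'],
  registers: [registry],
});

// Rate limit headroom. The `resource` label is an assumption
// (for example 'requests' or 'tokens').
const rateLimitRemaining = new Gauge({
  name: 'groq_rate_limit_remaining',
  help: 'Remaining Groq rate limit headroom',
  labelNames: ['resource'],
  registers: [registry],
});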
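The `groq_rate_limit_remaining` gauge from the Key Metrics table needs a source value. A minimal sketch of one way to obtain it, assuming Groq responses carry `x-ratelimit-remaining-requests` and `x-ratelimit-remaining-tokens` headers (verify the names against the current Groq documentation). `parseRateLimitHeaders` and `HeaderSource` are hypothetical names introduced for illustration.

```typescript
// Anything with a `get` method works here, including the Fetch API `Headers` class.
interface HeaderSource {
  get(name: string): string | null;
}

interface RateLimitHeadroom {
  remainingRequests: number | null;
  remainingTokens: number | null;
}

// Returns null when the header is absent or not an integer.
function parseHeaderInt(headers: HeaderSource, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null) return null;
  const value = Number.parseInt(raw, 10);
  return Number.isNaN(value) ? null : value;
}

// Hypothetical helper; the header names are an assumption, not confirmed by this document.
function parseRateLimitHeaders(headers: HeaderSource): RateLimitHeadroom {
  return {
    remainingRequests: parseHeaderInt(headers, 'x-ratelimit-remaining-requests'),
    remainingTokens: parseHeaderInt(headers, 'x-ratelimit-remaining-tokens'),
  };
}
```

Each non-null value could then be written to a prom-client `Gauge` with `set()`, so the gauge always reflects the most recent response.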
Instrumented Client
async function instrumentedRequest<T>(
  method: string,
  operation: () => Promise<T>
): Promise<T> {
  const timer = requestDuration.startTimer({ method });
  try {
    const result = await operation();
    requestCounter.inc({ method, status: 'success' });
    return result;
  } catch (error: any) {
    requestCounter.inc({ method, status: 'error' });
    errorCounter.inc({ error_type: error.code || 'unknown' });
    throw error;
  } finally {
    timer();
  }
}
Distributed Tracing
OpenTelemetry Setup
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('groq-client');

async function tracedGroqCall<T>(
  operationName: string,
  operation: () => Promise<T>
): Promise<T> {
  return tracer.startActiveSpan(`groq.${operationName}`, async (span) => {
    try {
      const result = await operation();
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error: any) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      span.recordException(error);
      throw error;
    } finally {
      span.end();
    }
  });
}
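When spans created by `tracedGroqCall` do not attach to the upstream trace, the incoming context header is a good first thing to inspect. The sketch below parses a W3C Trace Context `traceparent` header in its version `00` layout, which is useful for debugging propagation by hand. `parseTraceParent` is a hypothetical helper, not part of the OpenTelemetry API, and it assumes your services propagate context in the W3C format.

```typescript
// Parsed form of a W3C Trace Context `traceparent` header.
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

// Layout: version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex).
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Hypothetical debugging helper; returns null when the header is missing or malformed.
function parseTraceParent(header: string | undefined): TraceParent | null {
  if (!header) return null;
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // The spec treats version "ff" and all-zero IDs as invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (Number.parseInt(flags, 16) & 0x01) === 1 };
}
```

Logging the parsed `traceId` on both sides of a service boundary shows quickly whether the same trace continues or a new one starts.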
Logging Strategy
Structured Logging
import pino from 'pino';

const logger = pino({
  name: 'groq',
  level: process.env.LOG_LEVEL || 'info',
});

function logGroqOperation(
  operation: string,
  data: Record<string, any>,
  duration: number
) {
  logger.info({
    service: 'groq',
    operation,
    duration_ms: duration,
    ...data,
  });
}
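`logGroqOperation` expects the caller to supply the duration. One way to measure it is a small wrapper like the sketch below. `withTiming` and `LogFn` are hypothetical names, and the logger is injected so the example runs without pino. `Date.now()` is used for simplicity; a monotonic clock would be more robust against system clock adjustments.

```typescript
// Matches the (operation, data, duration) shape of logGroqOperation.
type LogFn = (operation: string, data: Record<string, unknown>, durationMs: number) => void;

// Hypothetical wrapper: times the call and logs once, whether it succeeds or throws.
async function withTiming<T>(
  operation: string,
  log: LogFn,
  fn: () => Promise<T>
): Promise<T> {
  const start = Date.now();
  let outcome = 'success';
  try {
    return await fn();
  } catch (error) {
    outcome = 'error';
    throw error; // rethrow so callers still see the failure
  } finally {
    log(operation, { outcome }, Date.now() - start);
  }
}
```

Passing `logGroqOperation` as the `log` argument produces one structured log line per call with a consistent `duration_ms` field.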
Alert Configuration
Prometheus Alerting Rules
# groq_alerts.yaml
groups:
  - name: groq_alerts
    rules:
      - alert: GroqHighErrorRate
        # sum() on both sides: the two counters carry different labels,
        # so an unaggregated division would match no series.
        expr: |
          sum(rate(groq_errors_total[5m])) /
          sum(rate(groq_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Groq error rate > 5%"
      - alert: GroqHighLatency
        expr: |
          histogram_quantile(0.95,
            rate(groq_request_duration_seconds_bucket[5m])
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Groq P95 latency > 2s"
      - alert: GroqDown
        expr: up{job="groq"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Groq integration is down"
Dashboard
Grafana Panel Queries
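{
  "panels": [
    {
      "title": "Groq Request Rate",
      "targets": [{
        "expr": "rate(groq_requests_total[5m])"
      }]
    },
    {
      "title": "Groq Latency P50/P95/P99",
      "targets": [
        { "expr": "histogram_quantile(0.5, rate(groq_request_duration_seconds_bucket[5m]))", "legendFormat": "P50" },
        { "expr": "histogram_quantile(0.95, rate(groq_request_duration_seconds_bucket[5m]))", "legendFormat": "P95" },
        { "expr": "histogram_quantile(0.99, rate(groq_request_duration_seconds_bucket[5m]))", "legendFormat": "P99" }
      ]
    }
  ]
}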
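A latency panel needs one query per quantile, and writing those by hand invites copy-paste drift. The sketch below generates the targets from a list of quantiles, reusing the `histogram_quantile` expression shown above. `latencyPanel` and its types are hypothetical and cover only the `title`, `expr`, and `legendFormat` fields, not a full Grafana dashboard schema.

```typescript
interface PanelTarget {
  expr: string;
  legendFormat: string;
}

interface Panel {
  title: string;
  targets: PanelTarget[];
}

// Turns a quantile such as 0.95 into a label such as "P95".
function quantileLabel(q: number): string {
  return `P${Math.round(q * 100)}`;
}

// Hypothetical helper: one PromQL target per quantile of the latency histogram.
function latencyPanel(quantiles: number[], window = '5m'): Panel {
  return {
    title: `Groq Latency ${quantiles.map(quantileLabel).join('/')}`,
    targets: quantiles.map((q) => ({
      expr: `histogram_quantile(${q}, rate(groq_request_duration_seconds_bucket[${window}]))`,
      legendFormat: quantileLabel(q),
    })),
  };
}
```

Calling `latencyPanel([0.5, 0.95, 0.99])` yields a panel titled "Groq Latency P50/P95/P99" with three targets.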
Instructions
Step 1: Set Up Metrics Collection
Implement Prometheus counters, histograms, and gauges for key operations.
Step 2: Add Distributed Tracing
Integrate OpenTelemetry for end-to-end request tracing.
Step 3: Configure Structured Logging
Set up JSON logging with consistent field names.
Step 4: Create Alert Rules
Define Prometheus alerting rules for error rates and latency.
Output
- Metrics collection enabled
- Distributed tracing configured
- Structured logging implemented
- Alert rules deployed
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Missing metrics | No instrumentation | Wrap client calls |
| Trace gaps | Missing propagation | Check context headers |
| Alert storms | Wrong thresholds | Tune alert rules |
| High cardinality | Too many labels | Reduce label values |
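For the high-cardinality row, one common mitigation is to map raw errors onto a small fixed set of `error_type` values before incrementing `groq_errors_total`. The sketch below is one possible mapping. `classifyError` is a hypothetical helper, and it assumes errors may expose a numeric HTTP `status` or a Node-style string `code`; adjust it to whatever your client actually throws.

```typescript
// Bounded set of values for the `error_type` label.
type ErrorType =
  | 'rate_limit'
  | 'auth'
  | 'client_error'
  | 'server_error'
  | 'timeout'
  | 'network'
  | 'unknown';

// Hypothetical classifier; the `status` and `code` fields are assumptions about the error shape.
function classifyError(error: unknown): ErrorType {
  const e = (error ?? {}) as { status?: unknown; code?: unknown };
  if (typeof e.status === 'number') {
    if (e.status === 429) return 'rate_limit';
    if (e.status === 401 || e.status === 403) return 'auth';
    if (e.status >= 500) return 'server_error';
    if (e.status >= 400) return 'client_error';
  }
  if (e.code === 'ETIMEDOUT' || e.code === 'ESOCKETTIMEDOUT') return 'timeout';
  if (e.code === 'ECONNRESET' || e.code === 'ECONNREFUSED' || e.code === 'ENOTFOUND') return 'network';
  // Anything unrecognized collapses into one bucket so label values stay bounded.
  return 'unknown';
}
```

With this in place the label can take at most seven values, regardless of how many distinct error messages or codes the API returns.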
Examples
Quick Metrics Endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', registry.contentType);
  res.send(await registry.metrics());
});
Next Steps
For incident response, see groq-incident-runbook.