langchain-observability
LangChain Observability
Overview
Set up comprehensive observability for LangChain applications with LangSmith, OpenTelemetry, Prometheus, and structured logging.
Prerequisites
- LangChain application in staging/production
- LangSmith account (optional but recommended)
- Prometheus/Grafana infrastructure
- OpenTelemetry collector (optional)
Instructions
Step 1: Enable LangSmith Tracing
Set LANGCHAIN_TRACING_V2=true, then set the LANGCHAIN_API_KEY and LANGCHAIN_PROJECT environment variables. With these in place, chains are traced automatically and no code changes are needed.
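As a minimal sketch, the three variables can also be set from Python before any chain runs. The helper name `enable_langsmith_tracing` and the placeholder key and project values are illustrative assumptions. In a real deployment these variables would more likely come from the process environment or a secret store.

```python
import os


def enable_langsmith_tracing(api_key: str, project: str) -> dict[str, str]:
    """Set the three environment variables from Step 1 and return them."""
    settings = {
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_API_KEY": api_key,
        "LANGCHAIN_PROJECT": project,
    }
    os.environ.update(settings)
    return settings


# Placeholder values for illustration only; never hard-code a real key.
enable_langsmith_tracing(api_key="<your-langsmith-api-key>", project="my-app-staging")
```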
Step 2: Add Prometheus Metrics
Create a PrometheusCallback handler that tracks a langchain_llm_requests_total counter, a langchain_llm_latency_seconds histogram, and a langchain_llm_tokens_total counter.
Step 3: Integrate OpenTelemetry
Use OTLPSpanExporter with a custom OpenTelemetryCallback to add spans for chain and LLM operations with parent-child relationships.
Step 4: Configure Structured Logging
Use structlog with a StructuredLoggingCallback to emit JSON logs for all LLM start/end/error events.
Step 5: Set Up Grafana Dashboard
Create panels for request rate, P95 latency, token usage, and error rate using Prometheus queries.
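The four panels could be driven by PromQL queries along these lines. This is a sketch under assumptions: the `status` and `kind` labels are hypothetical (the metric names alone come from Step 2), the 5-minute rate window is arbitrary, and the generated JSON is only a skeleton. A real Grafana dashboard typically also needs a datasource and schema fields.

```python
import json

# Assumed labels: status="error" on the request counter, "kind" on the token counter.
ERRORS = 'sum(rate(langchain_llm_requests_total{status="error"}[5m]))'
REQUESTS = "sum(rate(langchain_llm_requests_total[5m]))"

PANEL_QUERIES = {
    "Request rate (req/s)": REQUESTS,
    "P95 latency (s)": (
        "histogram_quantile(0.95, "
        "sum by (le) (rate(langchain_llm_latency_seconds_bucket[5m])))"
    ),
    "Token usage (tokens/s)": "sum by (kind) (rate(langchain_llm_tokens_total[5m]))",
    "Error rate": f"{ERRORS} / {REQUESTS}",
}


def build_dashboard(title: str = "LangChain Observability") -> dict:
    """Lay the panels out in a two-column grid, one query per panel."""
    panels = []
    for index, (panel_title, expr) in enumerate(PANEL_QUERIES.items()):
        panels.append({
            "id": index + 1,
            "title": panel_title,
            "type": "timeseries",
            "gridPos": {"x": 12 * (index % 2), "y": 8 * (index // 2), "w": 12, "h": 8},
            "targets": [{"expr": expr, "refId": "A"}],
        })
    return {"title": title, "panels": panels}


print(json.dumps(build_dashboard(), indent=2))
```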
Step 6: Configure Alerting Rules
Define Prometheus alerts for high error rate (>5%), high latency (P95 >5s), and token budget exceeded.
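The three alerts might be generated as follows. The thresholds (5% and 5s) come from the step above. The alert names, the `status` label, the `for` durations, the severity labels, and the `daily_token_budget` parameter are assumptions. The script prints JSON, which is also valid YAML, so the output should be usable as a rules file. Validating it with `promtool check rules` before deploying is advisable.

```python
import json

P95_LATENCY = (
    "histogram_quantile(0.95, "
    "sum by (le) (rate(langchain_llm_latency_seconds_bucket[5m])))"
)
# Assumes the request counter carries a status="error" label for failures.
ERROR_RATIO = (
    'sum(rate(langchain_llm_requests_total{status="error"}[5m])) '
    "/ sum(rate(langchain_llm_requests_total[5m]))"
)


def build_alert_rules(daily_token_budget: int) -> dict:
    """Return a Prometheus rule group covering error rate, latency, and tokens."""
    rules = [
        {
            "alert": "LangChainHighErrorRate",
            "expr": f"{ERROR_RATIO} > 0.05",  # more than 5% of requests failing
            "for": "5m",
            "labels": {"severity": "critical"},
        },
        {
            "alert": "LangChainHighLatency",
            "expr": f"{P95_LATENCY} > 5",  # P95 above 5 seconds
            "for": "5m",
            "labels": {"severity": "warning"},
        },
        {
            "alert": "LangChainTokenBudgetExceeded",
            # Hypothetical budget: total tokens over the trailing 24 hours.
            "expr": f"sum(increase(langchain_llm_tokens_total[1d])) > {daily_token_budget}",
            "labels": {"severity": "warning"},
        },
    ]
    return {"groups": [{"name": "langchain", "rules": rules}]}


print(json.dumps(build_alert_rules(daily_token_budget=1_000_000), indent=2))
```

The `for` clause keeps a brief spike from firing an alert. Lengthening it is usually the first adjustment to try if alerts turn out to be noisy.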
See detailed implementation for complete callback code, dashboard JSON, and alert rules.
Output
- LangSmith tracing enabled
- Prometheus metrics exported
- OpenTelemetry spans
- Structured logging
- Grafana dashboard and alerts
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Missing metrics | Callback not attached | Pass the callback to the LLM constructor via the callbacks argument |
| Trace gaps | Missing context propagation | Check that each child span is started with its parent span's context |
| Alert storms | Thresholds too sensitive | Raise the thresholds or lengthen the rule's for duration |
Examples
Basic usage: Enable LangSmith tracing through the environment variables in Step 1 and attach the Prometheus callback from Step 2 with default settings.
Advanced scenario: For production, combine all six steps: add the OpenTelemetry and structured logging callbacks, then tune the dashboard panels and alert thresholds to each team's latency and token budgets.
Next Steps
Use langchain-incident-runbook for incident response procedures.