observability-patterns
Observability Patterns Skill
This skill provides templates and configurations for implementing observability in Google ADK agents. It covers logging, tracing, BigQuery analytics, Cloud Trace integration, and third-party observability platforms.
Overview
Google ADK supports multiple observability approaches for monitoring, debugging, and analyzing agent behavior:
- Cloud Trace - Google Cloud native tracing with OpenTelemetry
- BigQuery Agent Analytics - Comprehensive event logging and analysis
- AgentOps - Session replays and unified tracing analytics
- Phoenix (Arize) - Open-source observability with self-hosted control
- Weave (W&B) - Weights & Biases platform for tracking and visualization
This skill covers production-ready observability setups, with attention to credential handling, IAM, and cost control as event volume grows.
Available Scripts
1. Setup Cloud Trace
Script: scripts/setup-cloud-trace.sh <project-id>
Purpose: Configures Cloud Trace integration for ADK agents
Parameters:
- project-id - Google Cloud project ID (required)
Usage:
# Setup Cloud Trace for local development
./scripts/setup-cloud-trace.sh my-project-id
# Setup with ADK CLI deployment
adk deploy agent_engine --project=my-project-id --trace_to_cloud ./agent
Environment Variables:
- GOOGLE_CLOUD_PROJECT - Project ID for Cloud Trace
- GOOGLE_APPLICATION_CREDENTIALS - Path to service account key
Output: Cloud Trace enabled, traces visible in console.cloud.google.com
2. Setup BigQuery Agent Analytics
Script: scripts/setup-bigquery-analytics.sh <project-id> <dataset-id> [bucket-name]
Purpose: Configures BigQuery Agent Analytics plugin for comprehensive event logging
Parameters:
- project-id - Google Cloud project ID (required)
- dataset-id - BigQuery dataset name (required)
- bucket-name - GCS bucket for multimodal content (optional)
Usage:
# Setup basic BigQuery analytics
./scripts/setup-bigquery-analytics.sh my-project agent_analytics
# Setup with GCS for multimodal content
./scripts/setup-bigquery-analytics.sh my-project agent_analytics my-content-bucket
# Create dataset and table (dataset IDs allow only letters, digits, and underscores)
bq mk --dataset my-project:agent_analytics
bq mk --table agent_analytics.agent_events_v2 templates/bigquery-schema.json
IAM Requirements:
- roles/bigquery.jobUser - Required for BigQuery operations
- roles/bigquery.dataEditor - Required for writing data
- roles/storage.objectCreator - Required if using GCS offloading
Output: BigQuery table created, events streaming to dataset
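BigQuery dataset IDs accept only letters, digits, and underscores (up to 1,024 characters), so it can be worth checking the dataset-id argument before calling the setup script. The following is a minimal sketch; `is_valid_dataset_id` is an illustrative helper and not part of this skill's scripts.

```python
import re

# BigQuery dataset IDs: letters, digits, underscores; 1 to 1024 characters.
_DATASET_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,1024}")


def is_valid_dataset_id(dataset_id: str) -> bool:
    """Return True if dataset_id uses only characters BigQuery accepts."""
    return _DATASET_ID_PATTERN.fullmatch(dataset_id) is not None
```

For example, `is_valid_dataset_id("agent_analytics")` is true, while a hyphenated name such as `agent-analytics` is rejected.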
3. Setup AgentOps
Script: scripts/setup-agentops.sh
Purpose: Configures AgentOps integration for session replays and metrics
Usage:
# Install AgentOps
pip install -U agentops
# Setup with API key
AGENTOPS_API_KEY=your_api_key_here ./scripts/setup-agentops.sh
# Verify setup
python -c "import agentops; agentops.init(); print('AgentOps ready')"
Environment Variables:
- AGENTOPS_API_KEY - AgentOps API key from app.agentops.ai/settings/projects
Output: AgentOps initialized, sessions visible in dashboard
4. Setup Phoenix
Script: scripts/setup-phoenix.sh
Purpose: Configures Phoenix (Arize) integration for open-source observability
Usage:
# Install Phoenix packages
pip install openinference-instrumentation-google-adk arize-phoenix-otel
# Setup Phoenix with API key
PHOENIX_API_KEY=your_key_here \
PHOENIX_COLLECTOR_ENDPOINT=https://app.phoenix.arize.com/s/your-space \
./scripts/setup-phoenix.sh
# Verify Phoenix connection
python scripts/verify-phoenix.py
Environment Variables:
- PHOENIX_API_KEY - Phoenix API key from phoenix.arize.com
- PHOENIX_COLLECTOR_ENDPOINT - Phoenix collector endpoint URL
Output: Phoenix tracer initialized, traces visible in Phoenix dashboard
5. Setup Weave
Script: scripts/setup-weave.sh <entity> <project>
Purpose: Configures Weave (W&B) integration for observability
Parameters:
- entity - W&B entity name (visible in Teams sidebar)
- project - W&B project name
Usage:
# Install Weave dependencies
pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
# Setup Weave with API key
WANDB_API_KEY=your_wandb_key_here ./scripts/setup-weave.sh my-team my-project
# Verify Weave connection
python scripts/verify-weave.py
Environment Variables:
- WANDB_API_KEY - W&B API key from wandb.ai/authorize
Output: Weave tracer initialized, traces visible in Weave dashboard
6. Validate Observability Setup
Script: scripts/validate-observability.sh
Purpose: Validates observability configuration and connectivity
Checks:
- Cloud Trace connectivity
- BigQuery dataset and table existence
- AgentOps initialization
- Phoenix endpoint reachability
- Weave endpoint reachability
- IAM permissions
- Environment variables set
Usage:
# Validate all observability configurations
./scripts/validate-observability.sh
# Validate specific tool
./scripts/validate-observability.sh --tool=bigquery
./scripts/validate-observability.sh --tool=cloud-trace
./scripts/validate-observability.sh --tool=agentops
Exit Codes:
- 0 - All checks passed
- 1 - Configuration missing
- 2 - Connectivity failed
- 3 - Permission issues
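The exit codes above lend themselves to a small wrapper for CI or deployment tooling. The sketch below assumes the script path and `--tool` flag exactly as documented in this section; `describe_exit_code` and `run_validation` are hypothetical helper names.

```python
import subprocess
from typing import Optional

# Exit codes as documented for scripts/validate-observability.sh.
EXIT_MEANINGS = {
    0: "All checks passed",
    1: "Configuration missing",
    2: "Connectivity failed",
    3: "Permission issues",
}


def describe_exit_code(code: int) -> str:
    """Translate a validation exit code into a readable message."""
    return EXIT_MEANINGS.get(code, f"Unknown exit code {code}")


def run_validation(tool: Optional[str] = None) -> int:
    """Run the validation script, optionally for a single tool, and return its exit code."""
    command = ["./scripts/validate-observability.sh"]
    if tool:
        command.append(f"--tool={tool}")
    # check=False: a non-zero exit is an expected outcome, not an exception.
    return subprocess.run(command, check=False).returncode
```

A pipeline step could call `run_validation("bigquery")`, log `describe_exit_code(code)`, and fail the build on any non-zero result.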
Available Templates
1. Cloud Trace Configuration
Template: templates/cloud-trace-config.py
Purpose: Cloud Trace integration for ADK agents
Features:
- OpenTelemetry configuration
- Automatic span creation for agent runs
- LLM and tool call tracing
- Error and latency tracking
Usage:
# Enable Cloud Trace via ADK CLI
adk deploy agent_engine --project=$GOOGLE_CLOUD_PROJECT --trace_to_cloud ./agent
# Or via the Vertex AI SDK when deploying to Agent Engine
from vertexai.preview import reasoning_engines
app = reasoning_engines.AdkApp(
    agent=my_agent,
    enable_tracing=True
)
Span Labels:
- invocation - Top-level agent invocation
- agent_run - Individual agent execution
- call_llm - LLM API calls
- execute_tool - Tool executions
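Once spans are exported, grouping them by these labels gives a quick view of where time goes. The sketch below works on plain dictionaries; the `name` and `duration_ms` keys, and the idea that a span name may carry a suffix after the label (for example `execute_tool get_weather`), are assumptions for illustration rather than a documented export format.

```python
from collections import defaultdict

# Span labels listed in this section.
ADK_SPAN_LABELS = ("invocation", "agent_run", "call_llm", "execute_tool")


def total_latency_by_label(spans):
    """Sum duration_ms per known span label; spans with other names are ignored."""
    totals = defaultdict(float)
    for span in spans:
        name = span["name"]
        for label in ADK_SPAN_LABELS:
            # Match the bare label or the label followed by a space and a suffix.
            if name == label or name.startswith(label + " "):
                totals[label] += span["duration_ms"]
                break
    return dict(totals)
```

Comparing the `call_llm` and `execute_tool` totals against `invocation` shows whether model calls or tools dominate end-to-end latency.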
2. BigQuery Analytics Configuration
Template: templates/bigquery-analytics-config.py
Purpose: Complete BigQuery Agent Analytics plugin configuration
Features:
- Asynchronous event logging
- Multimodal content with GCS offloading
- OpenTelemetry-style tracing (trace_id, span_id)
- Event filtering and batching
- Custom content formatting
Usage:
from google.adk.plugins.bigquery_agent_analytics_plugin import (
BigQueryAgentAnalyticsPlugin, BigQueryLoggerConfig
)
bq_config = BigQueryLoggerConfig(
enabled=True,
gcs_bucket_name="your-bucket-name",
max_content_length=500 * 1024, # 500KB inline limit
batch_size=1, # Low latency
event_allowlist=["LLM_RESPONSE", "TOOL_COMPLETED"]
)
plugin = BigQueryAgentAnalyticsPlugin(
project_id="your-project-id",
dataset_id="your-dataset-id",
config=bq_config
)
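# App is imported from google.adk.apps and takes a name as well as a root agent
from google.adk.apps import App
app = App(name="my_agent_app", root_agent=agent, plugins=[plugin])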
Configuration Options:
- enabled - Toggle logging on/off
- gcs_bucket_name - GCS bucket for large content
- max_content_length - Inline text limit (default 500KB)
- batch_size - Events per write (default 1)
- event_allowlist - Log only the listed event types
- event_denylist - Exclude the listed event types
- content_formatter - Custom formatting function
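How the allowlist and denylist interact is not spelled out above. The sketch below shows one plausible reading (the denylist wins, and an empty or missing allowlist admits everything). It is an assumption for illustration, not the plugin's actual implementation, and `should_log` is a hypothetical helper.

```python
from typing import Iterable, Optional


def should_log(event_type: str,
               allowlist: Optional[Iterable[str]] = None,
               denylist: Optional[Iterable[str]] = None) -> bool:
    """Decide whether an event type passes the filters (assumed semantics)."""
    denied = set(denylist or ())
    allowed = set(allowlist or ())
    if event_type in denied:
        return False  # Assumed: an explicit deny always takes precedence.
    if allowed:
        return event_type in allowed  # A non-empty allowlist is exclusive.
    return True  # No allowlist configured: log everything not denied.
```

With the allowlist from the configuration above, `should_log("LLM_REQUEST", allowlist=["LLM_RESPONSE", "TOOL_COMPLETED"])` would be false under this reading, which is how filtering reduces BigQuery writes.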
3. BigQuery Schema
Template: templates/bigquery-schema.json
Purpose: BigQuery table schema for agent_events_v2
Schema Fields:
- timestamp - Event recording time
- event_type - Event category (LLM_REQUEST, TOOL_STARTING, etc.)
- content - Event-specific JSON payload
- content_parts - Structured multimodal data
- trace_id - OpenTelemetry trace ID
- span_id - OpenTelemetry span ID
- agent - Agent name
- user_id - User identifier
Partitioning: By DATE(timestamp) for cost optimization
Clustering: By event_type, agent, user_id for query performance
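Partitioning only saves money when queries restrict the partitioning column, so a bounded range on `timestamp` is worth building into every query. Here is a minimal sketch that assembles such a query string; the fully qualified table name passed in and the helper name are placeholders, and dates are taken as `datetime.date` objects so only ISO-formatted dates reach the SQL text.

```python
from datetime import date


def build_event_count_query(table: str, start: date, end_exclusive: date) -> str:
    """Build a query that counts events per type and agent within [start, end_exclusive)."""
    if start >= end_exclusive:
        raise ValueError("start must be earlier than end_exclusive")
    return (
        "SELECT event_type, agent, COUNT(*) AS events\n"
        f"FROM `{table}`\n"
        # A range filter directly on the partitioning column limits the partitions scanned.
        f"WHERE timestamp >= TIMESTAMP('{start.isoformat()}')\n"
        f"  AND timestamp < TIMESTAMP('{end_exclusive.isoformat()}')\n"
        "GROUP BY event_type, agent\n"
        "ORDER BY events DESC"
    )
```

Grouping by `event_type` and `agent` also lines up with the clustering columns, which tends to help the same query on large tables.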
4. AgentOps Configuration
Template: templates/agentops-config.py
Purpose: AgentOps integration for session replays
Features:
- Minimal two-line integration
- Hierarchical span visualization
- LLM call tracking with prompts and completions
- Token count and latency metrics
- Cost tracking
Usage:
import agentops
# Initialize AgentOps (before ADK imports)
agentops.init()
# Your ADK agent code
from google.adk.apps import App
app = App(name="my_agent_app", root_agent=my_agent)
Span Hierarchy:
- Agent spans: Named adk.agent.{AgentName}
- LLM spans: Capture prompts, completions, tokens
- Tool spans: Record parameters and results
5. Phoenix Configuration
Template: templates/phoenix-config.py
Purpose: Phoenix (Arize) integration for open-source observability
Features:
- Self-hosted data control
- OpenInference instrumentation
- Trace evaluation
- Performance debugging
- Custom evaluators
Usage:
import os
from phoenix.otel import register
# Set Phoenix credentials (placeholders only; in real code export these in the shell or load them from .env)
os.environ["PHOENIX_API_KEY"] = "your_api_key_here"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com/s/your-space"
# Register Phoenix tracer
tracer_provider = register(
project_name="my-adk-agent",
auto_instrument=True
)
# Your ADK agent code (Phoenix auto-captures traces)
from google.adk.apps import App
app = App(name="my_agent_app", root_agent=my_agent)
Auto-Instrumentation: Phoenix automatically traces all ADK operations
6. Weave Configuration
Template: templates/weave-config.py
Purpose: Weave (W&B) integration for observability
Features:
- Timeline of agent calls
- Tool invocation tracking
- Reasoning process analysis
- Span hierarchy visualization
- Dashboard integration
Usage:
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
import base64
# Setup Weave exporter
wandb_api_key = os.environ["WANDB_API_KEY"]
entity = "your-entity"
project = "your-project"
auth_string = f"api:{wandb_api_key}"
encoded_auth = base64.b64encode(auth_string.encode()).decode()
exporter = OTLPSpanExporter(
endpoint="https://trace.wandb.ai/otel/v1/traces",
headers={
"Authorization": f"Basic {encoded_auth}",
"project_id": f"{entity}/{project}"
}
)
# Configure tracer provider (BEFORE ADK imports)
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)
# Your ADK agent code
from google.adk.apps import App
app = App(name="my_agent_app", root_agent=my_agent)
Critical: Set tracer provider before importing ADK components
Available Examples
1. Complete Observability Setup
Example: examples/complete-observability.md
Covers:
- Multi-tool observability setup
- Cloud Trace + BigQuery combination
- Third-party tool integration
- Production deployment patterns
- Cost optimization strategies
Step-by-Step Guide:
- Enable Cloud Trace for distributed tracing
- Configure BigQuery for event logging
- Add AgentOps for session replays
- Optional: Phoenix or Weave for additional insights
- Validate all configurations
- Deploy to production
Production Checklist:
- Cloud Trace enabled in production
- BigQuery dataset created with proper IAM
- GCS bucket configured for multimodal content
- Event filtering configured to control costs
- Alert rules defined for error rates
- Dashboard created for key metrics
- Retention policies set for cost control
2. BigQuery Analytics Queries
Example: examples/bigquery-queries.md
Covers:
- Conversation trace retrieval
- Token usage analysis
- Error rate tracking
- Tool usage statistics
- Performance metrics
- Cost analysis
Query Examples:
-- Retrieve conversation traces
SELECT timestamp, event_type, JSON_VALUE(content, '$.response')
FROM agent_events_v2
WHERE trace_id = 'your-trace-id'
ORDER BY timestamp ASC;
-- Token usage by agent
SELECT
agent,
AVG(CAST(JSON_VALUE(content, '$.usage.total') AS INT64)) as avg_tokens,
SUM(CAST(JSON_VALUE(content, '$.usage.total') AS INT64)) as total_tokens
FROM agent_events_v2
WHERE event_type = 'LLM_RESPONSE'
GROUP BY agent;
-- Error rate by event type
SELECT
event_type,
COUNT(*) as error_count,
DATE(timestamp) as day
FROM agent_events_v2
WHERE event_type LIKE '%ERROR%'
GROUP BY event_type, day
ORDER BY day DESC, error_count DESC;
-- Tool usage frequency
SELECT
JSON_VALUE(content, '$.tool_name') as tool,
COUNT(*) as usage_count
FROM agent_events_v2
WHERE event_type = 'TOOL_COMPLETED'
GROUP BY tool
ORDER BY usage_count DESC;
-- Access multimodal content from GCS
SELECT
part.mime_type,
part.object_ref.uri as gcs_uri
FROM agent_events_v2,
UNNEST(content_parts) AS part
WHERE part.storage_mode = 'GCS_REFERENCE';
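For spot-checking the token usage query, the same aggregation can be reproduced in plain Python over a handful of exported rows. The sample rows below are made up for illustration, and the `usage.total` path is taken from the query above rather than verified against the plugin's payload format.

```python
import json
from collections import defaultdict


def token_usage_by_agent(rows):
    """Mirror the SQL: average and total tokens per agent over LLM_RESPONSE events."""
    tokens = defaultdict(list)
    for row in rows:
        if row.get("event_type") != "LLM_RESPONSE":
            continue
        content = json.loads(row["content"])
        total = (content.get("usage") or {}).get("total")
        if total is None:
            continue  # SQL AVG and SUM skip NULLs; do the same here.
        tokens[row["agent"]].append(int(total))
    return {
        agent: {"avg_tokens": sum(values) / len(values), "total_tokens": sum(values)}
        for agent, values in tokens.items()
    }


# Hypothetical exported rows, for illustration only.
sample_rows = [
    {"agent": "planner", "event_type": "LLM_RESPONSE", "content": '{"usage": {"total": 100}}'},
    {"agent": "planner", "event_type": "LLM_RESPONSE", "content": '{"usage": {"total": 300}}'},
    {"agent": "planner", "event_type": "TOOL_COMPLETED", "content": '{"tool_name": "search"}'},
    {"agent": "writer", "event_type": "LLM_RESPONSE", "content": '{"usage": {"total": 50}}'},
]
```

If the numbers from `token_usage_by_agent(sample_rows)` and the SQL disagree on the same rows, the JSON path is the first thing to check.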
3. Multi-Tool Integration
Example: examples/multi-tool-integration.md
Covers:
- Using multiple observability tools together
- Cloud Trace + BigQuery + AgentOps
- Data correlation across platforms
- Tool selection criteria
- Cost vs. insight tradeoffs
Integration Patterns:
Pattern 1: Google Cloud Native
- Cloud Trace for distributed tracing
- BigQuery for detailed event analysis
- Best for: GCP-centric deployments
Pattern 2: Comprehensive Monitoring
- Cloud Trace for infrastructure tracing
- AgentOps for session replays
- BigQuery for analytics
- Best for: Production monitoring with detailed debugging
Pattern 3: Open Source
- Phoenix for self-hosted observability
- BigQuery for long-term storage
- Best for: Data sovereignty requirements
Pattern 4: ML-Focused
- Weave for experiment tracking
- BigQuery for analytics
- Best for: Research and experimentation
4. Production Deployment
Example: examples/production-deployment.md
Covers:
- Production-ready observability configuration
- IAM role setup
- Cost optimization
- Alert configuration
- Dashboard creation
- Incident response
Production Setup:
- IAM Configuration:
  - Service account with minimal permissions
  - Separate dev/staging/prod credentials
  - Workload Identity for GKE deployments
- Cost Controls:
  - Event filtering to reduce BigQuery writes
  - GCS lifecycle policies for multimodal content
  - Table partitioning and clustering
  - Retention policies (30-90 days)
- Monitoring:
  - Cloud Monitoring alerts for error rates
  - BigQuery query dashboard in Looker Studio
  - AgentOps session replay for debugging
  - Trace analysis for performance issues
- Security:
  - No credentials in code (environment variables only)
  - VPC Service Controls for data protection
  - Customer-managed encryption keys (CMEK)
  - Audit logging for compliance
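The retention policy listed under Cost Controls can be applied to the partitioned events table as a partition expiration, which the `bq update --time_partitioning_expiration` flag expects in seconds. The sketch below converts days to seconds and assembles the command arguments; treating 30-90 days as a hard limit is an assumption drawn from the guidance above, and both helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def partition_expiration_seconds(days: int) -> int:
    """Convert a retention period in days to seconds, within the suggested 30-90 day range."""
    if not 30 <= days <= 90:
        raise ValueError("expected a retention period between 30 and 90 days")
    return days * SECONDS_PER_DAY


def bq_update_expiration_args(table: str, days: int) -> list:
    """Return the argument list for a bq update call that sets partition expiration."""
    return [
        "bq", "update",
        "--time_partitioning_expiration", str(partition_expiration_seconds(days)),
        table,
    ]
```

The returned list can be passed to `subprocess.run` once the table name has been confirmed; partitions older than the chosen period are then dropped automatically.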
Security Compliance
CRITICAL: This skill follows strict security rules:
NEVER hardcode:
- API keys (AgentOps, Phoenix, Weave, W&B)
- Google Cloud credentials
- Service account keys
- OAuth tokens
- BigQuery connection strings
ALWAYS:
- Use environment variables for secrets
- Generate .env.example with placeholders
- Add .env* to .gitignore
- Use Google Application Default Credentials
- Document credential acquisition process
- Use IAM roles instead of service account keys when possible
Placeholder format:
# .env.example
GOOGLE_CLOUD_PROJECT=your-project-id
AGENTOPS_API_KEY=your_agentops_key_here
PHOENIX_API_KEY=your_phoenix_key_here
PHOENIX_COLLECTOR_ENDPOINT=https://app.phoenix.arize.com/s/your-space
WANDB_API_KEY=your_wandb_key_here
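A startup check can catch variables that are unset or still hold the placeholder values from .env.example. This is a minimal sketch; the placeholder markers (`your-`, `your_`) are inferred from the example file above, and `missing_or_placeholder` is a hypothetical helper.

```python
import os

# Substrings that mark a value as an unfilled placeholder from .env.example.
PLACEHOLDER_MARKERS = ("your-", "your_")


def missing_or_placeholder(required, env=None):
    """Return the names in `required` that are unset, empty, or still placeholders."""
    source = os.environ if env is None else env
    problems = []
    for name in required:
        value = source.get(name, "").strip()
        if not value or any(marker in value.lower() for marker in PLACEHOLDER_MARKERS):
            problems.append(name)
    return problems
```

An agent entry point could call `missing_or_placeholder(["GOOGLE_CLOUD_PROJECT", "AGENTOPS_API_KEY"])` and exit with a clear message if the list is non-empty, without ever printing the secret values themselves.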
Progressive Disclosure
This skill provides immediate setup guidance with references to detailed documentation:
- Quick Start: Use setup scripts for immediate configuration
- Production: Reference production-deployment.md for complete guide
- Analytics: Use bigquery-queries.md for query templates
- Integration: Reference multi-tool-integration.md for advanced patterns
Load additional files only when specific customization is needed.
Common Workflows
1. Local Development Setup
# Enable Cloud Trace for local debugging
export GOOGLE_CLOUD_PROJECT=your-project-id
./scripts/setup-cloud-trace.sh your-project-id
# Start agent with tracing
python my_agent.py
# View traces at console.cloud.google.com/traces
2. Production Deployment with BigQuery
# 1. Create BigQuery dataset (dataset IDs allow only letters, digits, and underscores)
bq mk --dataset my-project:agent_analytics
# 2. Create events table
bq mk --table agent_analytics.agent_events_v2 templates/bigquery-schema.json
# 3. Create GCS bucket for multimodal content
gsutil mb gs://my-agent-content/
# 4. Setup BigQuery analytics
./scripts/setup-bigquery-analytics.sh my-project agent_analytics my-agent-content
# 5. Deploy agent
adk deploy agent_engine --project=my-project ./agent
# 6. Validate setup
./scripts/validate-observability.sh --tool=bigquery
3. Multi-Tool Integration
# 1. Setup Cloud Trace
export GOOGLE_CLOUD_PROJECT=your-project-id
./scripts/setup-cloud-trace.sh your-project-id
# 2. Setup BigQuery Analytics
./scripts/setup-bigquery-analytics.sh your-project-id agent_analytics my-bucket
# 3. Setup AgentOps
export AGENTOPS_API_KEY=your_key_here
./scripts/setup-agentops.sh
# 4. Validate all configurations
./scripts/validate-observability.sh
Troubleshooting
Cloud Trace Not Showing Traces
Check:
- GOOGLE_CLOUD_PROJECT environment variable is set
- Cloud Trace API is enabled
- Service account has roles/cloudtrace.agent
- Tracer initialized before ADK imports
Debug:
# Check Cloud Trace API status
gcloud services list --enabled | grep cloudtrace
# Enable Cloud Trace API
gcloud services enable cloudtrace.googleapis.com
# Test trace export
python scripts/test-cloud-trace.py
BigQuery Events Not Appearing
Check:
- Dataset and table exist
- Service account has correct IAM roles
- BigQuery API is enabled
- Plugin configuration is correct
- No event filtering blocking events
Debug:
# Check dataset exists
bq ls my-project:
# Check table schema
bq show --schema agent_analytics.agent_events_v2
# Check IAM permissions
gcloud projects get-iam-policy my-project \
--flatten="bindings[].members" \
--filter="bindings.members:serviceAccount:YOUR_SA_EMAIL"
# Test plugin manually
python scripts/test-bigquery-plugin.py
AgentOps Not Capturing Traces
Check:
- AgentOps initialized before ADK imports
- API key is valid
- Network connectivity to app.agentops.ai
- AgentOps package is up to date
Fix:
# Update AgentOps
pip install -U agentops
# Test initialization
python -c "import agentops; agentops.init(); print('Success')"
# Check for conflicts with other tracers
# Ensure AgentOps is initialized first
Phoenix Connection Failed
Check:
- Phoenix API key is valid
- Collector endpoint URL is correct
- Network access to Phoenix endpoint
- Required packages installed
Debug:
# Test Phoenix endpoint
curl -H "Authorization: Bearer YOUR_KEY" \
https://app.phoenix.arize.com/s/YOUR_SPACE
# Verify package versions
pip list | grep -E "(openinference|phoenix)"
# Run verification script
python scripts/verify-phoenix.py
Weave Traces Not Appearing
Check:
- Tracer provider set BEFORE ADK imports
- W&B API key is valid
- Entity and project names are correct
- OTEL exporter configured properly
Fix:
# Verify initialization order
# 1. Import OTEL packages
# 2. Configure and set tracer provider
# 3. THEN import ADK
# Correct order:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
trace.set_tracer_provider(TracerProvider()) # FIRST
from google.adk.apps import App # THEN
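The import-order rule can also be checked at runtime: if any `google.adk` module is already loaded when the tracer provider is about to be configured, the order is wrong. The sketch below inspects `sys.modules`; `adk_already_imported` is a hypothetical helper, and the `modules` parameter exists only to make it easy to test.

```python
import sys


def adk_already_imported(modules=None) -> bool:
    """Return True if google.adk or any of its submodules has been imported."""
    loaded = sys.modules if modules is None else modules
    # Copy the keys first so concurrent imports cannot change the dict mid-iteration.
    return any(
        name == "google.adk" or name.startswith("google.adk.")
        for name in list(loaded)
    )
```

Calling this just before `trace.set_tracer_provider(...)` and logging a warning when it returns true makes ordering mistakes visible instead of silently dropping traces.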
Dependencies
Required:
- google-adk>=1.21.0 - ADK framework (version 1.21.0+ for full BigQuery features)
- google-cloud-trace>=1.13.0 - Cloud Trace client (optional; only needed for Cloud Trace)
- google-cloud-bigquery>=3.0.0 - BigQuery client (optional; only needed for BigQuery Agent Analytics)
Optional (Third-party tools):
- agentops>=0.3.0 - AgentOps integration
- openinference-instrumentation-google-adk>=0.1.0 - Phoenix instrumentation
- arize-phoenix-otel>=0.1.0 - Phoenix OTEL exporter
- opentelemetry-sdk>=1.20.0 - OpenTelemetry SDK for Weave
- opentelemetry-exporter-otlp-proto-http>=1.20.0 - OTLP exporter for Weave
Installation:
# Core ADK with Cloud Trace
pip install google-adk google-cloud-trace
# With BigQuery Analytics
pip install google-adk google-cloud-bigquery
# With AgentOps
pip install google-adk agentops
# With Phoenix
pip install google-adk openinference-instrumentation-google-adk arize-phoenix-otel
# With Weave
pip install google-adk opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
# All observability tools
pip install google-adk google-cloud-trace google-cloud-bigquery agentops \
openinference-instrumentation-google-adk arize-phoenix-otel \
opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
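After installation, the minimum versions listed above can be checked against what is actually installed. The sketch below uses only the standard library; it compares numeric release segments and ignores pre-release suffixes, which is a simplification, and the subset of packages in `MINIMUM_VERSIONS` is just an example.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Example subset of the minimums listed in this section.
MINIMUM_VERSIONS = {
    "google-adk": "1.21.0",
    "agentops": "0.3.0",
    "opentelemetry-sdk": "1.20.0",
}


def parse_version(text):
    """Extract leading numeric segments, e.g. '1.21.0rc1' becomes (1, 21, 0)."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums):
    """Report 'ok', 'not installed', or a 'too old' message for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report
```

Running `check_installed(MINIMUM_VERSIONS)` in the target environment gives a quick report before any tracing code is wired up.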
Best Practices
- Multi-Layer Observability: Use Cloud Trace for infrastructure, BigQuery for analytics, and AgentOps for debugging
- Cost Control: Implement event filtering and retention policies to manage BigQuery costs
- Security: Never hardcode credentials; use environment variables and IAM roles
- Progressive Rollout: Start with Cloud Trace, add BigQuery when analytics needed
- Tool Selection: Choose tools based on requirements (open-source vs. managed, cost vs. features)
- Data Correlation: Use trace_id across all tools for unified debugging
- Alert Configuration: Set up alerts for error rates, latency spikes, and cost anomalies
- Dashboard Creation: Build custom dashboards in Looker Studio, Grafana, or tool-native UIs
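The data correlation practice can be sketched as a join on trace_id over records exported from each tool. This assumes every export carries a `trace_id` field in the same format, which may require normalization in practice; the source names and sample records in the usage note are hypothetical.

```python
from collections import defaultdict


def correlate_by_trace_id(**sources):
    """Group records from several tools under their shared trace_id.

    Each keyword argument is a source name mapped to a list of dict records.
    Records without a trace_id are skipped.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            trace_id = record.get("trace_id")
            if trace_id:
                merged[trace_id].setdefault(source_name, []).append(record)
    return dict(merged)
```

For instance, `correlate_by_trace_id(bigquery=bq_rows, cloud_trace=spans)` returns a dictionary keyed by trace ID, so a slow span from Cloud Trace and its matching BigQuery events can be inspected side by side.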
Additional Resources
- Cloud Trace: https://cloud.google.com/trace/docs
- BigQuery Agent Analytics: https://google.github.io/adk-docs/observability/bigquery-agent-analytics/
- AgentOps: https://app.agentops.ai/
- Phoenix (Arize): https://arize.com/docs/phoenix/
- Weave (W&B): https://docs.wandb.ai/weave/
- ADK Observability Guide: https://google.github.io/adk-docs/observability/
- OpenTelemetry: https://opentelemetry.io/docs/
Tool Comparison
| Feature | Cloud Trace | BigQuery | AgentOps | Phoenix | Weave |
|---|---|---|---|---|---|
| Hosting | Google Cloud | Google Cloud | SaaS | SaaS/Self-hosted | SaaS |
| Cost | Free tier + usage | Storage + queries | Free tier + paid | Free tier + paid | Free tier + paid |
| Setup Complexity | Low | Medium | Very Low | Low | Medium |
| Data Control | Google Cloud | Google Cloud | Third-party | Self-host option | Third-party |
| Query Flexibility | Low | Very High | Medium | High | Medium |
| Real-time | Yes | Near real-time | Yes | Yes | Yes |
| Custom Dashboards | Limited | Full (Looker) | Built-in | Built-in | Built-in |
| Best For | Infrastructure tracing | Deep analytics | Quick debugging | Open-source, control | ML experiments |