OpenTelemetry Instrumentation Extension
OpenTelemetry Instrumentation Extension
Automatically extend OpenTelemetry instrumentation for new functionality in the MCP Gateway, following established patterns documented in docs/telemetry/README.md.
When to Use This Skill
Automatically apply when:
- New state-changing operations added (Create, Update, Delete, Push, Pull, Add, Remove, etc.)
- New CLI commands added to
cmd/docker-mcp/ - New packages with operations in
pkg/ - User mentions "otel", "telemetry", "instrumentation", "metrics", or "tracing"
- Code changes that modify state (database, files, containers, configuration)
- Reviewing code for telemetry coverage
Workflow
Phase 1: Analysis & Suggestion
-
Read project telemetry standards:
- Read
docs/telemetry/README.md"Development Guidelines" section - Read
pkg/telemetry/telemetry.goto understand existing patterns and metrics
- Read
-
Identify scope using git diff:
- Find new/changed files in
pkg/andcmd/docker-mcp/ - Identify functions performing state-changing operations
- Infer domain from package structure (e.g.,
pkg/foo/→ domain:foo)
- Find new/changed files in
-
Categorize findings:
- Operations in existing domains (use existing metrics)
- Operations in new domains (need new metrics)
-
Present suggestions to user:
- List each function needing instrumentation with file:line reference
- Specify operation type (create, update, delete, etc.)
- Identify domain and whether metrics exist
- Show what will be added (instrumentation code, metrics, docs, tests)
Phase 2: Implementation (After User Approval)
Execute in this order:
- Make changes: Instrument operations and add metrics to
pkg/telemetry/telemetry.go - Verify: Build, run with OTEL collector, check
docker logs otel-debugoutput - Write tests: Add tests to
pkg/telemetry/telemetry_test.go - Update docs: Add metrics/operations to
docs/telemetry/README.md - Run tests: Execute
make test - Final verification: Run
./docs/telemetry/testing/test-telemetry.sh
Key Principles
Follow the project's documented guidelines from docs/telemetry/README.md:
- Use Existing Providers: Get global tracer/meter from OTEL
- Preserve Server Lineage: Include server attribution in all telemetry
- Non-Blocking Operations: Telemetry never blocks or fails operations
- Debug Support: Add logging behind
DOCKER_MCP_TELEMETRY_DEBUG - Follow Naming Conventions: Use
mcp.<domain>.<field>pattern - Deferred Success Tracking: Only set success on clean completion
Where to Find Information
ALWAYS read these files before suggesting changes:
-
docs/telemetry/README.md- Source of truth for:- Development guidelines (lines 419-474)
- Existing metrics and attributes
- Testing procedures
- Naming conventions
-
pkg/telemetry/telemetry.go- Understand:- Existing metric instruments
- Recording function patterns
- Helper functions for spans
- Init() structure
-
pkg/telemetry/telemetry_test.go- See:- Testing patterns
- How to verify metrics
Instrumentation Pattern
Use the simple defer pattern from cmd/docker-mcp/catalog/create.go:
func OperationName(ctx context.Context, identifier string, ...) error {
telemetry.Init()
start := time.Now()
var success bool
defer func() {
duration := time.Since(start)
telemetry.Record<Domain>Operation(ctx, "operation_name", identifier,
float64(duration.Milliseconds()), success)
}()
// ... operation logic ...
// Optional: Record resource counts
telemetry.Record<Domain><Resources>(ctx, identifier, int64(count))
success = true
return nil
}
Note: If identifier is generated during execution, the defer captures its final value.
Adding New Domains
When instrumentation is needed for a new domain (e.g., new package pkg/newdomain/):
1. Add Metric Instruments in pkg/telemetry/telemetry.go
In the Init() function, add global variables and create metric instruments:
var (
// ... existing metrics ...
// New domain metrics
newdomainOperations metric.Int64Counter
newdomainOperationDuration metric.Float64Histogram
newdomainResources metric.Int64Gauge // If managing resources
)
func Init() {
// ... existing init code ...
newdomainOperations, _ = meter.Int64Counter(
"mcp.newdomain.operations",
metric.WithDescription("New domain operations count"),
)
newdomainOperationDuration, _ = meter.Float64Histogram(
"mcp.newdomain.operation.duration",
metric.WithDescription("New domain operation duration in milliseconds"),
)
newdomainResources, _ = meter.Int64Gauge(
"mcp.newdomain.resources",
metric.WithDescription("Number of resources in new domain"),
)
}
2. Add Recording Functions in pkg/telemetry/telemetry.go
After the Init() function, add recording functions:
func RecordNewdomainOperation(ctx context.Context, operation, identifier string, durationMs float64, success bool) {
if newdomainOperations == nil || newdomainOperationDuration == nil {
return
}
attrs := []attribute.KeyValue{
attribute.String("mcp.newdomain.operation", operation),
attribute.String("mcp.newdomain.id", identifier), // or .name, .ref as appropriate
attribute.Bool("mcp.newdomain.success", success),
}
newdomainOperations.Add(ctx, 1, metric.WithAttributes(attrs...))
newdomainOperationDuration.Record(ctx, durationMs, metric.WithAttributes(attrs...))
}
// Optional: Add resource counting function if applicable
func RecordNewdomainResources(ctx context.Context, identifier string, count int64) {
if newdomainResources == nil {
return
}
attrs := []attribute.KeyValue{
attribute.String("mcp.newdomain.id", identifier),
}
newdomainResources.Record(ctx, count, metric.WithAttributes(attrs...))
}
3. Write Tests in pkg/telemetry/telemetry_test.go
Add test cases following existing patterns:
func TestRecordNewdomainOperation(t *testing.T) {
spanRecorder, metricReader := setupTestTelemetry(t)
Init()
ctx := context.Background()
// Test successful operation
RecordNewdomainOperation(ctx, "create", "test-id", 123.45, true)
// Collect and verify metrics
var rm metricdata.ResourceMetrics
err := metricReader.Collect(ctx, &rm)
require.NoError(t, err)
// Find and verify the counter metric
foundCounter := false
foundHistogram := false
for _, sm := range rm.ScopeMetrics {
for _, m := range sm.Metrics {
if m.Name == "mcp.newdomain.operations" {
foundCounter = true
sum := m.Data.(metricdata.Sum[int64])
require.Len(t, sum.DataPoints, 1)
assert.Equal(t, int64(1), sum.DataPoints[0].Value)
// Verify attributes
attrs := sum.DataPoints[0].Attributes
assert.Contains(t, attrs.ToSlice(), attribute.String("mcp.newdomain.operation", "create"))
assert.Contains(t, attrs.ToSlice(), attribute.String("mcp.newdomain.id", "test-id"))
assert.Contains(t, attrs.ToSlice(), attribute.Bool("mcp.newdomain.success", true))
}
if m.Name == "mcp.newdomain.operation.duration" {
foundHistogram = true
histogram := m.Data.(metricdata.Histogram[float64])
require.Len(t, histogram.DataPoints, 1)
assert.Equal(t, float64(123.45), histogram.DataPoints[0].Sum)
}
}
}
assert.True(t, foundCounter, "Counter metric not found")
assert.True(t, foundHistogram, "Histogram metric not found")
}
4. Update Documentation in docs/telemetry/README.md
Add a new section for the domain in the appropriate location. Follow the existing format:
### New Domain Operations
Operations for managing [description of what this domain does]:
- **`mcp.newdomain.operations`** - New domain operations (create, update, delete, etc.)
- **`mcp.newdomain.operation.duration`** - Duration of new domain operations
- **`mcp.newdomain.resources`** - Gauge showing number of resources in new domain
#### New Domain Attributes
- **`mcp.newdomain.operation`** - Type of operation (create, update, delete, etc.)
- **`mcp.newdomain.id`** - ID of the resource
- **`mcp.newdomain.success`** - Boolean indicating operation success
Verification
After making changes, verify telemetry output:
# Build
make docker-mcp
# Start OTEL collector
docker run --rm -d --name otel-debug \
-p 4317:4317 -p 4318:4318 \
-v $(pwd)/docs/telemetry/testing/otel-collector-config.yaml:/config.yaml \
otel/opentelemetry-collector:latest --config=/config.yaml
# Run with telemetry enabled
export DOCKER_MCP_TELEMETRY_DEBUG=1
export DOCKER_CLI_OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
docker mcp [command]
# Check collector output
docker logs otel-debug | grep "mcp.newdomain"
# Cleanup
docker stop otel-debug
Verify the output matches expectations before proceeding to write tests and docs.
Analysis Strategy
When analyzing git diff:
-
Look for operation verbs in function names:
- Create, Add, Update, Modify, Set, Configure, Register
- Delete, Remove, Unregister, Clear
- Push, Pull, Export, Import, Sync
- Start, Stop, Run, Execute
-
Look for state changes:
- Database operations (DAO/DB calls)
- File system operations (create, delete, write)
- Container operations (start, stop, create, delete)
- Configuration changes (save, update)
-
Infer domain from file path:
pkg/workingset/→ domain:profilepkg/catalog_next/→ domain:catalog_nextcmd/docker-mcp/server/→ domain:server- Pattern: use logical grouping name
-
Check if telemetry exists:
- Search
pkg/telemetry/telemetry.goforRecord<Domain>functions - If exists: use existing metrics
- If not: propose new domain metrics
- Search
Implementation Checklist
Follow this sequence:
- Read patterns from
docs/telemetry/README.md - Instrument operations with simple defer pattern
- Add new metrics to
pkg/telemetry/telemetry.go(if new domain) - Verify with
docker logs otel-debug- confirm output matches expectations - Write tests in
pkg/telemetry/telemetry_test.go - Update
docs/telemetry/README.mdwith new metrics - Run
make test- verify tests pass - Run
./docs/telemetry/testing/test-telemetry.sh- final verification
Important Notes
- Read documentation first
- Follow existing patterns
- Ask for approval before implementing
- Verify early and often with collector logs