OpenTelemetry .NET Instrumentation Skill

Description

Provides guidance for implementing OpenTelemetry instrumentation in .NET codebases, covering tracing (Activities/Spans), metrics, naming conventions, error handling, performance, and API design best practices.

When to Use

Adding OpenTelemetry instrumentation to .NET code
Creating or modifying ActivitySources and metrics
Reviewing telemetry implementations for compliance
Optimizing instrumentation performance
Designing telemetry APIs that become part of the public surface

Prerequisites

.NET application with OpenTelemetry SDK
Understanding of System.Diagnostics.Metrics and ActivitySource APIs
Access to observability backend (e.g., Jaeger, Prometheus, Grafana)

Core Principles

Resiliency First

CRITICAL: Exceptions in diagnostic/tracing/metrics logic MUST NEVER impact application processing.

Always protect against null Activity references except in Activity extension methods (use activity?.ExtensionMethod())
Assume Activity instances can be null (only created when listeners subscribe)
Guard all instrumentation code with appropriate null checks

API Surface Awareness

Any telemetry emitted becomes part of the public API surface
Changes are subject to breaking changes guidelines
Telemetry should be emitted by default (users opt-in to collection via OpenTelemetry extensions)
Exception: High-cardinality metric dimensions may require explicit opt-in

Standards Compliance

Follow Microsoft best practices for distributed tracing instrumentation
Follow OpenTelemetry semantic conventions
All attributes must be non-null, non-empty strings

Traces / Spans (Activities)

ActivitySource Setup

// ✅ CORRECT: Use ActivitySource, not DiagnosticSource
public class MyFeature
{
    // Primary ActivitySource - name typically matches the component or NuGet package name
    private static readonly ActivitySource ActivitySource = new("MyApp.MyComponent", "1.0.0");

    // Specialized ActivitySource for opt-in scenarios
    private static readonly ActivitySource DetailedActivitySource = new("MyApp.MyComponent.Detailed", "1.0.0");
}

Rules:

Every component defines a primary ActivitySource for mainstream activities
Name typically matches the component or NuGet package (e.g., "MyCompany.MyLibrary")
Version the ActivitySource using SemVer
Create separate ActivitySources for specialized/opt-in scenarios

Creating Activities

// ✅ CORRECT: Check HasListeners before creating
if (ActivitySource.HasListeners())
{
    using var activity = ActivitySource.StartActivity("ProcessItem", ActivityKind.Internal);

    if (activity != null)
    {
        activity.DisplayName = "Processing order #12345";

        // Only compute expensive tags if requested
        if (activity.IsAllDataRequested)
        {
            activity.SetTag("app.item_id", itemId);
            activity.SetTag("app.item_type", itemType);
        }
    }
}

// ❌ WRONG: Don't start activities in async helper methods (breaks AsyncLocal)
async Task HelperAsync()
{
    using var activity = ActivitySource.StartActivity("Helper"); // ❌ BAD
    await DoWorkAsync();
}

Rules:

Check ActivitySource.HasListeners() before creating (zero-allocation fast path)
Always check if activity is null after creation
Never start activities in asynchronous helper methods (Activity.Current uses AsyncLocal)
Use activity.IsAllDataRequested before expensive computations
Always use W3C ID format (enforce format change if parent uses hierarchical)

Activity Naming

// ✅ CORRECT: Unique operation name, friendly display name
using var activity = ActivitySource.StartActivity(
    name: "ProcessItem",              // Unique, identifies class of spans
    kind: ActivityKind.Internal
);
activity.DisplayName = "Processing order #12345"; // User-friendly, can be specific

// ❌ WRONG: Don't include runtime data in operation name
using var activity = ActivitySource.StartActivity($"Process_{itemId}"); // ❌ BAD

Rules:

Each span type has unique OperationName (identifies statistically interesting class of spans)
Operation name should NOT contain runtime data (only compile/config-time info)
Use human-readable DisplayName for specifics
Follow OpenTelemetry span naming conventions

Span Attributes (Tags)

// ✅ CORRECT: Namespace, lowercase, underscore-delimited
activity?.SetTag("myapp.order_id", orderId);
activity?.SetTag("myapp.order_type", orderType);
activity?.SetTag("myapp.db.table_name", tableName);

// Standard semantic conventions where applicable
activity?.SetTag("db.system", "postgresql");
activity?.SetTag("http.method", "GET");

// ❌ WRONG: Various naming violations
activity?.SetTag("MyApp.OrderId", orderId);         // ❌ Wrong case
activity?.SetTag("myapp.order-id", orderId);        // ❌ Wrong delimiter
activity?.SetTag("myapp.orders", count);            // ❌ Plural
activity?.SetTag("unrelated.ip_address", ip);       // ❌ Not characteristic

Naming Conventions:

Use a namespace prefix matching your component: myapp.*, myapp.db.*
All lowercase letters
Underscore (_) delimiters for multi-word attributes
Singular form
Only set tags directly relevant to this activity
Prefer standard OpenTelemetry semantic conventions over custom attributes where they exist
Only use standard semantic conventions if certain no downstream library will set them

Activity Status and Errors

// ✅ CORRECT: Set status and record exceptions
try
{
    await ProcessItemAsync();
    activity?.SetStatus(ActivityStatusCode.Ok);
}
catch (Exception ex)
{
    if (activity != null)
    {
        activity.SetStatus(ActivityStatusCode.Error);
        activity.SetTag("otel.status_code", "error");
        activity.SetTag("otel.status_description", ex.Message);

        // Record exception event per OTel spec
        activity.AddEvent(new ActivityEvent(
            "exception",
            tags: new ActivityTagsCollection
            {
                ["exception.type"] = ex.GetType().FullName,
                ["exception.message"] = ex.Message,
                ["exception.stacktrace"] = ex.ToString()
            }
        ));
    }
    throw;
}

Rules:

Set ActivityStatusCode.Ok on success
Set ActivityStatusCode.Error on exception
Always add otel.status_code and otel.status_description tags
Record exception events following OTel exception conventions

Activity Events

// ✅ CORRECT: Use events for additional context (sparingly)
activity?.AddEvent(new ActivityEvent("ItemRetried", tags: new ActivityTagsCollection
{
    ["retry_attempt"] = retryCount,
    ["next_retry_delay"] = delayMs
}));

// ❌ WRONG: Don't use events for verbose logging
activity?.AddEvent(new ActivityEvent($"Step {i} completed")); // ❌ Use logging instead

Rules:

Events stored in-memory until transmission (use sparingly)
Only for additional context; consider nested spans for multiple events
Use logging for verbose information

Accessing Activities

// ❌ WRONG: Don't rely on Activity.Current when you need a specific span
public async Task HandleAsync(Context context)
{
    var activity = Activity.Current; // ❌ Might be a user-created span, not yours
    activity?.SetTag("custom", "value");
}

// ✅ CORRECT: Pass Activity explicitly or store it in a dedicated context object
public async Task HandleAsync(Context context)
{
    if (context.TryGetActivity(out var activity))
    {
        activity?.SetTag("custom", "value");
    }
}

Metrics

Meter and Metrics Class Setup

// ✅ CORRECT: Group metrics by feature/component
public sealed class OrderProcessingMetrics : IDisposable
{
    private readonly Meter meter;
    private readonly Histogram<double> processingDuration;
    private readonly Counter<long> itemsProcessed;

    public OrderProcessingMetrics()
    {
        meter = new Meter("MyApp.OrderProcessing", "1.0.0");

        // Singular names, appropriate units, nested hierarchy
        processingDuration = meter.CreateHistogram<double>(
            "myapp.order.processing.duration",
            unit: "s",
            description: "Duration of order processing"
        );

        itemsProcessed = meter.CreateCounter<long>(
            "myapp.order.processing.count",
            unit: "{order}",
            description: "Number of orders processed"
        );
    }

    public void Dispose() => meter.Dispose();
}

Naming Conventions (follow OTel semantic conventions):

Singular names (use _count suffix instead of pluralization)
Nested hierarchy: myapp.order.processing.duration
Define units (s, ms, {item}, {connection})
Avoid technical suffixes (_counter, _histogram)
Start with pre-1.0.0 version until adoption proven

Metric Recording Method Naming

// ✅ CORRECT: Action/outcome-based naming, separate methods per outcome
public sealed class OrderProcessingMetrics
{
    // Event happened: describe what occurred
    public void OrderProcessingSucceeded(string orderType, TimeSpan duration)
    {
        processingDuration.Record(duration.TotalSeconds,
            new KeyValuePair<string, object?>("myapp.order_type", orderType),
            new KeyValuePair<string, object?>("outcome", "success")
        );
    }

    public void OrderProcessingFailed(string orderType, Exception exception, TimeSpan duration)
    {
        processingDuration.Record(duration.TotalSeconds,
            new KeyValuePair<string, object?>("myapp.order_type", orderType),
            new KeyValuePair<string, object?>("outcome", "failure"),
            new KeyValuePair<string, object?>("exception.type", exception.GetType().Name)
        );
    }

    public void ConnectionOpened() => connectionsOpen.Add(1);
    public void ConnectionClosed() => connectionsOpen.Add(-1);
}

// ❌ WRONG: Various naming anti-patterns
public void RecordOrderProcessingDuration(...) { } // ❌ Don't name after metric
public void RecordError(bool succeeded, Exception? ex) { } // ❌ Confusing signature

Rules (inspired by ASP.NET Core patterns):

Name after action/outcome: OrderProcessingSucceeded, RetryAttempted, ConnectionFailed
NOT after metric name: avoid RecordXxx, IncrementXxx
Separate methods for different outcomes (avoid boolean flags + optional exceptions)
Event-based naming for state changes: ConnectionOpened(), ItemQueued()

Metric Dimensions

// ✅ CORRECT: Low-cardinality, predefined dimensions
public void OrderProcessingSucceeded(string orderType, TimeSpan duration)
{
    processingDuration.Record(duration.TotalSeconds,
        new KeyValuePair<string, object?>("myapp.order_type", orderType),
        new KeyValuePair<string, object?>("myapp.region", region),
        new KeyValuePair<string, object?>("outcome", "success")
    );
}

// ❌ WRONG: High-cardinality dimensions (unbounded values cause cardinality explosion)
public void OrderFailed(string orderId, string exceptionMessage)
{
    failureCount.Add(1,
        new KeyValuePair<string, object?>("order_id", orderId),               // ❌ Unbounded
        new KeyValuePair<string, object?>("exception_message", exceptionMessage) // ❌ Unbounded
    );
}

Rules:

Dimensions MUST be predefined at instrument creation
Avoid dynamic/unbounded values (causes cardinality explosion: each unique value creates a new time series row)
High-cardinality dimensions MUST be opt-in configuration
Use low-cardinality identifiers: item type, queue name, outcome
Consistent dimension names across components: myapp.region means same thing everywhere
Avoid sensitive data
Consider metric enrichment alternatives
Users can enable metric exemplars for correlation (not through dimensions)

Performance Requirements

Instrumentation MUST be cheap by default. Follow these rules to minimize overhead:

Zero-Allocation Fast Path

// ✅ CORRECT: Guard with cheap checks
if (ActivitySource.HasListeners())
{
    using var activity = ActivitySource.StartActivity("Operation");
    // ... expensive work
}

// ✅ CORRECT: Use TagList (struct) for metrics
var tags = new TagList
{
    { "myapp.order_type", orderType },
    { "outcome", "success" }
};
counter.Add(1, tags);

Timing

// ✅ CORRECT: Timestamp math (no allocation)
var startTime = Stopwatch.GetTimestamp();
try
{
    await ProcessAsync();
}
finally
{
    var duration = Stopwatch.GetElapsedTime(startTime);
    metrics.OrderProcessingSucceeded(orderType, duration);
}

// ❌ WRONG: Allocates Stopwatch object
var stopwatch = Stopwatch.StartNew(); // ❌ Allocates

// ❌ WRONG: IDisposable timing class (allocates per use)
using (new MetricScope(metrics, "ProcessOrder")) // ❌ BAD
{
    ProcessOrder();
}

Avoid Hidden Allocations

// ❌ WRONG: String interpolation allocates
activity?.SetTag("item", $"Processing {itemId}"); // ❌ Allocates

// ✅ CORRECT: Check IsAllDataRequested first
if (activity?.IsAllDataRequested == true)
{
    activity.SetTag("item", $"Processing {itemId}");
}

// ❌ WRONG: LINQ allocates enumerators
activity?.SetTag("handlers", handlers.Select(h => h.Name).ToArray()); // ❌ Bad

// ✅ CORRECT: Manual construction or check first
if (activity?.IsAllDataRequested == true)
{
    activity.SetTag("handlers", string.Join(",", handlers.Select(h => h.Name)));
}

Rules:

No Stopwatch.StartNew() (use timestamp math)
No timing IDisposable wrappers as classes
Prefer TagList (struct) over arrays/dictionaries
No hidden work: avoid LINQ, string interpolation, async state machines in hot paths

Testing Requirements

Span Tests

[Test]
public async Task Should_create_processing_span_with_correct_parent()
{
    // Arrange
    using var parent = new Activity("Parent").Start();

    // Act
    await handler.Handle(item);

    // Assert
    var processingSpan = recordedActivities.Single(a => a.OperationName == "ProcessItem");
    Assert.AreEqual(parent.Id, processingSpan.ParentId);
    Assert.AreEqual("myapp.item_type", processingSpan.Tags.First().Key);
}

[Test]
public void Should_not_introduce_breaking_changes_to_span_names()
{
    // Ensures string values in span names are under test
    Assert.AreEqual("ProcessItem", MyFeature.SpanName);
}

Rules:

Test which spans activities connect to
Test string values (span names, tag names) to prevent breaking changes
Remember: telemetry is part of public API

Versioning

Telemetry versioning decoupled from package version
Use SemVer semantics
Traces and Metrics use separate versions (evolve independently)
Start with pre-1.0.0 version until adoption/usefulness proven

private static readonly ActivitySource ActivitySource = new("MyApp.MyComponent", "0.9.0");
private readonly Meter meter = new("MyApp.MyComponent", "0.8.0");

OpenTelemetry-NET-Instrumentation

OpenTelemetry .NET Instrumentation Skill

Description

When to Use

Prerequisites

Core Principles

Resiliency First

API Surface Awareness

Standards Compliance

Traces / Spans (Activities)

ActivitySource Setup

Creating Activities

Activity Naming

Span Attributes (Tags)

Activity Status and Errors

Activity Events

Accessing Activities

Metrics

Meter and Metrics Class Setup

Metric Recording Method Naming

Metric Dimensions

Performance Requirements

Zero-Allocation Fast Path

Timing

Avoid Hidden Allocations

Testing Requirements

Span Tests

Versioning

References