dotnet-observability

Modern observability for .NET applications using OpenTelemetry, structured logging, health checks, and custom metrics. Covers the three pillars of observability (traces, metrics, logs), integration with Microsoft.Extensions.Diagnostics and System.Diagnostics, and production-ready health check patterns.

Out of scope: DI container mechanics and service lifetimes -- see [skill:dotnet-csharp-dependency-injection]. Async/await patterns -- see [skill:dotnet-csharp-async-patterns]. Testing observability output -- see [skill:dotnet-integration-testing] for verifying telemetry in integration tests. CI/CD pipeline integration for telemetry collection -- see [skill:dotnet-gha-patterns] and [skill:dotnet-ado-patterns]. Middleware pipeline patterns (request logging middleware, exception handling middleware) -- see [skill:dotnet-middleware-patterns].

Cross-references: [skill:dotnet-csharp-dependency-injection] for service registration, [skill:dotnet-csharp-async-patterns] for async patterns in background exporters, [skill:dotnet-resilience] for Polly telemetry integration, [skill:dotnet-middleware-patterns] for request/exception logging middleware.


OpenTelemetry Setup

OpenTelemetry is the standard observability framework in .NET. The .NET base class library provides System.Diagnostics.Activity (traces) and System.Diagnostics.Metrics (metrics) natively; the OpenTelemetry SDK collects what they emit and exports it.

Package Landscape

Package                                        Purpose
OpenTelemetry.Extensions.Hosting               Host integration, lifecycle management
OpenTelemetry.Instrumentation.AspNetCore       Automatic HTTP server trace/metric instrumentation
OpenTelemetry.Instrumentation.Http             Automatic HttpClient trace/metric instrumentation
OpenTelemetry.Instrumentation.Runtime          GC, thread pool, assembly metrics
OpenTelemetry.Exporter.OpenTelemetryProtocol   OTLP exporter (gRPC/HTTP) for collectors
OpenTelemetry.Exporter.Console                 Console exporter for local development

Install the core stack:

<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.*" />
<PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.*" />
<PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.*" />
<PackageReference Include="OpenTelemetry.Instrumentation.Runtime" Version="1.*" />
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.*" />

Aspire Service Defaults Integration

If using .NET Aspire, the ServiceDefaults project configures OpenTelemetry automatically. This is the recommended approach for Aspire apps -- do not duplicate this configuration manually:

// ServiceDefaults/Extensions.cs (generated by Aspire)
public static IHostApplicationBuilder AddServiceDefaults(
    this IHostApplicationBuilder builder)
{
    builder.ConfigureOpenTelemetry();
    builder.AddDefaultHealthChecks();
    // ... other defaults
    return builder;
}

For non-Aspire apps, configure OpenTelemetry explicitly as shown below.

Full Configuration (Non-Aspire)

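using System.Reflection;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);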

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(
            serviceName: builder.Environment.ApplicationName,
            serviceVersion: typeof(Program).Assembly
                .GetCustomAttribute<AssemblyInformationalVersionAttribute>()
                ?.InformationalVersion ?? "unknown"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSource("MyApp.*")          // Custom ActivitySources
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()
        .AddMeter("MyApp.*")           // Custom Meters
        .AddOtlpExporter());

OTLP Configuration via Environment Variables

The OTLP exporter reads standard environment variables -- no code changes needed between environments:

# Collector endpoint (gRPC default)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Or HTTP/protobuf
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Resource attributes
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,service.namespace=myapp

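# Service name (takes precedence over service.name in OTEL_RESOURCE_ATTRIBUTES;
# an explicit AddService(...) call in code is applied later and can still override it)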
OTEL_SERVICE_NAME=order-api

Distributed Tracing

How .NET Tracing Works

.NET uses System.Diagnostics.Activity as its native tracing primitive. OpenTelemetry maps these to spans:

.NET Concept         OpenTelemetry Concept
ActivitySource       Tracer
Activity             Span
Activity.SetTag      Span attribute
Activity.AddEvent    Span event
Activity.SetStatus   Span status

Custom Traces

public sealed class OrderService
{
    // One ActivitySource per logical component, named after the namespace
    private static readonly ActivitySource s_activitySource = new("MyApp.Orders");

    public async Task<Order> CreateOrderAsync(
        CreateOrderRequest request,
        CancellationToken ct)
    {
        using var activity = s_activitySource.StartActivity(
            "CreateOrder",
            ActivityKind.Internal);

        activity?.SetTag("order.customer_id", request.CustomerId);
        activity?.SetTag("order.line_count", request.Lines.Count);

        var order = new Order { /* ... */ };

        activity?.AddEvent(new ActivityEvent("OrderValidated"));

        await _db.Orders.AddAsync(order, ct);
        await _db.SaveChangesAsync(ct);

        activity?.SetTag("order.id", order.Id);
        activity?.SetStatus(ActivityStatusCode.Ok);

        return order;
    }
}
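
The null-conditional calls (activity?.SetTag) above matter because StartActivity returns null when nothing is listening to the source. The sketch below illustrates that behavior using only System.Diagnostics. The source name Demo.Orders and the hand-written ActivityListener are assumptions made for illustration; in a real application the OpenTelemetry SDK plays the listener role once the source name is registered with .AddSource().

```csharp
using System;
using System.Diagnostics;

// Hypothetical source name used only for this sketch.
using var source = new ActivitySource("Demo.Orders");

// 1. Nobody is listening yet, so StartActivity returns null.
bool startedWithoutListener;
using (var before = source.StartActivity("CreateOrder"))
{
    startedWithoutListener = before is not null;
}

// 2. Register a listener that records everything from this source.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Orders",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// 3. The same call now yields a real Activity.
bool startedWithListener;
using (var after = source.StartActivity("CreateOrder"))
{
    after?.SetTag("order.line_count", 3);
    startedWithListener = after is not null;
}

Console.WriteLine($"without listener: {startedWithoutListener}"); // without listener: False
Console.WriteLine($"with listener: {startedWithListener}");       // with listener: True
```

Because the null case is the normal path for unsampled or unregistered sources, instrumentation code should never assume the Activity exists.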

Trace Context Propagation

W3C Trace Context is the default propagation format in .NET. It works automatically across HTTP boundaries with HttpClient:

// Trace context is automatically propagated via traceparent/tracestate headers
// when using HttpClient with OpenTelemetry.Instrumentation.Http.
// No manual propagation needed for HTTP-based communication.

For message-based communication (queues, event buses), propagate context explicitly:

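// Requires: using System.Diagnostics; using OpenTelemetry; using OpenTelemetry.Context.Propagation;
// Producer: inject context into message headers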
var propagator = Propagators.DefaultTextMapPropagator;
var carrier = new Dictionary<string, string>();
var currentActivity = Activity.Current;
if (currentActivity is not null)
{
    propagator.Inject(
        new PropagationContext(currentActivity.Context, Baggage.Current),
        carrier,
        (dict, key, value) => dict[key] = value);
}
// Attach carrier as message headers

// Consumer: extract context from message headers
var parentContext = propagator.Extract(
    default,
    messageHeaders,
    (headers, key) => headers.TryGetValue(key, out var value)
        ? [value] : []);

using var activity = s_activitySource.StartActivity(
    "ProcessMessage",
    ActivityKind.Consumer,
    parentContext.ActivityContext);
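
With the default W3C propagator, the carrier written by Inject holds a traceparent entry. As a rough illustration of what that value encodes, the sketch below parses one with ActivityContext.TryParse from System.Diagnostics. The header value is the sample from the W3C Trace Context specification, not output from a real service, and parsing by hand like this is only for understanding; the propagator's Extract call does it for you.

```csharp
using System;
using System.Diagnostics;

// Sample value in the W3C format: version-traceid-parentid-flags
const string traceparent = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01";

bool parsed = ActivityContext.TryParse(traceparent, null, out ActivityContext parent);

Console.WriteLine(parsed);             // True
Console.WriteLine(parent.TraceId);     // 0af7651916cd43dd8448eb211c80319c
Console.WriteLine(parent.SpanId);      // b7ad6b7169203331
Console.WriteLine(parent.TraceFlags);  // Recorded
```

The trace ID stays constant across every hop, while the span ID identifies the immediate parent; the consumer's new Activity gets its own span ID under the same trace.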

Metrics

Built-in Metrics

ASP.NET Core and HttpClient emit metrics automatically when OpenTelemetry instrumentation is configured:

Meter                                                        Key Metrics
Microsoft.AspNetCore.Hosting                                 http.server.request.duration, http.server.active_requests
Microsoft.AspNetCore.Routing                                 aspnetcore.routing.match_attempts
System.Net.Http                                              http.client.request.duration, http.client.active_requests
System.Runtime (.NET 9+)                                     dotnet.gc.collections, dotnet.thread_pool.thread.count
OpenTelemetry.Instrumentation.Runtime (.NET 8 and earlier)   process.runtime.dotnet.gc.collections.count, process.runtime.dotnet.thread_pool.threads.count

Custom Metrics

Use System.Diagnostics.Metrics for application-specific metrics:

public sealed class OrderMetrics
{
    // One Meter per logical component
    private readonly Counter<long> _ordersCreated;
    private readonly Histogram<double> _orderProcessingDuration;
    private readonly UpDownCounter<long> _activeOrders;

    public OrderMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("MyApp.Orders");

        _ordersCreated = meter.CreateCounter<long>(
            "myapp.orders.created",
            unit: "{order}",
            description: "Number of orders created");

        _orderProcessingDuration = meter.CreateHistogram<double>(
            "myapp.orders.processing_duration",
            unit: "s",
            description: "Time to process an order");

        _activeOrders = meter.CreateUpDownCounter<long>(
            "myapp.orders.active",
            unit: "{order}",
            description: "Number of orders currently being processed");
    }

    public void RecordOrderCreated(string region)
    {
        _ordersCreated.Add(1, new KeyValuePair<string, object?>("region", region));
    }

    public void RecordProcessingDuration(double seconds)
    {
        _orderProcessingDuration.Record(seconds);
    }

    public void IncrementActiveOrders() => _activeOrders.Add(1);
    public void DecrementActiveOrders() => _activeOrders.Add(-1);
}

Register the metrics class in DI:

builder.Services.AddSingleton<OrderMetrics>();
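
A caller typically measures elapsed time and hands the result to the histogram in seconds. The sketch below shows one way to do that and uses a MeterListener to confirm the measurement arrives. It creates a Meter directly only to stay free of DI and package dependencies; the names Demo.Orders and demo.orders.processing_duration are hypothetical, and application code should keep obtaining its Meter from IMeterFactory as shown above. Stopwatch.GetElapsedTime assumes .NET 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Direct Meter creation is for this standalone sketch only.
using var meter = new Meter("Demo.Orders");
var duration = meter.CreateHistogram<double>(
    "demo.orders.processing_duration", unit: "s");

// Collect measurements in memory, the way an exporter would subscribe.
var recorded = new List<double>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Demo.Orders")
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<double>(
    (instrument, value, tags, state) => recorded.Add(value));
listener.Start();

// Time a unit of work without allocating a Stopwatch instance.
long start = Stopwatch.GetTimestamp();
// ... process the order here ...
TimeSpan elapsed = Stopwatch.GetElapsedTime(start);

duration.Record(
    elapsed.TotalSeconds,
    new KeyValuePair<string, object?>("region", "eu-west"));

Console.WriteLine(recorded.Count); // 1
```

Recording in seconds as a double keeps the instrument consistent with the unit declared on the histogram.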

Metric Naming Conventions

Follow the OpenTelemetry semantic conventions:

  • Use lowercase with dots as separators: myapp.orders.created
  • Use UCUM units as the spec recommends: s (seconds, preferred for durations), ms (milliseconds), By (bytes), and curly-brace annotations such as {request} or {order} for counts of things
  • Prefix with your application/service name: myapp.*
  • Use consistent tag names across metrics: region, status, order.type

Structured Logging

Microsoft.Extensions.Logging (Built-in)

The built-in logging framework supports structured logging natively. Use compile-time source generators for high-performance logging:

public static partial class Log
{
    [LoggerMessage(
        Level = LogLevel.Information,
        Message = "Order {OrderId} created for customer {CustomerId} with {LineCount} items, total {Total:C}")]
    public static partial void OrderCreated(
        this ILogger logger,
        string orderId,
        string customerId,
        int lineCount,
        decimal total);

    [LoggerMessage(
        Level = LogLevel.Warning,
        Message = "Order {OrderId} processing exceeded threshold: {Duration}ms")]
    public static partial void OrderProcessingSlow(
        this ILogger logger,
        string orderId,
        double duration);

    [LoggerMessage(
        Level = LogLevel.Error,
        Message = "Failed to process order {OrderId}")]
    public static partial void OrderProcessingFailed(
        this ILogger logger,
        Exception exception,
        string orderId);
}

// Usage
logger.OrderCreated(order.Id, order.CustomerId, order.Lines.Count, order.Total);

Why Source-Generated Logging

  • Zero allocation for disabled log levels -- the generated method checks IsEnabled before any formatting or boxing takes place
  • Compile-time validation of message templates and parameters
  • Structured by default -- parameters become named properties in the log event

LoggerMessage.Define (Legacy / Pre-.NET 6)

Before source generators (.NET 5 and earlier), use LoggerMessage.Define to achieve the same zero-allocation benefits. This approach still works in modern .NET and is useful in non-partial classes or when targeting older frameworks:

public static class LogMessages
{
    private static readonly Action<ILogger, string, int, Exception?> s_orderCreated =
        LoggerMessage.Define<string, int>(
            LogLevel.Information,
            new EventId(1, nameof(OrderCreated)),
            "Order {OrderId} created with {LineCount} items");

    public static void OrderCreated(
        ILogger logger, string orderId, int lineCount)
        => s_orderCreated(logger, orderId, lineCount, null);

    private static readonly Action<ILogger, string, Exception?> s_orderFailed =
        LoggerMessage.Define<string>(
            LogLevel.Error,
            new EventId(2, nameof(OrderFailed)),
            "Failed to process order {OrderId}");

    public static void OrderFailed(
        ILogger logger, string orderId, Exception exception)
        => s_orderFailed(logger, orderId, exception);
}

Prefer [LoggerMessage] source generators for new code targeting .NET 6+. Use LoggerMessage.Define only when source generators are unavailable.

Message Templates: Do and Do Not

Message templates use named placeholders that become structured properties. This is fundamental to structured logging -- violations prevent log indexing and search.

// CORRECT: structured message template with named placeholders
logger.LogInformation("Order {OrderId} shipped to {City}", orderId, city);

// WRONG: string interpolation -- bypasses structured logging entirely
logger.LogInformation($"Order {orderId} shipped to {city}");

// WRONG: string concatenation -- same problem
logger.LogInformation("Order " + orderId + " shipped to " + city);

// WRONG: calling ToString() on an argument -- loses type information
logger.LogInformation("Order {OrderId} shipped at {Time}",
    orderId, DateTime.UtcNow.ToString("o")); // pass DateTime directly

// CORRECT: pass objects directly, let the formatter handle rendering
logger.LogInformation("Order {OrderId} shipped at {ShippedAt}",
    orderId, DateTime.UtcNow);
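
To see why the interpolated form loses structure, it can help to look at the state object a logger actually receives. The sketch below uses a hypothetical CapturingLogger test double (not part of any library) that records the property names of each call. It assumes a recent Microsoft.Extensions.Logging.Abstractions package (version 7 or later for the BeginScope signature used here).

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

var logger = new CapturingLogger();
string orderId = "42", city = "Oslo";

// Template with named placeholders: each placeholder becomes a property.
logger.LogInformation("Order {OrderId} shipped to {City}", orderId, city);
string[] structuredKeys = logger.LastKeys.ToArray();
Console.WriteLine(string.Join(", ", structuredKeys));
// OrderId, City, {OriginalFormat}

// Interpolated string (the anti-pattern): values are baked into the text.
logger.LogInformation($"Order {orderId} shipped to {city}");
string[] interpolatedKeys = logger.LastKeys.ToArray();
Console.WriteLine(string.Join(", ", interpolatedKeys));
// {OriginalFormat}

// Hypothetical test double that records the property names of the last call.
sealed class CapturingLogger : ILogger
{
    public List<string> LastKeys { get; } = new();

    public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

    public bool IsEnabled(LogLevel logLevel) => true;

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
    {
        LastKeys.Clear();
        if (state is IReadOnlyList<KeyValuePair<string, object?>> properties)
        {
            foreach (var property in properties)
            {
                LastKeys.Add(property.Key);
            }
        }
    }
}
```

In the interpolated case the only property left is the already-rendered text, so a log backend has no OrderId or City field to index.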

Log Level Best Practices

Level         When to Use                                                      Example
Trace         Detailed diagnostic info (method entry/exit, variable values)    Entering ProcessOrder with {OrderId}
Debug         Internal app state useful during development                     Cache hit for product {ProductId}
Information   Normal application flow, business events                         Order {OrderId} created successfully
Warning       Unexpected situations that do not prevent operation              Retry {Attempt} for external API call
Error         Failures that affect the current operation                       Failed to save order {OrderId}
Critical      Application-wide failures requiring immediate action             Database connection pool exhausted

Log Filtering (Microsoft.Extensions.Logging)

Configure log level filtering in appsettings.json to suppress noisy framework logs while keeping application logs at the desired level:

{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning",
      "Microsoft.AspNetCore.HttpLogging": "Information",
      "Microsoft.EntityFrameworkCore.Database.Command": "Warning",
      "System.Net.Http.HttpClient": "Warning",
      "MyApp": "Debug"
    },
    "Console": {
      "LogLevel": {
        "Default": "Warning"
      }
    }
  }
}

Key filtering rules:

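  • Most-specific category wins -- the longest matching category prefix applies, so MyApp.Orders uses the MyApp rule unless a more specific override exists
  • Provider-specific rules take precedence -- when a provider section such as Console contains rules, only those rules are considered for that provider; with the configuration above, the console provider emits Warning and higher for every category, including MyApp
  • Environment overrides -- use appsettings.Development.json to enable Debug/Trace locally without affecting production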
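
The category-prefix rule can be checked in code. The sketch below expresses part of the configuration above with AddFilter instead of appsettings.json and asks each logger which levels are enabled. AlwaysOnProvider and AlwaysOnLogger are hypothetical stand-ins for a real provider (a factory with no providers reports every level as disabled), and the example assumes the Microsoft.Extensions.Logging package, version 7 or later.

```csharp
#nullable enable
using System;
using Microsoft.Extensions.Logging;

using var factory = LoggerFactory.Create(logging => logging
    .AddProvider(new AlwaysOnProvider())
    .SetMinimumLevel(LogLevel.Information)                 // "Default": "Information"
    .AddFilter("Microsoft.AspNetCore", LogLevel.Warning)   // framework noise
    .AddFilter("MyApp", LogLevel.Debug));                  // application code

ILogger orders = factory.CreateLogger("MyApp.Orders");
ILogger routing = factory.CreateLogger("Microsoft.AspNetCore.Routing");
ILogger other = factory.CreateLogger("ThirdParty.Client");

bool ordersDebug = orders.IsEnabled(LogLevel.Debug);        // matches "MyApp"
bool routingInfo = routing.IsEnabled(LogLevel.Information); // matches "Microsoft.AspNetCore"
bool routingWarn = routing.IsEnabled(LogLevel.Warning);
bool otherDebug = other.IsEnabled(LogLevel.Debug);          // falls back to the default

Console.WriteLine(ordersDebug); // True
Console.WriteLine(routingInfo); // False
Console.WriteLine(routingWarn); // True
Console.WriteLine(otherDebug);  // False

// Hypothetical provider whose loggers accept every level, so only the
// factory's filter rules decide what is enabled.
sealed class AlwaysOnProvider : ILoggerProvider
{
    public ILogger CreateLogger(string categoryName) => new AlwaysOnLogger();
    public void Dispose() { }
}

sealed class AlwaysOnLogger : ILogger
{
    public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;
    public bool IsEnabled(LogLevel logLevel) => true;
    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter) { }
}
```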

Log Scopes for Correlation

public async Task<Order> ProcessOrderAsync(
    string orderId,
    CancellationToken ct)
{
    using var scope = _logger.BeginScope(
        new Dictionary<string, object>
        {
            ["OrderId"] = orderId,
            ["CorrelationId"] = Activity.Current?.TraceId.ToString() ?? ""
        });

    // All log messages within this scope include OrderId and CorrelationId
    _logger.LogInformation("Starting order processing");
    // ...
}

Serilog Integration

For advanced sinks (Elasticsearch, Seq, Datadog), Serilog is a widely used structured logging library that plugs into Microsoft.Extensions.Logging.

Package                          Purpose
Serilog.AspNetCore               UseSerilog() host integration + UseSerilogRequestLogging()
Serilog.Settings.Configuration   ReadFrom.Configuration() for appsettings.json binding
Serilog.Sinks.OpenTelemetry      WriteTo.OpenTelemetry() OTLP sink
Serilog.Formatting.Compact       RenderedCompactJsonFormatter for structured console output
Serilog.Enrichers.Environment    Enrich.WithMachineName() and Enrich.WithEnvironmentName()

// Program.cs
using Serilog;
using Serilog.Formatting.Compact;
using Serilog.Sinks.OpenTelemetry;

builder.Host.UseSerilog((context, loggerConfiguration) =>
{
    loggerConfiguration
        .ReadFrom.Configuration(context.Configuration)
        .Enrich.FromLogContext()
        .Enrich.WithMachineName()
        .Enrich.WithEnvironmentName()
        .WriteTo.Console(new RenderedCompactJsonFormatter())
        .WriteTo.OpenTelemetry(options =>
        {
            options.Endpoint = context.Configuration["OTEL_EXPORTER_OTLP_ENDPOINT"]
                ?? "http://localhost:4317";
            options.Protocol = OtlpProtocol.Grpc;
        });
});

// Use Serilog request logging instead of the built-in one
app.UseSerilogRequestLogging(options =>
{
    options.EnrichDiagnosticContext = (diagnosticContext, httpContext) =>
    {
        diagnosticContext.Set("RequestHost", httpContext.Request.Host.Value);
        diagnosticContext.Set("UserAgent", httpContext.Request.Headers.UserAgent.ToString());
    };
});

Configure via appsettings.json:

{
  "Serilog": {
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        "Microsoft.AspNetCore": "Warning",
        "Microsoft.EntityFrameworkCore.Database.Command": "Warning",
        "System.Net.Http.HttpClient": "Warning"
      }
    }
  }
}

Choosing Between MS.Extensions.Logging and Serilog

Scenario                                    Recommendation
Console + OTLP export only                  Microsoft.Extensions.Logging + OpenTelemetry exporter
Need Elasticsearch, Seq, or Datadog sinks   Serilog
.NET Aspire application                     Use the built-in logging (Aspire configures OTLP automatically)
High-throughput, minimal allocation         Source-generated LoggerMessage (works with both)

Health Checks

Health checks enable orchestrators (Kubernetes, Docker, load balancers) to determine whether your application is alive and whether it is ready to serve traffic.

Health Check Packages

The built-in Microsoft.Extensions.Diagnostics.HealthChecks package provides the core framework. Community packages from Xabaril/AspNetCore.Diagnostics.HealthChecks add provider-specific checks:

Package                             Extension Method / Member
AspNetCore.HealthChecks.NpgSql      .AddNpgSql()
AspNetCore.HealthChecks.Redis       .AddRedis()
AspNetCore.HealthChecks.Uris        .AddUrlGroup()
AspNetCore.HealthChecks.UI.Client   UIResponseWriter.WriteHealthCheckUIResponse

Basic Health Checks

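// Requires: using HealthChecks.UI.Client; using Microsoft.AspNetCore.Diagnostics.HealthChecks;
//           using Microsoft.Extensions.Diagnostics.HealthChecks;
builder.Services.AddHealthChecks()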
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: ["live"])
    .AddNpgSql(
        builder.Configuration.GetConnectionString("DefaultConnection")!,
        name: "database",
        tags: ["ready"])
    .AddRedis(
        builder.Configuration.GetConnectionString("Redis")!,
        name: "redis",
        tags: ["ready"])
    .AddUrlGroup(
        new Uri("https://api.external.com/health"),
        name: "external-api",
        tags: ["ready"]);

var app = builder.Build();

// Liveness: is the process running? (don't check dependencies)
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live")
});

// Readiness: can the process serve traffic? (check dependencies)
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

Custom Health Checks

public sealed class DiskSpaceHealthCheck(
    IOptions<DiskSpaceOptions> options) : IHealthCheck
{
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        var drive = new DriveInfo(options.Value.DrivePath);
        var freeSpaceMb = drive.AvailableFreeSpace / (1024 * 1024);

        var data = new Dictionary<string, object>
        {
            ["FreeSpaceMB"] = freeSpaceMb,
            ["DrivePath"] = options.Value.DrivePath
        };

        if (freeSpaceMb < options.Value.MinimumFreeSpaceMb)
        {
            return Task.FromResult(HealthCheckResult.Unhealthy(
                $"Low disk space: {freeSpaceMb}MB remaining", data: data));
        }

        return Task.FromResult(HealthCheckResult.Healthy(
            $"Disk space OK: {freeSpaceMb}MB free", data: data));
    }
}

// Registration
builder.Services.AddHealthChecks()
    .AddCheck<DiskSpaceHealthCheck>("disk-space", tags: ["ready"]);

Liveness vs Readiness

Check                       Purpose                          Failure Action              Example
Liveness (/health/live)     Is the process healthy?          Restart container           Self-check, deadlock detection
Readiness (/health/ready)   Can the process serve traffic?   Remove from load balancer   Database, Redis, external APIs

Important: Liveness checks should NOT include dependency checks. If a database is down, restarting your app will not fix the database. Liveness checks that fail on dependency issues cause cascading restarts.

Health Check Publishing

HealthCheckPublisherOptions controls the periodic evaluation schedule. To push results to monitoring systems, register an IHealthCheckPublisher implementation:

builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy());

// Configure periodic evaluation schedule
builder.Services.Configure<HealthCheckPublisherOptions>(options =>
{
    options.Delay = TimeSpan.FromSeconds(5);   // Initial delay before first run
    options.Period = TimeSpan.FromSeconds(30);  // Interval between evaluations
});

// Register a publisher to push results (e.g., to logs, metrics, or external systems)
builder.Services.AddSingleton<IHealthCheckPublisher, LoggingHealthCheckPublisher>();

A minimal publisher that logs health status:

public sealed class LoggingHealthCheckPublisher(
    ILogger<LoggingHealthCheckPublisher> logger) : IHealthCheckPublisher
{
    public Task PublishAsync(
        HealthReport report, CancellationToken ct)
    {
        logger.LogInformation(
            "Health check: {Status} ({TotalDuration}ms)",
            report.Status,
            report.TotalDuration.TotalMilliseconds);
        return Task.CompletedTask;
    }
}

Putting It Together: Production Configuration

A complete observability setup for a production .NET API:

var builder = WebApplication.CreateBuilder(args);

// 1. OpenTelemetry -- traces and metrics (log export is configured in step 2)
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(builder.Environment.ApplicationName))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSource("MyApp.*")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()
        .AddMeter("MyApp.*")
        .AddOtlpExporter());

// 2. Structured logging with OpenTelemetry export
builder.Logging.AddOpenTelemetry(logging =>
{
    logging.IncludeScopes = true;
    logging.IncludeFormattedMessage = true;
    logging.AddOtlpExporter();
});

// 3. Health checks
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: ["live"])
    .AddNpgSql(
        builder.Configuration.GetConnectionString("DefaultConnection")!,
        name: "database",
        tags: ["ready"]);

// 4. Custom application metrics
builder.Services.AddSingleton<OrderMetrics>();

var app = builder.Build();

app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live")
});
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.Run();

Key Principles

  • Use OpenTelemetry as the standard -- it provides vendor-neutral instrumentation that works with any OTLP-compatible backend or collector pipeline (Prometheus, Grafana, Datadog, Azure Monitor, AWS X-Ray)
  • Use IMeterFactory from DI -- prefer it over creating Meter instances directly; the factory ties each Meter's lifetime to the DI container and keeps meters isolated in tests
  • Use source-generated LoggerMessage for hot paths -- zero allocation when the log level is disabled
  • Separate liveness from readiness -- liveness checks should not include dependency health; readiness checks should
  • Configure via environment variables -- OTLP endpoint, service name, and resource attributes should not be hardcoded
  • Enrich logs with trace context -- structured logging with TraceId and SpanId enables log-to-trace correlation
  • Follow OpenTelemetry semantic conventions for metric and span names

Agent Gotchas

  1. Do not create Meter instances with new in DI-registered services -- use IMeterFactory.Create() so the Meter is owned and disposed by the container and stays isolated in tests. A Meter created with new is still collected if its name is registered with .AddMeter(), but it is process-global and is not tied to the host's lifetime. ActivitySource has no factory; create it as a static field and register its name via .AddSource().
  2. Do not add dependency checks to liveness endpoints -- a database outage should not restart the app. Only the readiness endpoint should check dependencies.
  3. Do not use ILogger.LogInformation("message: " + value) or string interpolation $"message: {value}" -- use structured logging templates: ILogger.LogInformation("message: {Value}", value). String concatenation and interpolation bypass structured logging and prevent log indexing.
  4. Do not configure OTLP endpoints in code for production -- use environment variables (OTEL_EXPORTER_OTLP_ENDPOINT) so the same image works across environments.
  5. Do not forget to register custom ActivitySource names with .AddSource("MyApp.*") -- unregistered sources are silently ignored and produce no traces.

Attribution

Adapted from Aaronontheweb/dotnet-skills (MIT license).
