dotnet-performance-patterns
dotnet-performance-patterns
Performance-oriented architecture patterns for .NET applications. Covers zero-allocation coding with Span<T> and Memory<T>, buffer pooling with ArrayPool<T>, struct design for performance (readonly struct, ref struct, in parameters), sealed class devirtualization by the JIT, stack-based allocation with stackalloc, and string handling performance. Focuses on the why (performance rationale and measurement) rather than the how (language syntax).
Version assumptions: .NET 8.0+ baseline. Span<T> and Memory<T> are available from .NET Core 2.1+ but this skill targets modern usage patterns on .NET 8+.
Scope
- Zero-allocation coding with Span and Memory
- Buffer pooling with ArrayPool
- Struct design for performance (readonly struct, ref struct, in parameters)
- Sealed class devirtualization by the JIT
- Stack-based allocation with stackalloc
- String handling performance patterns
Out of scope
- C# language syntax for Span, records, pattern matching -- see [skill:dotnet-csharp-modern-patterns]
- Coding standards and naming conventions -- see [skill:dotnet-csharp-coding-standards]
- Microbenchmarking setup and measurement -- see [skill:dotnet-benchmarkdotnet]
- Native AOT compilation and trimming -- see [skill:dotnet-native-aot]
- Serialization format performance -- see [skill:dotnet-serialization]
- Architecture patterns (caching, resilience, DI) -- see [skill:dotnet-architecture-patterns]
Cross-references: [skill:dotnet-benchmarkdotnet] for measuring the impact of these patterns, [skill:dotnet-csharp-modern-patterns] for Span/Memory syntax foundation, [skill:dotnet-csharp-coding-standards] for sealed class style conventions, [skill:dotnet-native-aot] for AOT performance characteristics and trimming impact on pattern choices, [skill:dotnet-serialization] for serialization performance context.
Span<T> and Memory<T> for Zero-Allocation Scenarios
Why Span<T> Matters for Performance
Span<T> provides a safe, bounds-checked view over contiguous memory without allocating. It enables slicing arrays,
strings, and stack memory without copying. For syntax details see [skill:dotnet-csharp-modern-patterns]; this section
focuses on performance rationale.
Zero-Allocation String Processing
// BAD: Substring allocates a new string on each call
public static (string Key, string Value) ParseHeader_Allocating(string header)
{
var colonIndex = header.IndexOf(':');
return (header.Substring(0, colonIndex), header.Substring(colonIndex + 1).Trim());
}
// GOOD: ReadOnlySpan<char> slicing avoids all allocations
public static (ReadOnlySpan<char> Key, ReadOnlySpan<char> Value) ParseHeader_ZeroAlloc(
ReadOnlySpan<char> header)
{
var colonIndex = header.IndexOf(':');
return (header[..colonIndex], header[(colonIndex + 1)..].Trim());
}
```text
Performance impact: for high-throughput parsing (HTTP headers, log lines, CSV rows), Span-based parsing eliminates GC pressure entirely. Measure with `[MemoryDiagnoser]` in [skill:dotnet-benchmarkdotnet] -- the `Allocated` column should read `0 B`.
### Memory\<T\> for Async and Storage Scenarios
`Span<T>` cannot be used in async methods or stored on the heap (it is a ref struct). Use `Memory<T>` when you need to:
- Pass buffers to async I/O methods
- Store a slice reference in a field or collection
- Return a memory region from a method for later consumption
```csharp
public async Task<int> ReadAndProcessAsync(Stream stream, Memory<byte> buffer)
{
var bytesRead = await stream.ReadAsync(buffer);
var data = buffer[..bytesRead]; // Memory<T> slicing -- no allocation
return ProcessData(data.Span); // .Span for synchronous processing
}
private int ProcessData(ReadOnlySpan<byte> data)
{
var sum = 0;
foreach (var b in data)
sum += b;
return sum;
}
```text
---
## ArrayPool\<T\> for Buffer Pooling
### Why Pool Buffers
Large array allocations (>= 85,000 bytes) go directly to the Large Object Heap (LOH), which is only collected in Gen 2 GC -- expensive and causes pauses. Even smaller arrays add GC pressure in hot paths. `ArrayPool<T>` rents and returns buffers to avoid repeated allocations.
### Usage Pattern
```csharp
using System.Buffers;
public int ProcessLargeData(Stream source)
{
var buffer = ArrayPool<byte>.Shared.Rent(minimumLength: 81920);
try
{
var bytesRead = source.Read(buffer, 0, buffer.Length);
// IMPORTANT: Rent may return a larger buffer than requested.
// Always use bytesRead or the requested length, never buffer.Length.
return ProcessChunk(buffer.AsSpan(0, bytesRead));
}
finally
{
ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
// clearArray: true zeroes the buffer -- use when buffer held sensitive data
}
}
```text
### Common Mistakes
| Mistake | Impact | Fix |
|---------|--------|-----|
| Using `buffer.Length` instead of requested size | Processes uninitialized bytes beyond actual data | Track requested/actual size separately |
| Forgetting to return the buffer | Pool exhaustion, falls back to allocation | Use try/finally or a `using` wrapper |
| Returning a buffer twice | Corrupts pool state | Null out the reference after return |
| Not clearing sensitive data | Security leak from pooled buffers | Pass `clearArray: true` to `Return` |
---
## readonly struct, ref struct, and in Parameters
### readonly struct -- Defensive Copy Elimination
The JIT must defensively copy non-readonly structs when accessed via `in`, `readonly` fields, or `readonly` methods to prevent mutation. Marking a struct `readonly` guarantees immutability, eliminating these copies:
```csharp
// GOOD: readonly eliminates defensive copies on every access
public readonly struct Point3D
{
public double X { get; }
public double Y { get; }
public double Z { get; }
public Point3D(double x, double y, double z) => (X, Y, Z) = (x, y, z);
// readonly struct: JIT knows this cannot mutate, no defensive copy needed
public double DistanceTo(in Point3D other)
{
var dx = X - other.X;
var dy = Y - other.Y;
var dz = Z - other.Z;
return Math.Sqrt(dx * dx + dy * dy + dz * dz);
}
}
```text
Without `readonly`, calling a method on a struct through an `in` parameter forces the JIT to copy the entire struct to protect against mutation. For large structs in tight loops, this eliminates significant overhead.
### ref struct -- Stack-Only Types
`ref struct` types are constrained to the stack. They cannot be boxed, stored in fields, or used in async methods. This enables safe wrapping of Span\<T\>:
```csharp
public ref struct SpanLineEnumerator
{
private ReadOnlySpan<char> _remaining;
public SpanLineEnumerator(ReadOnlySpan<char> text) => _remaining = text;
public ReadOnlySpan<char> Current { get; private set; }
public bool MoveNext()
{
if (_remaining.IsEmpty)
return false;
var newlineIndex = _remaining.IndexOf('\n');
if (newlineIndex == -1)
{
Current = _remaining;
_remaining = default;
}
else
{
Current = _remaining[..newlineIndex];
_remaining = _remaining[(newlineIndex + 1)..];
}
return true;
}
}
```text
### in Parameters -- Pass-by-Reference Without Mutation
Use `in` for large readonly structs passed to methods. The `in` modifier passes by reference (avoids copying) and prevents mutation:
```csharp
// in parameter: pass by reference, no copy, no mutation allowed
public static double CalculateDistance(in Point3D a, in Point3D b)
=> a.DistanceTo(in b);
```csharp
**When to use `in`:**
| Struct Size | Recommendation |
|-------------|---------------|
| <= 16 bytes | Pass by value (register-friendly, no indirection overhead) |
| > 16 bytes | Use `in` to avoid copy overhead |
| Any size, readonly struct | `in` is safe (no defensive copies) |
| Any size, non-readonly struct | Avoid `in` (defensive copies negate the benefit) |
---
## Sealed Class Performance Rationale
### JIT Devirtualization
When a class is `sealed`, the JIT can replace virtual method calls with direct calls (devirtualization) because no subclass override is possible. This enables further inlining:
```csharp
// Without sealed: virtual dispatch through vtable
public class OpenService : IProcessor
{
public virtual int Process(int x) => x * 2;
}
// With sealed: JIT devirtualizes + inlines Process call
public sealed class SealedService : IProcessor
{
public int Process(int x) => x * 2;
}
public interface IProcessor { int Process(int x); }
```text
Verify devirtualization with `[DisassemblyDiagnoser]` in [skill:dotnet-benchmarkdotnet]. See [skill:dotnet-csharp-coding-standards] for the project convention of defaulting to sealed classes.
### Performance Impact
Devirtualization + inlining eliminates:
1. **vtable lookup** -- indirect memory access to find the method pointer
2. **Call overhead** -- the actual indirect call instruction
3. **Inlining barrier** -- virtual calls cannot be inlined; sealed methods can
In tight loops and hot paths, the cumulative effect is measurable. For framework/library types that are not designed for extension, always prefer `sealed`.
---
## stackalloc for Small Stack-Based Allocations
### When to Use stackalloc
`stackalloc` allocates memory on the stack, avoiding GC entirely. Use for small, fixed-size buffers in hot paths:
```csharp
public static string FormatGuid(Guid guid)
{
// 68 bytes on the stack -- well within safe limits
Span<char> buffer = stackalloc char[68];
guid.TryFormat(buffer, out var charsWritten, "D");
return new string(buffer[..charsWritten]);
}
```text
### Safety Guidelines
| Guideline | Rationale |
|-----------|-----------|
| Keep allocations small (< 1 KB typical, < 4 KB absolute maximum) | Stack space is limited (~1 MB default on Windows); overflow crashes the process |
| Use constant or bounded sizes only | Runtime-variable sizes risk stack overflow from malicious/unexpected input |
| Prefer `Span<T>` assignment over raw pointer | Span provides bounds checking; raw pointers do not |
| Fall back to ArrayPool for large/variable sizes | Gracefully handle cases that exceed stack budget |
### Hybrid Pattern: stackalloc with ArrayPool Fallback
```csharp
public static string ProcessData(ReadOnlySpan<byte> input)
{
const int stackThreshold = 256;
char[]? rented = null;
Span<char> buffer = input.Length <= stackThreshold
? stackalloc char[stackThreshold]
: (rented = ArrayPool<char>.Shared.Rent(input.Length));
try
{
var written = Encoding.UTF8.GetChars(input, buffer);
return new string(buffer[..written]);
}
finally
{
if (rented is not null)
ArrayPool<char>.Shared.Return(rented);
}
}
```text
This pattern is used throughout the .NET runtime libraries and is the recommended approach for methods that handle both small and large inputs.
---
## String Interning and StringComparison Performance
### String Comparison Performance
Ordinal comparisons are significantly faster than culture-aware comparisons because they avoid Unicode normalization:
```csharp
// FAST: ordinal comparison (byte-by-byte)
bool isMatch = str.Equals("expected", StringComparison.Ordinal);
bool containsKey = dict.ContainsKey(key); // Dictionary<string, T> uses ordinal by default
// FAST: case-insensitive ordinal (no culture overhead)
bool isMatchIgnoreCase = str.Equals("expected", StringComparison.OrdinalIgnoreCase);
// SLOW: culture-aware comparison (Unicode normalization, linguistic rules)
bool isMatchCulture = str.Equals("expected", StringComparison.CurrentCulture);
```text
**Default guidance:** Use `StringComparison.Ordinal` or `StringComparison.OrdinalIgnoreCase` for internal identifiers, dictionary keys, file paths, and protocol strings. Reserve culture-aware comparison for user-visible text sorting and display.
### String Interning
The CLR interns compile-time string literals automatically. `string.Intern()` can reduce memory for runtime strings that repeat frequently:
```csharp
// Intern frequently-repeated runtime strings to share a single instance
var normalized = string.Intern(headerName.ToLowerInvariant());
```csharp
**Caution:** Interned strings are never garbage collected. Only intern strings from a bounded, known set (HTTP headers, XML element names). Never intern user input or unbounded data.
### Efficient String Building
| Scenario | Recommended Approach | Why |
|----------|---------------------|-----|
| 2-3 concatenations | String interpolation `$"{a}{b}"` | Compiler optimizes to `string.Concat` |
| Loop concatenation | `StringBuilder` | Avoids quadratic allocation |
| Known fixed parts | `string.Create` | Single allocation, Span-based writing |
| High-throughput formatting | `Span<char>` + `TryFormat` | Zero-allocation formatting |
```csharp
// string.Create for single-allocation building
public static string FormatId(int category, int item)
{
return string.Create(11, (category, item), static (span, state) =>
{
state.category.TryFormat(span, out var catWritten);
span[catWritten] = '-';
state.item.TryFormat(span[(catWritten + 1)..], out _);
});
}
```text
---
## Performance Measurement Checklist
Before applying any optimization pattern, measure first. Premature optimization without data leads to complex code with no measurable benefit.
1. **Identify the hot path** -- use [skill:dotnet-benchmarkdotnet] to establish a baseline
2. **Measure allocations** -- enable `[MemoryDiagnoser]` and check the `Allocated` column
3. **Apply one pattern at a time** -- change one thing, re-measure, compare to baseline
4. **Check AOT impact** -- if targeting Native AOT ([skill:dotnet-native-aot]), verify patterns are trim-safe
5. **Verify with production-like data** -- synthetic benchmarks can miss real-world allocation patterns
6. **Document the tradeoff** -- every optimization trades readability or flexibility for speed; record the measured gain
---
## Agent Gotchas
1. **Measure before optimizing** -- never apply Span/ArrayPool/stackalloc without a benchmark showing the allocation or latency problem. Premature optimization produces unreadable code for no measurable gain.
2. **Do not use stackalloc with variable sizes from untrusted input** -- stack overflow crashes the process with no exception handler. Always validate bounds or use the hybrid stackalloc/ArrayPool pattern.
3. **Always mark value types `readonly struct` when they are immutable** -- without `readonly`, the JIT generates defensive copies on every `in` parameter access and `readonly` field access, silently negating the performance benefit of using structs.
4. **Return rented ArrayPool buffers in finally blocks** -- forgetting to return starves the pool and causes fallback allocations that negate the benefit.
5. **Use `StringComparison.Ordinal` for internal comparisons** -- omitting the comparison parameter defaults to culture-aware comparison, which is slower and produces surprising results for technical strings (file paths, identifiers).
6. **Sealed classes help performance only when the JIT can see the concrete type** -- if the object is accessed through an interface variable in a non-devirtualizable call site, sealing provides no benefit. Verify with `[DisassemblyDiagnoser]`.
7. **Do not re-teach language syntax** -- reference [skill:dotnet-csharp-modern-patterns] for Span/Memory syntax details. This skill focuses on when and why to use these patterns for performance.
---
## Knowledge Sources
Performance patterns in this skill are grounded in guidance from:
- **Stephen Toub** -- .NET Performance blog series ([devblogs.microsoft.com/dotnet/author/toub](https://devblogs.microsoft.com/dotnet/author/toub/)). Authoritative source on Span\<T\>, ValueTask, ArrayPool, async internals, and runtime performance characteristics.
- **Stephen Cleary** -- Async best practices and concurrent collections guidance. Author of *Concurrency in C# Cookbook*.
- **Nick Chapsas** -- Modern .NET performance patterns and benchmarking methodology.
> These sources inform the patterns and rationale presented above. This skill does not claim to represent or speak for any individual.