axiom-swift-performance
Swift Performance Optimization
Purpose
Core Principle: Optimize Swift code by understanding language-level performance characteristics—value semantics, ARC behavior, generic specialization, and memory layout—to write fast, efficient code without premature micro-optimization.
Swift Version: Swift 6.2+ (for InlineArray, Span, @concurrent)
Xcode: 16+
Platforms: iOS 18+, macOS 15+
Related Skills:
- axiom-performance-profiling — Use Instruments to measure (do this first!)
- axiom-swiftui-performance — SwiftUI-specific optimizations
- axiom-build-performance — Compilation speed
- axiom-swift-concurrency — Correctness-focused concurrency patterns
When to Use This Skill
✅ Use this skill when
- App profiling shows Swift code as the bottleneck (Time Profiler hotspots)
- Excessive memory allocations or retain/release traffic
- Implementing performance-critical algorithms or data structures
- Writing framework or library code with performance requirements
- Optimizing tight loops or frequently called methods
- Dealing with large data structures or collections
- Code review identifying performance anti-patterns
Quick Decision Tree
Performance issue identified?
│
├─ Profiler shows excessive copying?
│ └─ → Part 1: Noncopyable Types
│ └─ → Part 2: Copy-on-Write
│
├─ Retain/release overhead in Time Profiler?
│ └─ → Part 4: ARC Optimization
│
├─ Generic code in hot path?
│ └─ → Part 5: Generics & Specialization
│
├─ Collection operations slow?
│ └─ → Part 7: Collection Performance
│
├─ Async/await overhead visible?
│ └─ → Part 8: Concurrency Performance
│
├─ Struct vs class decision?
│ └─ → Part 3: Value vs Reference
│
└─ Memory layout concerns?
└─ → Part 9: Memory Layout
The Four Principles of Swift Performance
From WWDC 2024-10217: Swift's low-level performance characteristics come down to four areas. Each maps to a Part in this skill.
| Principle | What It Costs | Skill Coverage |
|---|---|---|
| Function Calls | Dispatch overhead, optimization barriers | Part 5 (Generics), Part 6 (Inlining) |
| Memory Allocation | Stack vs heap, allocation frequency | Part 3 (Value vs Reference), Part 7 (Collections) |
| Memory Layout | Cache locality, padding, contiguity | Part 9 (Memory Layout), Part 11 (Span) |
| Value Copying | COW triggers, defensive copies, ARC traffic | Part 1 (Noncopyable), Part 2 (COW), Part 4 (ARC) |
Understanding which principle is causing your bottleneck determines which Part to use.
Part 1: Noncopyable Types (~Copyable)
Swift 6.0+ introduces noncopyable types for performance-critical scenarios where you want to avoid implicit copies.
When to Use
- Large types that should never be copied (file handles, GPU buffers)
- Types with ownership semantics (must be explicitly consumed)
- Performance-critical code where copies are expensive
Basic Pattern
// Noncopyable type
struct FileHandle: ~Copyable {
private let fd: Int32
init(path: String) throws {
self.fd = open(path, O_RDONLY)
guard fd != -1 else { throw FileError.openFailed }
}
deinit {
close(fd)
}
// Optional early close - consuming self ends the lifetime here (deinit runs and closes fd)
consuming func close() {
_ = consume self
}
}
// Usage
func processFile() throws {
let handle = try FileHandle(path: "/data.txt")
// handle is automatically consumed at end of scope
// Cannot accidentally copy handle
}
Ownership Annotations
// consuming - takes ownership; caller cannot use the value afterward
func process(_ data: consuming [UInt8]) {
// data is consumed
}
// borrowing - temporary access without ownership
func validate(_ data: borrowing [UInt8]) -> Bool {
// data can still be used by the caller
return !data.isEmpty
}
// inout - exclusive mutable access
func modify(_ data: inout [UInt8]) {
data.append(0)
}
Performance Impact
- Eliminates implicit copies: Compiler error instead of runtime copy
- Zero-cost abstraction: Same performance as manual memory management
- Use when: Type is expensive to copy (>64 bytes) and copies are rare
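A minimal sketch of the move-only guarantee, using a hypothetical GPUBuffer type: assignment transfers ownership instead of copying, and any use after the move is a compile error.
struct GPUBuffer: ~Copyable {
let id: Int
}
func use(_ buffer: borrowing GPUBuffer) { _ = buffer.id }
func demo() {
let buffer = GPUBuffer(id: 1)
let moved = buffer // Assignment moves (consumes) - never copies
// use(buffer) // ❌ Compile error: 'buffer' used after consume
use(moved) // ✅ Exactly one owner at any time
}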
Part 2: Copy-on-Write (COW)
Swift collections use COW for efficient memory sharing. Understanding when copies happen is critical for performance.
How COW Works
var array1 = [1, 2, 3] // Single allocation
var array2 = array1 // Share storage (no copy)
array2.append(4) // Triggers the copy here (storage was shared)
For custom COW implementation, see Copy-Paste Pattern 1 (COW Wrapper) below.
Performance Tips
// ❌ Accidental copy in loop
for i in 0..<array.count {
array[i] = transform(array[i]) // Copy on first mutation if shared!
}
// ✅ Reserve capacity first (ensures unique)
array.reserveCapacity(array.count)
for i in 0..<array.count {
array[i] = transform(array[i])
}
// ❌ Multiple mutations trigger multiple uniqueness checks
array.append(1)
array.append(2)
array.append(3)
// ✅ Single reservation
array.reserveCapacity(array.count + 3)
array.append(contentsOf: [1, 2, 3])
Defensive Copies
From WWDC 2024-10217: Swift sometimes inserts defensive copies when it cannot prove a value won't be mutated through a shared reference.
class DataStore {
var items: [Item] = [] // COW type stored in class
}
func process(_ store: DataStore) {
for item in store.items {
// Swift may defensively copy `items` because:
// 1. store.items is a class property (another reference could mutate it)
// 2. The loop needs a stable snapshot
handle(item)
}
}
How to avoid: Copy to a local variable first — one explicit copy instead of repeated defensive copies:
func process(_ store: DataStore) {
let items = store.items // One copy
for item in items {
handle(item) // No more defensive copies
}
}
In profiler: Defensive copies appear as unexpected swift_retain/swift_release pairs or Array.__allocating_init calls when you didn't expect allocation.
Part 3: Value vs Reference Semantics
Choosing between struct and class has significant performance implications.
Decision Matrix
| Factor | Use Struct | Use Class |
|---|---|---|
| Size | ≤ 64 bytes | > 64 bytes or contains large data |
| Identity | No identity needed | Needs identity (===) |
| Inheritance | Not needed | Inheritance required |
| Mutation | Infrequent | Frequent in-place updates |
| Sharing | No sharing needed | Must be shared across scope |
Small Structs (Fast)
// ✅ Fast - fits in registers, no heap allocation
struct Point {
var x: Double // 8 bytes
var y: Double // 8 bytes
} // Total: 16 bytes - excellent for struct
struct Color {
var r, g, b, a: UInt8 // 4 bytes total - perfect for struct
}
Large Structs (Slow)
// ❌ Deceptively cheap to copy - the Array buffer is shared via COW,
// but every copy adds retain/release traffic, and the first mutation
// of a shared value duplicates the full 1MB
struct HugeData {
var buffer: [UInt8] // 1MB
var metadata: String
}
func process(_ data: HugeData) {
// ...
}
// ✅ Use reference semantics for large data
final class HugeData {
var buffer: [UInt8]
var metadata: String
}
func process(_ data: HugeData) { // Only copies pointer (8 bytes)
// ...
}
Indirect Storage for Flexibility
For large data that needs value semantics externally with reference storage internally, use the COW Wrapper pattern — see Copy-Paste Pattern 1 below.
Part 4: ARC Optimization
Automatic Reference Counting adds overhead. Minimize it where possible.
Weak vs Unowned Performance
class Parent {
var child: Child?
}
class Child {
// ❌ Weak adds overhead (optional, thread-safe zeroing)
weak var parent: Parent?
}
// ✅ Unowned when you know lifetime guarantees
class Child {
unowned let parent: Parent // No overhead, crashes if parent deallocated
}
Performance: unowned is ~2x faster than weak (no zeroing side-table bookkeeping).
Use when: Child lifetime < Parent lifetime (guaranteed).
Closure Capture Optimization
class DataProcessor {
var data: [Int]
// ❌ Captures self just to read one property - pays weak-reference overhead
func process(completion: @escaping () -> Void) {
DispatchQueue.global().async { [weak self] in
guard let self else { return }
self.data.forEach { print($0) }
completion()
}
}
// ✅ Capture only what you need
func process(completion: @escaping () -> Void) {
let data = self.data // Copy value type
DispatchQueue.global().async {
data.forEach { print($0) } // No self captured
completion()
}
}
}
Closure Capture Costs
From WWDC 2024-10217: Closures have different performance profiles depending on whether they escape.
// Non-escaping closure — stack-allocated context, zero ARC overhead
func processItems(_ items: [Item], using transform: (Item) -> Result) -> [Result] {
items.map(transform) // Closure context lives on stack
}
// Escaping closure — heap-allocated context, ARC on every captured reference
func processItemsLater(_ items: [Item], transform: @escaping (Item) -> Result) {
// Closure context heap-allocated as anonymous class instance
// Each captured reference gets retain/release
self.pending = { items.map(transform) }
}
Why this matters: Task closures are @Sendable and escaping, so every Task you create heap-allocates its capture context.
In hot paths: Prefer non-escaping closures. If you see swift_allocObject in Time Profiler for closure contexts, look for escaping closures that could be non-escaping.
Observable Object Lifetimes
From WWDC 2021-10216: Object lifetimes end at last use, not at closing brace.
// ❌ Relying on observed lifetime is fragile
class Traveler {
weak var account: Account?
deinit {
print("Deinitialized") // May run BEFORE expected with ARC optimizations!
}
}
func test() {
let traveler = Traveler()
let account = Account(traveler: traveler)
// traveler's last use is above - may deallocate here!
account.printSummary() // weak reference may be nil!
}
// ✅ Explicitly extend lifetime when needed
func test() {
let traveler = Traveler()
let account = Account(traveler: traveler)
withExtendedLifetime(traveler) {
account.printSummary() // traveler guaranteed to live
}
}
Object lifetimes can change between Xcode versions, Debug vs Release, and unrelated code changes. Enable "Optimize Object Lifetimes" (Xcode 13+) during development to expose hidden lifetime bugs early.
Part 5: Generics & Specialization
Generic code can be fast or slow depending on specialization.
Specialization Basics
// Generic function
func process<T>(_ value: T) {
print(value)
}
// Calling with concrete type
process(42) // Compiler specializes: process_Int(42)
process("hello") // Compiler specializes: process_String("hello")
Existential Overhead
protocol Drawable {
func draw()
}
// ❌ Existential container - expensive (heap allocation, indirection)
func drawAll(shapes: [any Drawable]) {
for shape in shapes {
shape.draw() // Dynamic dispatch through witness table
}
}
// ✅ Generic with constraint - can specialize
func drawAll<T: Drawable>(shapes: [T]) {
for shape in shapes {
shape.draw() // Static dispatch after specialization
}
}
Performance: Generic version ~10x faster (eliminates witness table overhead).
Existential Container Overhead
From WWDC 2016-416: any Protocol uses a 40-byte existential container (5 words on 64-bit). The container stores type metadata + protocol witness table (16 bytes) plus a 24-byte inline value buffer. Types ≤24 bytes are stored directly in the buffer (fast, ~5ns access); larger types require a heap allocation with pointer indirection (slower, ~15ns). some Protocol eliminates all container overhead (~2ns).
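You can verify the container cost directly with MemoryLayout; a quick check, assuming any 64-bit platform:
protocol Drawable { func draw() }
print(MemoryLayout<any Drawable>.size) // 40: 24-byte inline buffer + metadata + witness table
struct Tiny: Drawable { var x, y, z: Double; func draw() {} } // 24 bytes - stored inline (fast)
struct Big: Drawable { var a, b, c, d: Double; func draw() {} } // 32 bytes - heap-allocated (slow)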
When some isn't available (heterogeneous collections require any):
- Reduce type sizes to ≤24 bytes — keep protocol-conforming types small enough for inline storage (3 words: e.g., Point { x, y, z: Double } fits exactly)
- Use enum dispatch instead — eliminates containers entirely, trades open extensibility for performance:
// ❌ Existential: 40 bytes/element, witness table dispatch
let shapes: [any Drawable] = [circle, rect]
// ✅ Enum: value-sized, static dispatch via switch
enum Shape { case circle(Circle), rect(Rect) }
func draw(_ shape: Shape) {
switch shape {
case .circle(let c): c.draw()
case .rect(let r): r.draw()
}
}
- Batch operations — amortize per-element existential overhead by processing in chunks rather than one-at-a-time
- Measure first — existential overhead (~10ns/access) only matters in tight loops; for UI-level code it's negligible
@_specialize Attribute
Force specialization for common types when the compiler doesn't do it automatically:
@_specialize(where T == Int)
@_specialize(where T == String)
func process<T: Comparable>(_ value: T) -> T { value }
// Generates specialized versions + generic fallback
Part 6: Inlining
Inlining eliminates function call overhead but increases code size.
When to Inline
// ✅ Small, frequently called functions
@inlinable
public func fastAdd(_ a: Int, _ b: Int) -> Int {
return a + b
}
// ❌ Large functions - code bloat
@inlinable // Don't do this!
public func complexAlgorithm() {
// 100 lines of code...
}
Cross-Module Optimization
// Framework code
public struct Point {
public var x: Double
public var y: Double
// ✅ Inlinable for cross-module optimization
@inlinable
public func distance(to other: Point) -> Double {
let dx = x - other.x
let dy = y - other.y
return sqrt(dx*dx + dy*dy)
}
}
// Client code
let p1 = Point(x: 0, y: 0)
let p2 = Point(x: 3, y: 4)
let d = p1.distance(to: p2) // Inlined across module boundary
@usableFromInline
// Internal helper that can be inlined
@usableFromInline
internal func helperFunction() { }
// Public API that uses it
@inlinable
public func publicAPI() {
helperFunction() // Can inline internal function
}
Trade-off: @inlinable exposes the implementation as part of your module's ABI; clients compile it into their binaries, so you can't freely change it later.
Part 7: Collection Performance
Choosing the right collection and using it correctly matters.
Array vs ContiguousArray
// ❌ Array<T> - may use NSArray bridging (Swift/ObjC interop)
let array: Array<Int> = [1, 2, 3]
// ✅ ContiguousArray<T> - guaranteed contiguous memory (no bridging)
let array: ContiguousArray<Int> = [1, 2, 3]
Use ContiguousArray when: No ObjC bridging needed (pure Swift), ~15% faster.
Reserve Capacity
// ❌ Multiple reallocations
var array: [Int] = []
for i in 0..<10000 {
array.append(i) // Reallocates ~14 times
}
// ✅ Single allocation
var array: [Int] = []
array.reserveCapacity(10000)
for i in 0..<10000 {
array.append(i) // No reallocations
}
Dictionary Hashing
struct BadKey: Hashable {
var data: [Int]
// ❌ Expensive hash (iterates entire array)
func hash(into hasher: inout Hasher) {
for element in data {
hasher.combine(element)
}
}
}
struct GoodKey: Hashable {
var id: UUID // Fast hash
var data: [Int] // Not hashed
// ✅ Hash only the unique identifier
func hash(into hasher: inout Hasher) {
hasher.combine(id)
}
}
InlineArray (Swift 6.2)
Fixed-size arrays stored directly on the stack—no heap allocation, no COW overhead. Uses value generics to encode size in the type.
// Traditional Array - heap allocated, COW overhead
var sprites: [Sprite] = Array(repeating: .default, count: 40)
// InlineArray - stack allocated, no COW (value generic syntax)
var sprites = InlineArray<40, Sprite>(repeating: .default)
Conformances: Sendable and BitwiseCopyable (when the element type conforms); supports ~Copyable element types. InlineArray does not conform to Sequence or Collection (those protocols require copyable elements), so iterate by index or hand out a Span (see the sketch below).
When to Use InlineArray:
- Fixed size known at compile time
- Performance-critical paths (tight loops, hot paths)
- Want to avoid heap allocation entirely
- Small to medium sizes (practical limit ~1KB stack usage)
InlineArray is stack-allocated (no heap), eagerly copied (not COW), and provides .span/.mutableSpan for zero-copy access. Measure your own benchmarks for allocation/copy/mutation trade-offs vs Array.
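A minimal iteration sketch: because InlineArray has count, indices, and a bounds-checked subscript but no Sequence conformance, loop over indices rather than for-in over elements.
var samples = InlineArray<8, Double>(repeating: 0)
for i in samples.indices { // Index-based loop over the stack buffer
samples[i] = Double(i) * 0.5
}
var sum = 0.0
for i in samples.indices { sum += samples[i] }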
Copy Semantics Warning:
// ❌ Unexpected: InlineArray copies eagerly
func processLarge(_ data: InlineArray<1000, UInt8>) {
// Copies all 1000 bytes on call!
}
// ✅ Use Span to avoid copy
func processLarge(_ data: Span<UInt8>) {
// Zero-copy view, no matter the size
}
// Best practice: Store InlineArray, pass Span
struct Buffer {
var storage = InlineArray<1000, UInt8>(repeating: 0)
func process() {
helper(storage.span) // Pass view, not copy
}
}
When NOT to Use InlineArray:
- Dynamic sizes (use Array)
- Large data (>1KB stack usage risky)
- Frequently passed by value (use Span instead)
- Need COW semantics (use Array)
Lazy Sequences
// ❌ Eager evaluation - processes entire array
let result = array
.map { expensive($0) }
.filter { $0 > 0 }
.first // Only need first element!
// ✅ Lazy evaluation - stops at first match
let result = array
.lazy
.map { expensive($0) }
.filter { $0 > 0 }
.first // Only evaluates until first match
Part 8: Concurrency Performance
Async/await and actors add overhead. Use appropriately.
Actor Isolation Overhead
actor Counter {
private var value = 0
// ❌ Actor call overhead for simple operation
func increment() {
value += 1
}
}
// Calling from different isolation domain
for _ in 0..<10000 {
await counter.increment() // 10,000 actor hops!
}
// ✅ Batch operations to reduce actor overhead
actor Counter {
private var value = 0
func incrementBatch(_ count: Int) {
value += count
}
}
await counter.incrementBatch(10000) // Single actor hop
Async Overhead
Each async suspension costs ~20-30μs. Keep synchronous operations synchronous—don't mark a function async if it doesn't need to await.
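A sketch of that guideline: parse below is pure CPU work and stays synchronous; fetchBytes is a hypothetical async data source and the only genuine suspension point.
struct Report { var lineCount: Int }
func fetchBytes() async throws -> [UInt8] { [] } // Stub standing in for a network call
// ✅ Pure computation stays synchronous - no suspension machinery
func parse(_ data: [UInt8]) -> Report {
Report(lineCount: data.filter { $0 == 0x0A }.count)
}
func loadReport() async throws -> Report {
let bytes = try await fetchBytes() // The only real await
return parse(bytes) // Synchronous call, zero async overhead
}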
Task Creation Cost
// ❌ Creating task per item (~100μs overhead each)
for item in items {
Task {
await process(item)
}
}
// ✅ Single task for batch
Task {
for item in items {
await process(item)
}
}
// ✅ Or use TaskGroup for parallelism
await withTaskGroup(of: Void.self) { group in
for item in items {
group.addTask {
await process(item)
}
}
}
@concurrent Attribute (Swift 6.2)
// Force background execution
@concurrent
func expensiveComputation() async -> Int {
// Always runs on background thread, even if called from MainActor
return complexCalculation()
}
// Safe to call from main actor without blocking
@MainActor
func updateUI() async {
let result = await expensiveComputation() // Guaranteed off main thread
label.text = "\(result)"
}
For nonisolated performance patterns and detailed actor isolation guidance, see axiom-swift-concurrency.
Part 9: Memory Layout
Understanding memory layout helps optimize cache performance and reduce allocations.
Struct Padding
// ❌ Poor layout (24-byte stride due to padding)
struct BadLayout {
var a: Bool // 1 byte + 7 padding
var b: Int64 // 8 bytes
var c: Bool // 1 byte + 7 tail padding
}
print(MemoryLayout<BadLayout>.stride) // 24 bytes (size is 17; the rest is padding)
// ✅ Optimized layout (16-byte stride)
struct GoodLayout {
var b: Int64 // 8 bytes
var a: Bool // 1 byte
var c: Bool // 1 byte + 6 tail padding
}
print(MemoryLayout<GoodLayout>.stride) // 16 bytes (size is 10)
Alignment
// Query alignment
print(MemoryLayout<Double>.alignment) // 8
print(MemoryLayout<Int32>.alignment) // 4
// Structs align to largest member
struct Mixed {
var int32: Int32 // 4 bytes, 4-byte aligned
var double: Double // 8 bytes, 8-byte aligned
}
print(MemoryLayout<Mixed>.alignment) // 8 (largest member)
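To see exactly where padding lands, MemoryLayout.offset(of:) reports each stored property's byte offset:
print(MemoryLayout<Mixed>.offset(of: \.int32)!) // 0
print(MemoryLayout<Mixed>.offset(of: \.double)!) // 8 - bytes 4-7 are alignment padding
print(MemoryLayout<Mixed>.size) // 16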
Cache-Friendly Data Structures
// ❌ Poor cache locality
struct PointerBased {
var next: UnsafeMutablePointer<Node>? // Pointer chasing
}
// ✅ Array-based for cache locality
struct ArrayBased {
var data: ContiguousArray<Int> // Contiguous memory
}
// Array iteration ~10x faster due to cache prefetching
Exclusivity Checks
From WWDC 2025-312: Runtime exclusivity enforcement (swift_beginAccess/swift_endAccess) appears in Time Profiler when the compiler cannot prove memory safety statically.
What they are: Swift enforces that no two accesses to the same variable overlap if one is a write. For struct properties, this is checked at compile time. For class stored properties, runtime checks are inserted.
How to identify: Look for swift_beginAccess and swift_endAccess in Time Profiler or Processor Trace flame graphs.
// ❌ Class properties require runtime exclusivity checks
class Parser {
var state: ParserState
var cache: [Int: Pixel]
func parse() {
state.advance() // swift_beginAccess / swift_endAccess
cache[key] = pixel // swift_beginAccess / swift_endAccess
}
}
// ✅ Struct properties checked at compile time — zero runtime cost
struct Parser {
var state: ParserState
var cache: InlineArray<64, Pixel>
mutating func parse() {
state.advance() // No runtime check
cache[key] = pixel // No runtime check
}
}
Real-world impact: In WWDC 2025-312's QOI image parser, moving properties from a class to a struct eliminated all runtime exclusivity checks, contributing to a measurable speedup as part of a >700x total improvement.
Part 10: Typed Throws (Swift 6)
Typed throws can be faster than untyped by avoiding existential overhead.
Untyped vs Typed
// Untyped - existential container for error
func fetchData() throws -> Data {
// Can throw any Error
throw NetworkError.timeout
}
// Typed - concrete error type
func fetchData() throws(NetworkError) -> Data {
// Can only throw NetworkError
throw NetworkError.timeout
}
Performance Impact
// Measure with tight loop
func untypedThrows() throws -> Int {
throw GenericError.failed
}
func typedThrows() throws(GenericError) -> Int {
throw GenericError.failed
}
// Benchmark: typed ~5-10% faster (no existential overhead)
When to Use
- Typed: Library code with well-defined error types, hot paths
- Untyped: Application code, error types unknown at compile time
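Typed throws also sharpens the catch site: the compiler knows the concrete error type, so the catch binding is the enum itself rather than any Error. A small self-contained sketch (the two-case NetworkError here is hypothetical):
enum NetworkError: Error { case timeout, unreachable }
func fetchData() throws(NetworkError) -> [UInt8] { throw .timeout }
do {
let data = try fetchData()
print(data.count)
} catch { // `error` is NetworkError - no `as?` downcast needed
switch error {
case .timeout: print("retrying")
case .unreachable: print("offline")
}
}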
Part 11: Span Types
Swift 6.2+ introduces Span—a non-escapable, non-owning view into memory that provides safe, efficient access to contiguous data.
What is Span?
Span is a modern replacement for UnsafeBufferPointer that provides:
- Spatial safety: Bounds-checked operations prevent out-of-bounds access
- Temporal safety: Lifetime inherited from source, preventing use-after-free
- Zero overhead: No heap allocation, no reference counting
- Non-escapable: Cannot outlive the data it references
// Traditional unsafe approach
func processUnsafe(_ data: UnsafeMutableBufferPointer<UInt8>) {
data[100] = 0 // Crashes if out of bounds!
}
// Safe Span approach
func processSafe(_ data: inout MutableSpan<UInt8>) {
data[100] = 0 // Traps with a clear error if out of bounds
}
When to Use Span vs Array vs UnsafeBufferPointer
| Use Case | Recommendation |
|---|---|
| Own the data | Array (full ownership, COW) |
| Temporary view for reading | Span (safe, fast) |
| Temporary view for writing | MutableSpan (safe, fast) |
| C interop, performance-critical | RawSpan (untyped bytes) |
| Unsafe performance | UnsafeBufferPointer (legacy, avoid) |
Basic Span Usage
let array = [1, 2, 3, 4, 5]
let span = array.span // Read-only view
print(span[0]) // Bounds-checked subscript access
for i in span.indices { // Span is not a Sequence - iterate by index
_ = span[i]
}
let slice = span.extracting(1..<3) // Sub-span view, no copy
MutableSpan for Modifications
var array = [10, 20, 30, 40, 50]
var mutableSpan = array.mutableSpan // Mutable borrow of the array's storage
mutableSpan[0] = 100 // Modifies array in-place, bounds-checked
RawSpan for Untyped Bytes
func parsePacket(_ data: RawSpan) -> PacketHeader? {
guard data.byteCount >= MemoryLayout<PacketHeader>.size else { return nil }
// RawSpan is untyped: load fields at explicit byte offsets (bounds-checked)
let version = data.unsafeLoad(fromByteOffset: 0, as: UInt8.self)
let flags = data.unsafeLoad(fromByteOffset: 1, as: UInt8.self)
let length = data.unsafeLoadUnaligned(fromByteOffset: 2, as: UInt16.self)
return PacketHeader(version: version, flags: flags, length: length)
}
let header = parsePacket(bytes.span.bytes) // Span<UInt8> exposes a RawSpan via .bytes
All Swift 6.2 collections provide .span and .mutableSpan properties, including Array, ContiguousArray, and UnsafeBufferPointer (migration path). Span access speed matches UnsafeBufferPointer (~2ns) with bounds checking.
Non-Escapable Lifetime Safety
Span's lifetime is bound to its source. The compiler prevents returning a Span from a function where the source would be deallocated — unlike UnsafeBufferPointer, which allows this bug silently.
func dangerousSpan() -> Span<Int> {
let array = [1, 2, 3]
return array.span // ❌ Error: Cannot return non-escapable value
}
InlineArray also provides .span/.mutableSpan — see Part 7 for InlineArray usage and copy-avoidance via Span.
Migration from UnsafeBufferPointer
// ❌ Old: unsafe, no bounds checking
func parseLegacy(_ buffer: UnsafeBufferPointer<UInt8>) -> Header {
Header(magic: buffer[0], version: buffer[1]) // Silent OOB crash
}
// ✅ New: safe, bounds-checked, same performance
func parseModern(_ span: Span<UInt8>) -> Header {
Header(magic: span[0], version: span[1]) // Traps on OOB
}
// Bridge: existing UnsafeBufferPointer → Span
let span = buffer.span // Wrap unsafe in safe span
parseModern(span)
OutputSpan — Safe Initialization
OutputSpan/OutputRawSpan replace UnsafeMutableBufferPointer for initializing new collections without intermediate allocations.
// Binary serialization: write header bytes safely
@lifetime(&output)
func writeHeader(to output: inout OutputRawSpan) {
output.append(0x01) // version
output.append(0x00) // flags
output.append(UInt16(42)) // length (type-safe)
}
Use for building byte arrays, binary serialization, image pixel data. Apple's open-source Swift Binary Parsing library is built entirely on Span types.
When NOT to Use Span
- Ownership: Span can't be stored in structs/classes — use Array for owned data and provide .span access via a computed property (see the sketch after this list)
- Return values: Span is non-escapable — process in scope, return owned data
- Long-lived references: Span lifetime is bound to its source — use Array if data must outlive the current scope
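A sketch of that ownership split, assuming Swift 6.2's lifetime inference for span-returning accessors (PixelBuffer is a hypothetical name): the struct owns its bytes in an Array and exposes a borrowed view on demand.
struct PixelBuffer {
private var storage: [UInt8] // Owned: COW, escapable, storable
init(size: Int) { storage = Array(repeating: 0, count: size) }
var pixels: Span<UInt8> { storage.span } // Borrowed view, computed on demand
}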
Copy-Paste Patterns
Pattern 1: COW Wrapper
final class Storage<T> {
var value: T
init(_ value: T) { self.value = value }
}
struct COWWrapper<T> {
private var storage: Storage<T>
init(_ value: T) {
storage = Storage(value)
}
var value: T {
get { storage.value }
set {
if !isKnownUniquelyReferenced(&storage) {
storage = Storage(newValue)
} else {
storage.value = newValue
}
}
}
}
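Usage: copies of the wrapper cost one reference, and the payload is duplicated only on the first mutation of a shared value.
var a = COWWrapper([1, 2, 3])
var b = a // Shares storage: copies one reference, not the payload
b.value = [9] // Not uniquely referenced -> new Storage allocated for b
// a.value is still [1, 2, 3]; further writes to b mutate in place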
Pattern 2: Performance-Critical Loop
func processLargeArray(_ input: [Int]) -> [Int] {
var result = ContiguousArray<Int>()
result.reserveCapacity(input.count)
for element in input {
result.append(transform(element))
}
return Array(result)
}
Pattern 3: Inline Cache Lookup
@usableFromInline
internal var cache: [Key: Value] = [:]
@inlinable
func getCached(_ key: Key) -> Value? {
return cache[key] // Inlined across modules (@inlinable bodies cannot reference private state)
}
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Premature optimization | Complex COW/ContiguousArray with no profiling data | Start simple, profile, optimize what matters |
| Weak everywhere | weak on every delegate (atomic overhead) | Use unowned when lifetime is guaranteed (see Part 4) |
| Actor for everything | Actor isolation on simple counters (~100μs/call) | Use lock-free atomics (ManagedAtomic) for simple sync data |
Code Review Checklist
Memory Management
- Large structs (>64 bytes) use indirect storage or are classes
- COW types use isKnownUniquelyReferenced before mutation
- Collections use reserveCapacity when size is known
- Weak references only where needed (prefer unowned when safe)
Generics
- Protocol types use some instead of any where possible
- Hot paths use concrete types or @_specialize
- Generic constraints are as specific as possible
Collections
- Pure Swift code uses ContiguousArray over Array
- Dictionary keys have efficient hash(into:) implementations
- Lazy evaluation used for short-circuit operations
Concurrency
- Synchronous operations don't use async
- Actor calls are batched when possible
- Task creation is minimized (use TaskGroup)
- CPU-intensive work uses @concurrent (Swift 6.2)
Optimization
- Profiling data exists before optimization
- Inlining only for small, frequently called functions
- Memory layout optimized for cache locality (large structs)
Pressure Scenarios
Scenario 1: "Just make it faster, we ship tomorrow"
The Pressure: Manager sees "slow" in profiler, demands immediate action.
Red Flags:
- No baseline measurements
- No Time Profiler data showing hotspots
- "Make everything faster" without targets
Time Cost Comparison:
- Premature optimization: 2 days of work, no measurable improvement
- Profile-guided optimization: 2 hours profiling + 4 hours fixing actual bottleneck = 40% faster
How to Push Back Professionally:
"I want to optimize effectively. Let me spend 30 minutes with Instruments
to find the actual bottleneck. This prevents wasting time on code that's
not the problem. I've seen this save days of work."
Scenario 2: "Use actors everywhere for thread safety"
The Pressure: Team adopts Swift 6, decides "everything should be an actor."
Red Flags:
- Actor for simple value types
- Actor for synchronous-only operations
- Async overhead in tight loops
Time Cost Comparison:
- Actor everywhere: 100μs overhead per operation, janky UI
- Appropriate isolation: 10μs overhead, smooth 60fps
How to Push Back Professionally:
"Actors are great for isolation, but they add overhead. For this simple
counter, lock-free atomics are 10x faster. Let's use actors where we need
them—shared mutable state—and avoid them for pure value types."
Scenario 3: "Inline everything for speed"
The Pressure: Someone reads that inlining is faster, marks everything @inlinable.
Red Flags:
- Large functions marked @inlinable
- Binary size increases 50%
Time Cost Comparison:
- Inline everything: Code bloat, slower app launch (3s → 5s)
- Selective inlining: Fast launch, actual hotspots optimized
How to Push Back Professionally:
"Inlining trades code size for speed. The compiler already inlines when
beneficial. Manual @inlinable should be for small, frequently called
functions. Let's profile and inline the 3 actual hotspots, not everything."
Real-World Examples
Example 1: Image Processing Pipeline
Problem: Processing 1000 images takes 30 seconds.
Investigation:
// Original code
func processImages(_ images: [UIImage]) -> [ProcessedImage] {
var results: [ProcessedImage] = []
for image in images {
results.append(expensiveProcess(image)) // Reallocations!
}
return results
}
Solution:
func processImages(_ images: [UIImage]) -> [ProcessedImage] {
var results = ContiguousArray<ProcessedImage>()
results.reserveCapacity(images.count) // Single allocation
for image in images {
results.append(expensiveProcess(image))
}
return Array(results)
}
Result: 30s → 8s (73% faster) by eliminating reallocations.
Example 2: Generic Specialization
Problem: Protocol-based rendering is slow.
Investigation:
// Original - existential overhead
func render(shapes: [any Shape]) {
for shape in shapes {
shape.draw() // Dynamic dispatch
}
}
Solution:
// Specialized generic
func render<S: Shape>(shapes: [S]) {
for shape in shapes {
shape.draw() // Static dispatch after specialization
}
}
// Or use @_specialize
@_specialize(where S == Circle)
@_specialize(where S == Rectangle)
func render<S: Shape>(shapes: [S]) { }
Result: 100ms → 10ms (10x faster) by eliminating witness table overhead.
Resources
WWDC: 2025-312, 2024-10217, 2024-10170, 2021-10216, 2016-416
Docs: /swift/inlinearray, /swift/span, /swift/outputspan
Skills: axiom-performance-profiling, axiom-swift-concurrency, axiom-swiftui-performance