Julia Performance Optimization

Apply optimizations in this order — measure first, then fix.

Step 1: Measure before optimizing

using BenchmarkTools

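# Interpolate global arguments with $ so the global lookup is not timed,
# e.g. @btime my_function($x, $y)
@btime my_function(args...)        # quick: prints minimum time and allocations
@benchmark my_function(args...)    # full statistics and time distribution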

Key fields in @benchmark output:

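median time: use the median rather than the mean, since it is robust to outliers such as GC pauses
allocs: a high allocation count often points to type instability (or to temporary arrays)
memory estimate: an unexpectedly large value usually means unnecessary array copies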
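Base's `@allocated` shows the allocation signal without any package. The sketch below runs the same loop over a `Vector{Float64}` and over a `Vector{Any}` that holds the same values. The helper names `total` and `bytes_allocated` are illustrative. Exact byte counts vary between Julia versions, so only the zero versus nonzero pattern matters.

```julia
# Sum with a Float64 accumulator. With concretely typed elements every `+`
# is resolved at compile time; with Vector{Any} every `+` goes through
# runtime dispatch and its Float64 result has to be boxed on the heap.
function total(v)
    acc = 0.0
    for x in v
        acc += x
    end
    return acc
end

# Measure inside a function so global-scope overhead is not counted.
# The first call compiles `f` for this argument type; the second is measured.
function bytes_allocated(f, v)
    f(v)
    return @allocated f(v)
end

v_typed = rand(1_000)            # Vector{Float64}
v_any = Vector{Any}(v_typed)     # same values, element type Any

bytes_typed = bytes_allocated(total, v_typed)
bytes_any = bytes_allocated(total, v_any)
println("Vector{Float64}: $bytes_typed bytes, Vector{Any}: $bytes_any bytes")
```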

Step 2: Fix type instability

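Julia's JIT compiler generates fast, specialized code only when it can infer concrete types at compile time. When inference fails, it falls back to runtime dispatch and heap-allocated (boxed) values.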

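using InteractiveUtils    # loaded automatically in the REPL; needed in scripts

# Quick check: non-concrete types such as ::Any are highlighted in red
@code_warntype my_function(args...)

# Lower level: LLVM IR for this one method (apply_generic calls mean runtime dispatch)
@code_llvm my_function(args...)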

JET.jl — automated type analysis

JET analyzes the entire call graph reachable from a call and reports runtime dispatch automatically. For complex code this is more thorough than @code_warntype, which only shows the method you call directly.

using JET

@report_opt my_function(args...)   # find runtime dispatch
@report_call my_function(args...)  # find type errors

# Whole-file analysis
report_file("scripts/explore.jl"; analyzer=JET.OptAnalyzer)

Recommended workflow: @report_opt first (fix dispatch) → @report_call (fix errors)

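⚠️ JET depends closely on compiler internals, so each release supports only specific Julia versions (JET v0.11 requires Julia 1.12). Running ] add JET in the Pkg REPL resolves a compatible version automatically.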

Common type instability patterns

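# ❌ Non-const global variable: its type can change at any time, so f() is type-unstable
x = 1.0
f() = x * 2

# ✅ Option 1: declare the global const
const Y = 1.0
f_const() = Y * 2

# ✅ Option 2: pass the value as an argument
f_arg(x) = x * 2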
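`Base.return_types` lists the inferred return type for a given tuple of argument types, so it can confirm what the compiler sees in each case. It is an unexported reflection helper whose output may change between Julia versions, and the names below are illustrative. Treat this as a sketch:

```julia
scale = 2.0            # non-const global: its type could change at any time
const SCALE = 2.0      # const global: its type is fixed for the compiler

scaled_global() = scale * 2
scaled_const() = SCALE * 2
scaled_arg(s) = s * 2

# Inferred return types for the given argument type tuples
@show Base.return_types(scaled_global, ())        # Any: the type of `scale` is unknown
@show Base.return_types(scaled_const, ())         # Float64
@show Base.return_types(scaled_arg, (Float64,))   # Float64
```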

# ❌ Return type depends on a runtime value
g(x) = x > 0 ? x : 0         # returns Int 0 even when x is a Float64 (Int vs Float64)

# ✅ Unify return types, e.g. with zero(x) or oftype(x, 0)
g(x) = x > 0 ? x : zero(x)
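The `Test` standard library's `@inferred` lets a test suite run this check. It throws when a call's actual return type differs from the type the compiler inferred. It only looks at the return type, so instability that stays inside a function body still needs `@code_warntype` or JET. The names below are illustrative.

```julia
using Test

pick_unstable(flag) = flag ? 1 : 1.0     # inferred as Union{Float64, Int64}
pick_stable(flag) = flag ? 1.0 : 0.0     # inferred as Float64

@inferred pick_stable(true)              # passes and returns 1.0

# @inferred raises an ErrorException for the unstable version
unstable_detected = try
    @inferred pick_unstable(true)
    false
catch err
    err isa ErrorException
end
println("instability detected: ", unstable_detected)
```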

Step 3: Reduce memory allocations

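# ❌ Allocate inside loop
for i in 1:1000
    tmp = zeros(100)        # new array on every iteration
    # ... work with tmp ...
end

# ✅ Pre-allocate and reuse
tmp = zeros(100)
for i in 1:1000
    fill!(tmp, 0)           # reset the same buffer in place
    # ... work with tmp ...
end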
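Wrapping both loops in functions and measuring them with Base's `@allocated` confirms that reuse removes the per-iteration allocation. In this sketch, `fill!` plus `sum` is a hypothetical stand-in for real work on the buffer:

```julia
# Allocates a fresh 100-element buffer on every iteration
function work_fresh(n)
    s = 0.0
    for i in 1:n
        tmp = zeros(100)
        fill!(tmp, i)          # stand-in for real work on the buffer
        s += sum(tmp)
    end
    return s
end

# Allocates one buffer up front and overwrites it on each iteration
function work_reuse(n)
    s = 0.0
    tmp = zeros(100)
    for i in 1:n
        fill!(tmp, i)
        s += sum(tmp)
    end
    return s
end

work_fresh(10); work_reuse(10)       # warm up so compilation is not measured
bytes_fresh = @allocated work_fresh(1_000)
bytes_reuse = @allocated work_reuse(1_000)
println("fresh: $bytes_fresh bytes, reuse: $bytes_reuse bytes")
```

The reusing version still allocates its single buffer, so its count is small but usually not zero.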

# ✅ Use in-place operations (functions ending in !)
using LinearAlgebra        # provides mul!
mul!(C, A, B)              # C = A*B, written into the preallocated C
broadcast!(f, dst, src)    # equivalent to dst .= f.(src)

# ✅ Avoid slice copies with @view (one slice) or @views (every slice in an expression)
f(@view A[1:100, :])
@views f(A[1:100, :])
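The same kind of measurement shows the gap between a slice and a view, and between an allocating broadcast and a fused in-place `.=`. The function names are illustrative. The helper's `where {F,N}` signature forces specialization on the function and argument types, so the measurement does not add its own dispatch overhead. Exact byte counts vary by Julia version, so only the relative sizes matter.

```julia
# Slicing A[1:100, :] copies the 100×50 block into a new array
block_sum_copy(A) = sum(A[1:100, :])
# @view wraps the same block without copying it
block_sum_view(A) = sum(@view A[1:100, :])

# Allocating broadcast: builds a new array for the result
doubled(src) = 2 .* src
# Fused in-place broadcast: writes into the existing dst
double!(dst, src) = (dst .= 2 .* src; dst)

# Warm up once, then measure; type parameters force specialization
function bytes_of(f::F, args::Vararg{Any,N}) where {F,N}
    f(args...)
    return @allocated f(args...)
end

A = rand(200, 50)
dst = similar(A)

bytes_copy = bytes_of(block_sum_copy, A)
bytes_view = bytes_of(block_sum_view, A)
bytes_alloc = bytes_of(doubled, A)
bytes_inplace = bytes_of(double!, dst, A)
println("slice: $bytes_copy, view: $bytes_view, broadcast: $bytes_alloc, in-place: $bytes_inplace")
```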

Step 4: Array access and loop patterns

# ✅ Julia is column-major: put the column loop (j) outside and the row loop (i) inside
for j in axes(A, 2), i in axes(A, 1)
    A[i, j] = i + j        # placeholder work; i varies fastest
end
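`LinearIndices` makes the column-major layout visible. Entries that differ only in the row index `i` are neighbors in memory, while a step in the column index `j` jumps a whole column. A minimal sketch, with `fill_colmajor!` as an illustrative name:

```julia
A = zeros(3, 4)
L = LinearIndices(A)

# Moving down a column (i to i+1) advances one slot in memory...
@show L[2, 1] - L[1, 1]     # 1
# ...while moving along a row (j to j+1) jumps size(A, 1) slots
@show L[1, 2] - L[1, 1]     # 3

# Fill A in memory order: j (columns) in the outer loop, i (rows) in the inner loop
function fill_colmajor!(A)
    for j in axes(A, 2), i in axes(A, 1)
        A[i, j] = 10i + j
    end
    return A
end

fill_colmajor!(A)
@show vec(A)    # memory order: all of column 1, then column 2, ...
```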

# ✅ Small fixed-size arrays → StaticArrays
using StaticArrays
v = SVector{3, Float64}(1.0, 2.0, 3.0)

# ✅ Skip bounds checks on A[i] (only after verifying the indices are valid)
function mysum(A)
    s = zero(eltype(A))
    @inbounds for i in eachindex(A)
        s += A[i]
    end
    return s
end

# ✅ Explicit SIMD with LoopVectorization
using LoopVectorization
@turbo for i in eachindex(A)
    A[i] = sqrt(A[i])
end
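For simple loops, Base's `@simd` needs no extra package. Combined with `@inbounds`, it lets the compiler reorder the reduction so the loop can be vectorized. Because reordering changes rounding, the result may differ from a strictly sequential sum in the last bits. `simd_sum` is an illustrative name:

```julia
# Sum with bounds checks removed and SIMD reordering allowed.
# @simd applies to the innermost loop and permits reassociating `s += A[i]`,
# so rounding may differ slightly from a strictly left-to-right sum.
function simd_sum(A::AbstractArray{T}) where {T<:Number}
    s = zero(T)
    @inbounds @simd for i in eachindex(A)
        s += A[i]
    end
    return s
end

A = rand(10_000)
@show simd_sum(A) sum(A)
```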

Step 5: Profile to find hotspots

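using Profile, ProfileView    # ProfileView requires ] add ProfileView

my_heavy_function()           # run once first so compilation is not profiled
Profile.clear()               # discard samples from earlier runs
@profile my_heavy_function()
ProfileView.view()            # interactive flame graph
Profile.print()               # text report (stdlib only)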

Step 6: Parallelism (only when steps 1–5 are exhausted)

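# Multi-threading (start Julia with julia -t 4, or julia -t auto)
result = Vector{Float64}(undef, n)   # preallocate (assumes heavy_compute returns Float64)
Threads.@threads for i in 1:n
    result[i] = heavy_compute(i)     # each iteration writes only its own slot
end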
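A threaded loop is race-free when each iteration writes only its own `result[i]`. Accumulating into a single shared variable from several threads would be a data race. The sketch below shows a race-free reduction, assuming `heavy_compute` is a hypothetical stand-in for the real work. It splits the range into chunks, sums each chunk in its own task with `Threads.@spawn`, and adds the partial results. With one thread it still runs correctly, just without a speedup.

```julia
# Hypothetical stand-in for an expensive per-item computation (always positive)
heavy_compute(i) = sum(sqrt(k * i) for k in 1:1_000)

# Race-free parallel sum: every task owns its partial sum, and the
# partial sums are combined only after each task has finished.
function threaded_sum(f, n; nchunks = Threads.nthreads())
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    chunk_len = cld(n, nchunks)                    # items per chunk, rounded up
    chunks = Iterators.partition(1:n, chunk_len)
    tasks = [Threads.@spawn sum(f, chunk) for chunk in chunks]
    return sum(fetch, tasks)                       # wait for each task and add its result
end

n = 10_000
parallel = threaded_sum(heavy_compute, n)
serial = sum(heavy_compute, 1:n)
println("threads: ", Threads.nthreads(), ", parallel ≈ serial: ", parallel ≈ serial)
```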

Checklist

Check	Tool
Measure time and allocations	`@benchmark`
Find hotspots	`@profile`, ProfileView
Type instability (quick)	`@code_warntype`
Type instability (full graph)	`JET.@report_opt`
Type errors	`JET.@report_call`
Excess allocations	`@benchmark` allocs field
Column-major access	code review
Global variables	code review → `const`
Slice copies	`@views`

References

Book	Author	Notes
Julia High Performance 2nd ed. (2019)	Avik Sengupta (Packt)	The standard reference for Julia optimization
Hands-On Design Patterns and Best Practices with Julia (2020)	Tom Kwong (Packt)	Performance-aware design patterns
Practical Julia (2023)	Lee Phillips (No Starch)	Scientific computing focus

Julia Performance Tips (official docs) ← read this first
