r-performance

SKILL.md

R Performance Best Practices

Profiling, benchmarking, and optimization strategies for R code

Performance Tool Selection Guide

When to Use Each Performance Tool

Profiling Tools Decision Matrix

| Tool | Use When | Don't Use When | What It Shows |
|------|----------|----------------|---------------|
| `profvis` | Complex code, unknown bottlenecks | Simple functions, known issues | Time per line, call stack |
| `bench::mark()` | Comparing alternatives | Single approach | Relative performance, memory |
| `system.time()` | Quick checks | Detailed analysis | Total runtime only |
| `Rprof()` | Base-R-only environments | When profvis is available | Raw profiling data |
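The first two rows are the ones you will reach for most often. A minimal sketch of the difference: `system.time()` (base R) gives a single total, while `bench::mark()` (assumed installed) adds relative timings and memory per expression.

```r
# system.time() gives only total runtime for one expression
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}
elapsed <- system.time(slow_sum(1e5))[["elapsed"]]

# For comparing alternatives, bench::mark() reports relative speed
# and memory (requires the bench package):
# bench::mark(
#   loop       = slow_sum(1e5),
#   vectorized = sum(sqrt(seq_len(1e5)))
# )
```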

Step-by-Step Performance Workflow

# 1. Profile first - find the actual bottlenecks
library(profvis)
profvis({
  # Your slow code here
})

# 2. Focus on the slowest parts (80/20 rule)
# Don't optimize until you know where time is spent

# 3. Benchmark alternatives for hot spots
library(bench)
bench::mark(
  current = current_approach(data),
  vectorized = vectorized_approach(data),
  parallel = map(data, in_parallel(func))
)

# 4. Consider tool trade-offs based on bottleneck type

When Each Tool Helps vs Hurts

Parallel Processing (in_parallel())

# Helps when:
# - CPU-intensive computations
# - Embarrassingly parallel problems
# - Large datasets with independent operations
# - I/O bound operations (file reading, API calls)

# Hurts when:
# - Simple, fast operations (overhead > benefit)
# - Memory-intensive operations (may cause thrashing)
# - Operations requiring shared state
# - Small datasets

# Example decision point:
expensive_func <- function(x) Sys.sleep(0.1) # 100ms per call
fast_func <- function(x) x^2                 # microseconds per call

# Good for parallel
map(1:100, in_parallel(expensive_func))  # ~10s -> ~2.5s on 4 cores

# Bad for parallel (overhead > benefit)
map(1:100, in_parallel(fast_func))       # 100us -> 50ms (500x slower!)

vctrs Backend Tools

# Use vctrs when:
# - Type safety matters more than raw speed
# - Building reusable package functions
# - Complex coercion/combination logic
# - Consistent behavior across edge cases

# Avoid vctrs when:
# - One-off scripts where speed matters most
# - Simple operations where base R is sufficient
# - Memory is extremely constrained

# Decision point:
simple_combine <- function(x, y) c(x, y)           # Fast, simple
robust_combine <- function(x, y) vec_c(x, y)      # Safer, slight overhead

# Use simple for hot loops, robust for package APIs

Data Backend Selection

# Use data.table when:
# - Very large datasets (>1GB)
# - Complex grouping operations
# - Reference semantics desired
# - Maximum performance critical

# Use dplyr when:
# - Readability and maintainability priority
# - Complex joins and window functions
# - Team familiarity with tidyverse
# - Moderate-sized data (<100 MB)

# Use base R when:
# - No dependencies allowed
# - Simple operations
# - Teaching/learning contexts
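To make the trade-off concrete, here is the same grouped mean in each backend (a sketch with a toy data frame; the dplyr and data.table branches only run if those packages are installed):

```r
df <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

# Base R: no dependencies
base_res <- aggregate(x ~ g, data = df, FUN = mean)

# dplyr: readable pipeline
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_res <- dplyr::summarise(dplyr::group_by(df, g), x = mean(x))
}

# data.table: reference semantics, fastest on large data
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  dt_res <- dt[, .(x = mean(x)), by = g]
}
```

All three produce the same answer here; the differences only matter as data size, join complexity, and team conventions come into play.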

Profiling Best Practices

# 1. Profile realistic data sizes
profvis({
  # Use actual data size, not toy examples
  real_data |> your_analysis()
})

# 2. Profile multiple runs for stability
bench::mark(
  your_function(data),
  min_iterations = 10,  # Multiple runs
  max_iterations = 100
)

# 3. Check memory usage too
bench::mark(
  approach1 = method1(data),
  approach2 = method2(data),
  check = FALSE,  # If outputs differ slightly
  filter_gc = FALSE  # Keep iterations where GC ran
)

# 4. Profile with realistic usage patterns
# Not just isolated function calls

Performance Anti-Patterns to Avoid

# Don't optimize without measuring
# BAD: "This looks slow" -> immediately rewrite
# GOOD: Profile first, optimize bottlenecks

# Don't over-engineer for performance
# BAD: Complex optimizations for 1% gains
# GOOD: Focus on algorithmic improvements

# Don't assume - measure
# BAD: "for loops are always slow in R"
# GOOD: Benchmark your specific use case

# Don't ignore readability costs
# BAD: Unreadable code for minor speedups
# GOOD: Readable code with targeted optimizations
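To back the "measure, don't assume" point, a base-R sketch: a preallocated for loop is usually competitive, while growing a vector inside the loop is the real anti-pattern.

```r
square_grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)  # reallocates every iteration
  out
}
square_prealloc <- function(n) {
  out <- numeric(n)                         # allocate once
  for (i in seq_len(n)) out[i] <- i^2
  out
}
square_vec <- function(n) seq_len(n)^2      # vectorized

# bench::mark(square_grow(1e4), square_prealloc(1e4), square_vec(1e4))
# typically shows grow >> prealloc, with vectorized fastest
```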

Backend Tools for Performance

  • Consider lower-level tools when speed is critical
  • Use vctrs, rlang backends when appropriate
  • Profile to identify true bottlenecks
# For packages - consider backend tools
# vctrs for type-stable vector operations
# rlang for metaprogramming
# data.table for large data operations

When to Use vctrs

Core Benefits

  • Type stability - Predictable output types regardless of input values
  • Size stability - Predictable output sizes from input sizes
  • Consistent coercion rules - Single set of rules applied everywhere
  • Robust class design - Proper S3 vector infrastructure
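A base-R sketch of why type stability matters: `sapply()` changes its output type with the input, while `vapply()` pins the type down — the guarantee vctrs generalizes to whole classes.

```r
f <- function(x) x * 2

# sapply() is type-unstable: empty input yields a list, not a numeric
r1 <- sapply(1:3, f)         # numeric vector
r2 <- sapply(integer(0), f)  # list() -- surprise!

# vapply() guarantees the output type, as vctrs does for package code
r3 <- vapply(integer(0), f, numeric(1))  # numeric(0), predictably
```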

Use vctrs when

Building Custom Vector Classes

# Good - vctrs-based vector class
new_percent <- function(x = double()) {
  vec_assert(x, double())
  new_vctr(x, class = "pkg_percent")
}

# Automatic data frame compatibility, subsetting, etc.

Type-Stable Functions in Packages

# Good - Guaranteed output type
my_function <- function(x, y) {
  result <- x + y
  # Always returns double, regardless of input values
  vec_cast(result, double())
}

# Avoid - Output type depends on the data
sapply(x, function(i) if (i > 0) 1L else 1.5)

Consistent Coercion/Casting

# Good - Explicit casting with clear rules
vec_cast(x, double())  # Clear intent, predictable behavior

# Good - Common type finding
vec_ptype_common(x, y, z)  # Finds richest compatible type

# Avoid - Base R inconsistencies
c(factor("a"), "b")  # Behavior has changed across R versions

Size/Length Stability

# Good - Predictable sizing
vec_c(x, y)  # size = vec_size(x) + vec_size(y)
vec_rbind(df1, df2)  # size = sum of input sizes

# Avoid - Unpredictable sizing
c(globalenv(), sum)  # silently wraps into a list; vec_c() errors instead

vctrs vs Base R Decision Matrix

| Use Case | Base R | vctrs | When to Choose vctrs |
|----------|--------|-------|----------------------|
| Simple combining | `c()` | `vec_c()` | Need type stability, consistent rules |
| Custom classes | Manual S3 | `new_vctr()` | Want data frame compatibility, subsetting |
| Type conversion | `as.*()` | `vec_cast()` | Need explicit, safe casting |
| Finding common type | Not available | `vec_ptype_common()` | Combining heterogeneous inputs |
| Size operations | `length()` | `vec_size()` | Working with non-vector objects |

Implementation Patterns

Basic Vector Class

# Constructor (low-level)
new_percent <- function(x = double()) {
  vec_assert(x, double())
  new_vctr(x, class = "pkg_percent")
}

# Helper (user-facing)
percent <- function(x = double()) {
  x <- vec_cast(x, double())
  new_percent(x)
}

# Format method
format.pkg_percent <- function(x, ...) {
  paste0(vec_data(x) * 100, "%")
}

Coercion Methods

# Self-coercion
vec_ptype2.pkg_percent.pkg_percent <- function(x, y, ...) {
  new_percent()
}

# With double
vec_ptype2.pkg_percent.double <- function(x, y, ...) double()
vec_ptype2.double.pkg_percent <- function(x, y, ...) double()

# Casting
vec_cast.pkg_percent.double <- function(x, to, ...) {
  new_percent(x)
}
vec_cast.double.pkg_percent <- function(x, to, ...) {
  vec_data(x)
}

Performance Considerations

When vctrs Adds Overhead

  • Simple operations - vec_c(1, 2) vs c(1, 2) for basic atomic vectors
  • One-off scripts - Type safety less critical than speed
  • Small vectors - Overhead may outweigh benefits
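The overhead on tiny inputs is easy to measure directly; a sketch, assuming bench and vctrs are installed:

```r
if (requireNamespace("bench", quietly = TRUE) &&
    requireNamespace("vctrs", quietly = TRUE)) {
  res <- bench::mark(
    base  = c(1, 2),
    vctrs = vctrs::vec_c(1, 2)
  )
  print(res[, c("expression", "median")])
  # c() typically wins by a wide margin at this size; the gap fades
  # as vectors grow or once each call does real work
}
```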

When vctrs Improves Performance

  • Package functions - Type stability prevents expensive re-computation
  • Complex classes - Consistent behavior reduces debugging
  • Data frame operations - Robust column type handling
  • Repeated operations - Predictable types enable optimization

Package Development Guidelines

Exports and Dependencies

# DESCRIPTION - Import specific functions
Imports: vctrs

# NAMESPACE - Import what you need
importFrom(vctrs, vec_assert, new_vctr, vec_cast, vec_ptype_common)

# Or if using extensively
import(vctrs)

Testing vctrs Classes

# Test type stability
test_that("my_function is type stable", {
  expect_equal(vec_ptype(my_function(1:3)), vec_ptype(double()))
  expect_equal(vec_ptype(my_function(integer())), vec_ptype(double()))
})

# Test coercion
test_that("coercion works", {
  expect_equal(vec_ptype_common(new_percent(), 1.0), double())
  expect_error(vec_ptype_common(new_percent(), "a"))
})

Don't Use vctrs When

  • Simple one-off analyses - Base R is sufficient
  • No custom classes needed - Standard types work fine
  • Performance critical + simple operations - Base R may be faster
  • External API constraints - Must return base R types

The key insight: vctrs is most valuable in package development where type safety, consistency, and extensibility matter more than raw speed for simple operations.

Performance Migrations

# Old -> New performance patterns
for loops for parallelizable work -> map(data, in_parallel(f))
Manual type checking              -> vec_assert() / vec_cast()
Inconsistent coercion             -> vec_ptype_common() / vec_c()
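A before/after sketch of the first migration. The parallel version is left as a comment because `in_parallel()` needs recent purrr with mirai workers configured (e.g. via `mirai::daemons()`), and the lambda must be self-contained:

```r
slow_step <- function(x) {
  Sys.sleep(0.01)  # stand-in for real per-element work
  x^2
}
inputs <- 1:8

# Old: sequential for loop
out_old <- numeric(length(inputs))
for (i in seq_along(inputs)) out_old[i] <- slow_step(inputs[i])

# New: the same work fanned out to background workers
# library(purrr)
# mirai::daemons(4)  # assumed worker setup
# out_new <- map_dbl(inputs, in_parallel(\(x) { Sys.sleep(0.01); x^2 }))
```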