Test-Driven Development Workflow for R

This skill ensures all R code development follows TDD principles with comprehensive test coverage using testthat.

When to Activate

Writing new functions or features
Fixing bugs or issues
Refactoring existing code
Adding new model types
Creating data processing pipelines
Building Shiny components

Getting Started

Initialize testing infrastructure for your package:

# Set up testthat (Edition 3)
usethis::use_testthat(3)

# Create a test file for an existing source file
usethis::use_test("function_name")

# Or create test and source file together
usethis::use_r("function_name")
usethis::use_test("function_name")

Core Principles

1. Tests BEFORE Code

ALWAYS write tests first, then implement code to make tests pass.

2. Coverage Requirements

Minimum 80% coverage (unit + integration)
100% coverage for statistical calculations
100% coverage for data validation
All edge cases covered
Error scenarios tested

3. Test Types

Tests follow a three-level hierarchy: File → Test → Expectation

Unit Tests

Individual functions and utilities:

test_that("rescale01 normalizes to [0, 1] range", {
  expect_equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))
  expect_equal(rescale01(c(-10, 0, 10)), c(0, 0.5, 1))
})

test_that("rescale01 handles edge cases", {
  expect_equal(rescale01(c(5, 5, 5)), c(NaN, NaN, NaN))
  expect_equal(rescale01(numeric(0)), numeric(0))
  expect_equal(rescale01(c(0, NA, 10)), c(0, NA, 1))
})
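
For reference, the tests above could be satisfied by a plain min-max rescaling function. The following is a minimal sketch, assuming rescale01() ignores NA when finding the range; the source does not show its implementation, so the body here is illustrative only.

```r
# Hypothetical implementation of rescale01(); not taken from the source.
rescale01 <- function(x) {
  # Short-circuit empty input so range() does not warn
  if (length(x) == 0) {
    return(numeric(0))
  }
  rng <- range(x, na.rm = TRUE)
  # A constant vector gives 0 / 0, i.e. NaN, matching the edge-case test
  (x - rng[1]) / (rng[2] - rng[1])
}

rescaled <- rescale01(c(0, NA, 10))
```

Writing the edge-case tests first is what forces the decisions visible here: what an empty vector returns and whether NA propagates.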

Integration Tests

Function interactions and workflows:

test_that("data pipeline produces expected output", {
  raw_data <- read_fixture("sample_input.csv")

  result <- raw_data |>
    clean_data() |>
    transform_features() |>
    summarize_results()

  expect_s3_class(result, "tbl_df")
  expect_named(result, c("group", "mean", "sd", "n"))
  expect_true(all(result$n > 0))
})

Snapshot Tests

For complex outputs that are hard to specify:

test_that("model summary format is stable", {
  model <- fit_model(test_data)
  expect_snapshot(print(summary(model)))
})

test_that("error messages are informative", {
  expect_snapshot(
    validate_input(invalid_data),
    error = TRUE
  )
})

Snapshot workflow:

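# Review snapshot changes interactively. The argument names a snapshot
# file (the test file name without "test-" and ".R"), not a single test.
testthat::snapshot_review("file_name")

# Accept snapshot changes
testthat::snapshot_accept("file_name")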

Snapshots are stored in tests/testthat/_snaps/ directory.

BDD Alternative (Optional)

For behavior-driven development, use describe() and it():

describe("matrix()", {
  it("can be multiplied by a scalar", {
    m1 <- matrix(1:4, 2, 2)
    m2 <- m1 * 2
    expect_equal(matrix(c(2, 4, 6, 8), 2, 2), m2)
  })

  it("can be transposed", {
    m <- matrix(1:4, 2, 2)
    expect_equal(t(m), matrix(c(1, 3, 2, 4), 2, 2))
  })
})

Key distinction: "describe() verifies you implement the right things, test_that() ensures you do things right."

Test Design Principles

Self-Sufficient Tests

Each test should contain all setup, execution, and teardown code. Tests must be independent and runnable in isolation without relying on ambient state or prior test execution.

# GOOD: Self-contained
test_that("function works with specific data", {
  data <- tibble(x = 1:10, y = rnorm(10))  # Setup
  result <- my_function(data)               # Execute
  expect_equal(nrow(result), 10)            # Assert
})

# BAD: Depends on external state
# setup_data <- tibble(...)  # Created outside test
test_that("function works", {
  result <- my_function(setup_data)  # Relies on external data
  expect_equal(nrow(result), 10)
})

Duplication Over Factoring

Repetition is acceptable in tests—duplicate setup code rather than extracting it elsewhere. Clarity outweighs avoiding duplication.

# GOOD: Duplicated but clear
test_that("clean_data handles missing values", {
  data <- tibble(x = c(1, NA, 3), y = c(4, 5, 6))
  result <- clean_data(data)
  expect_equal(nrow(result), 2)
})

test_that("clean_data handles invalid values", {
  data <- tibble(x = c(1, -999, 3), y = c(4, 5, 6))
  result <- clean_data(data, invalid_value = -999)
  expect_equal(nrow(result), 2)
})

# ACCEPTABLE: Each test is self-contained and readable

Plan for Failure

Write tests assuming they'll fail and require debugging. Make logic explicit and obvious. Run tests in fresh R sessions independently.

Use devtools::load_all()

During development, prefer devtools::load_all() over library(). This:

Exposes unexported functions for testing
Automatically attaches testthat
Eliminates unnecessary library() calls in tests
Simulates package loading without installation

testthat Edition 3

Edition 3 provides improved snapshot testing, better diffs via waldo, unified condition handling, parallel execution support, and byte-compiled code compatibility for mocking.

Deprecated Patterns → Modern Alternatives

# DEPRECATED: context() calls
context("Data validation")  # Remove - filename serves this purpose

# DEPRECATED: expect_equivalent()
expect_equivalent(x, y)
# MODERN:
expect_equal(x, y, ignore_attr = TRUE)

# DEPRECATED: with_mock()
with_mock(external_call = function() "mocked", {
  result <- my_function()
})
# MODERN:
local_mocked_bindings(
  external_call = function() "mocked"
)
result <- my_function()

# DEPRECATED: expect_is()
expect_is(x, "data.frame")
# MODERN:
expect_s3_class(x, "data.frame")

Initialize Edition 3

In DESCRIPTION, ensure:

Config/testthat/edition: 3

Or initialize with:

usethis::use_testthat(3)

Essential Expectations Reference

Equality & Identity

expect_equal(x, y)              # With numeric tolerance
expect_equal(x, y, tolerance = 0.001)
expect_equal(x, y, ignore_attr = TRUE)
expect_identical(x, y)          # Exact match required
expect_all_equal(x, value)      # Every element equals value (v3.3.0+)
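
The difference between the tolerant and the exact equality checks shows up with ordinary floating-point arithmetic. A small sketch, written as a standalone script (hence the library() call, which a package test file would not need):

```r
library(testthat)

total <- 0.1 + 0.2  # Not exactly 0.3 in binary floating point

test_that("expect_equal tolerates floating point noise", {
  # Passes: numeric values are compared with a small tolerance
  expect_equal(total, 0.3)
  # The two values are nevertheless not bit-for-bit identical
  expect_false(identical(total, 0.3))
})
```

For computed numeric results, expect_equal() is usually the safer default; expect_identical() fits integers, strings, and other values where exactness is meaningful.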

Conditions

expect_error(code)
expect_error(code, "pattern")
expect_error(code, class = "validation_error")
expect_warning(code)
expect_no_warning(code)
expect_message(code)
expect_no_message(code)
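
Matching on a condition class rather than on message text keeps tests stable when wording changes. The sketch below builds a classed error in base R; abort_validation() and check_positive() are hypothetical helpers introduced only for illustration, and the library() call is needed only because this runs as a standalone script.

```r
library(testthat)

# Hypothetical helper: signal an error carrying the "validation_error" class
abort_validation <- function(message) {
  cond <- structure(
    class = c("validation_error", "error", "condition"),
    list(message = message, call = sys.call(-1))
  )
  stop(cond)
}

# Hypothetical validator used to demonstrate the expectation
check_positive <- function(x) {
  if (!is.numeric(x) || any(x <= 0)) {
    abort_validation("`x` must contain only positive numbers")
  }
  invisible(x)
}

test_that("check_positive signals a classed error", {
  expect_error(check_positive(-1), class = "validation_error")
  expect_error(check_positive("a"), class = "validation_error")
  expect_equal(check_positive(1:3), 1:3)
})
```

The same class argument works for errors raised with cli::cli_abort(..., class = "validation_error"), since that also attaches the class to the condition.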

Collections & Sets

expect_setequal(x, y)          # Same elements, any order
expect_contains(set, element)  # Subset relationship (v3.2.0+)
expect_in(element, set)        # Membership check (v3.2.0+)
expect_disjoint(set1, set2)    # No overlap (v3.3.0+)
expect_named(x, c("a", "b"))   # Named vector/list

Type & Structure

expect_type(x, "double")
expect_s3_class(x, "data.frame")
expect_s4_class(x, "S4Class")
expect_r6_class(x, "R6Class")
expect_shape(x, dim = c(2, 3)) # Matrix/array dimensions, named argument (v3.3.0+)
expect_length(x, 10)

Logical

expect_true(x)
expect_false(x)
expect_all_true(x)             # Every element TRUE (v3.3.0+)
expect_all_false(x)            # Every element FALSE (v3.3.0+)

Other Useful Expectations

expect_null(x)
expect_invisible(result)
expect_output(print(x), "pattern")
expect_snapshot(complex_output)

File Organization

Tests mirror your package structure:

tests/
├── testthat/
│   ├── test-validation.R      # Tests for R/validation.R
│   ├── test-processing.R      # Tests for R/processing.R
│   ├── test-models.R          # Tests for R/models.R
│   ├── test-output.R          # Tests for R/output.R
│   ├── helper-fixtures.R      # Shared functions (sourced before tests)
│   ├── setup-database.R       # Setup code (run before tests, not by load_all())
│   ├── helper-expectations.R  # Custom expectations
│   └── fixtures/              # Static test data files
│       ├── sample_input.csv
│       └── expected_output.rds
└── testthat.R                 # Test runner

File Types

test-*.R - Actual test files (paired with source files)
helper-*.R - Shared utility functions, sourced before tests run and by devtools::load_all()
setup-*.R - Setup code run whenever tests run (devtools::test(), R CMD check) but not by devtools::load_all()
fixtures/ - Static test data, accessed via test_path("fixtures/file")

Access fixtures:

test_path("fixtures", "sample_data.csv")

TDD Workflow Steps

Step 1: Define Expected Behavior

Document what the function should do:

# Function: calculate_ci
# Purpose: Calculate bootstrap confidence intervals
# Inputs:
#   - data: numeric vector
#   - conf_level: confidence level (default 0.95)
#   - n_boot: number of bootstrap samples (default 1000)
# Outputs:
#   - Named numeric vector with lower and upper bounds
# Edge cases:
#   - Handle NA values
#   - Error on non-numeric input
#   - Error on empty input

Step 2: Write Failing Tests

# tests/testthat/test-calculate_ci.R
# (no library(testthat) call needed: the test runner and load_all() attach it)

test_that("calculate_ci returns correct structure", {
  set.seed(123)
  result <- calculate_ci(1:100)

  expect_type(result, "double")
  expect_named(result, c("lower", "upper"))
  expect_lt(result[["lower"]], result[["upper"]])
})

test_that("calculate_ci respects confidence level", {
  set.seed(123)
  ci_95 <- calculate_ci(1:100, conf_level = 0.95)
  ci_99 <- calculate_ci(1:100, conf_level = 0.99)

  # 99% CI should be wider
  expect_gt(ci_99[["upper"]] - ci_99[["lower"]], ci_95[["upper"]] - ci_95[["lower"]])
})

test_that("calculate_ci handles NA values", {
  set.seed(123)
  result <- calculate_ci(c(1:100, NA, NA))

  expect_false(any(is.na(result)))
})

test_that("calculate_ci validates inputs", {
  expect_error(calculate_ci("not numeric"), class = "validation_error")
  expect_error(calculate_ci(numeric(0)), class = "validation_error")
  expect_error(calculate_ci(1:10, conf_level = 1.5), class = "validation_error")
})

Step 3: Run Tests (They Should Fail)

devtools::test()
# ✖ calculate_ci returns correct structure
# ✖ calculate_ci respects confidence level
# ✖ calculate_ci handles NA values
# ✖ calculate_ci validates inputs

Step 4: Implement Minimal Code

# R/calculate_ci.R

#' Calculate Bootstrap Confidence Interval
#'
#' @param x Numeric vector
#' @param conf_level Confidence level (default 0.95)
#' @param n_boot Number of bootstrap samples (default 1000)
#' @return Named numeric vector with lower and upper bounds
#' @export
calculate_ci <- function(x, conf_level = 0.95, n_boot = 1000) {
  # Validate inputs
  if (!is.numeric(x)) {
    cli::cli_abort("{.arg x} must be numeric", class = "validation_error")
  }
  if (length(x) == 0) {
    cli::cli_abort("{.arg x} cannot be empty", class = "validation_error")
  }
  if (conf_level <= 0 || conf_level >= 1) {
    cli::cli_abort("{.arg conf_level} must be between 0 and 1", class = "validation_error")
  }

  # Remove NA values
  x <- x[!is.na(x)]

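  # Bootstrap: resample by index, because sample(x) on a length-1 numeric x
  # would draw from 1:x instead of from x itself
  boot_means <- replicate(n_boot, mean(x[sample.int(length(x), replace = TRUE)]))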

  # Calculate quantiles
  alpha <- 1 - conf_level
  c(
    lower = unname(quantile(boot_means, alpha / 2)),
    upper = unname(quantile(boot_means, 1 - alpha / 2))
  )
}

Step 5: Run Tests Again

devtools::test()
# ✔ calculate_ci returns correct structure
# ✔ calculate_ci respects confidence level
# ✔ calculate_ci handles NA values
# ✔ calculate_ci validates inputs

Step 6: Refactor

Improve while keeping tests green:

# Extract validation to helper
validate_ci_inputs <- function(x, conf_level) {
  if (!is.numeric(x)) {
    cli::cli_abort("{.arg x} must be numeric", class = "validation_error")
  }
  if (length(x) == 0) {
    cli::cli_abort("{.arg x} cannot be empty", class = "validation_error")
  }
  if (conf_level <= 0 || conf_level >= 1) {
    cli::cli_abort("{.arg conf_level} must be between 0 and 1", class = "validation_error")
  }
}

calculate_ci <- function(x, conf_level = 0.95, n_boot = 1000) {
  validate_ci_inputs(x, conf_level)

  x <- x[!is.na(x)]
  boot_means <- replicate(n_boot, mean(x[sample.int(length(x), replace = TRUE)]))

  alpha <- 1 - conf_level
  c(
    lower = unname(quantile(boot_means, alpha / 2)),
    upper = unname(quantile(boot_means, 1 - alpha / 2))
  )
}

Step 7: Verify Coverage

covr::package_coverage()
# calculate_ci.R: 100%

Testing Patterns

Testing Data Transformations

test_that("clean_data removes invalid rows", {
  input <- tibble(
    id = 1:4,
    value = c(1, NA, 3, -999)
  )

  result <- clean_data(input, invalid_value = -999)

  expect_equal(nrow(result), 2)
  expect_equal(result$id, c(1, 3))
  expect_false(anyNA(result$value))
})
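
One implementation that would make this test pass is sketched below. It assumes the column to check is called value, as in the test input, and uses a base data.frame to stay dependency-free; the source does not show clean_data() itself.

```r
# Hypothetical clean_data(); assumes the column to check is named `value`.
clean_data <- function(data, invalid_value = NULL) {
  # Start by dropping rows with a missing value
  keep <- !is.na(data$value)
  if (!is.null(invalid_value)) {
    # NA != invalid_value is NA, but those rows are already FALSE in `keep`
    keep <- keep & data$value != invalid_value
  }
  data[keep, , drop = FALSE]
}

input <- data.frame(id = 1:4, value = c(1, NA, 3, -999))
result <- clean_data(input, invalid_value = -999)
```

Only rows 1 and 3 survive: row 2 is missing and row 4 carries the sentinel value.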

Testing Statistical Functions

test_that("weighted_mean matches manual calculation", {
  x <- c(1, 2, 3)
  w <- c(1, 2, 1)

  result <- weighted_mean(x, w)
  expected <- sum(x * w) / sum(w)  # (1 + 4 + 3) / 4 = 2

  expect_equal(result, expected)
})
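
When a trusted reference implementation exists, it can serve as a second, independent check alongside the hand calculation. A minimal sketch, assuming weighted_mean() is a simple weighted average (its body is not shown in the source) and using stats::weighted.mean() as the reference:

```r
# Hypothetical weighted_mean(); the source only shows its test.
weighted_mean <- function(x, w) {
  stopifnot(is.numeric(x), is.numeric(w), length(x) == length(w))
  sum(x * w) / sum(w)
}

x <- c(1, 2, 3)
w <- c(1, 2, 1)

by_hand <- (1 * 1 + 2 * 2 + 3 * 1) / (1 + 2 + 1)  # 8 / 4
from_base <- stats::weighted.mean(x, w)           # Independent reference
ours <- weighted_mean(x, w)
```

Agreement between ours, by_hand, and from_base guards against an error that happens to be shared by the function and the manually derived expectation.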

Testing with Fixtures

# helper-fixtures.R
read_fixture <- function(name) {
  path <- testthat::test_path("fixtures", name)
  readr::read_csv(path, show_col_types = FALSE)
}

# test-pipeline.R
test_that("pipeline handles real data", {
  input <- read_fixture("sample_data.csv")
  result <- process_pipeline(input)

  expect_snapshot(result)
})

Mocking External Dependencies

test_that("fetch_data handles API errors", {
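  # Mock the API call (httr2_request is assumed to be this package's own
  # wrapper function, not a function exported by httr2)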
  local_mocked_bindings(
    httr2_request = function(...) {
      stop("API unavailable")
    }
  )

  expect_error(
    fetch_data("endpoint"),
    "API unavailable"
  )
})

Using withr for Cleanup

Use withr functions to manage temporary state with automatic restoration:

test_that("function respects options", {
  # Temporarily set options
  withr::local_options(list(digits = 3))

  result <- format_number(3.14159)
  expect_equal(result, "3.14")
})

test_that("function writes to temp file", {
  # Create temp file that's automatically cleaned up
  tmp <- withr::local_tempfile(lines = c("line 1", "line 2"))

  result <- process_file(tmp)
  expect_equal(result$n_lines, 2)
})

test_that("function uses custom environment variable", {
  # Temporarily set env var
  withr::local_envvar(MY_VAR = "test_value")

  result <- get_config()
  expect_equal(result$my_var, "test_value")
})
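
The automatic restoration can be observed directly. The sketch below records an option before a test, changes it inside the test, and reads it again afterwards; it is a standalone script, so it attaches testthat explicitly.

```r
library(testthat)

digits_before <- getOption("digits")

test_that("local_options() changes digits only inside the test", {
  withr::local_options(list(digits = 3))
  # With three significant digits, pi is formatted as "3.14"
  expect_equal(format(pi), "3.14")
})

# Once the test has finished, the previous value is back in place
digits_after <- getOption("digits")
```

Because cleanup is tied to the end of the test, no explicit teardown code is needed and a failing expectation cannot leave the session in a modified state.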

Test Data Strategies

Choose the appropriate approach for your testing needs:

1. Constructor Functions

Create data on-demand with helper functions:

# helper-data.R
make_sample_data <- function(n = 100) {
  tibble(
    id = 1:n,
    group = sample(c("A", "B"), n, replace = TRUE),
    value = rnorm(n)
  )
}

# test-analysis.R
test_that("analysis handles grouped data", {
  data <- make_sample_data(n = 50)
  result <- analyze_groups(data)
  expect_s3_class(result, "tbl_df")
})
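
A constructor that draws random values produces different data on every call, which can make failures hard to reproduce. One option is to fix the seed inside the helper, as in this sketch; make_seeded_data() and its seed argument are hypothetical, and a base data.frame is used to avoid extra dependencies.

```r
# Hypothetical variant of make_sample_data() with a fixed seed.
make_seeded_data <- function(n = 100, seed = 2024) {
  # with_seed() sets the seed for this expression only and then restores
  # the caller's random number generator state
  withr::with_seed(seed, {
    data.frame(
      id = seq_len(n),
      group = sample(c("A", "B"), n, replace = TRUE),
      value = rnorm(n)
    )
  })
}

first <- make_seeded_data(n = 5)
second <- make_seeded_data(n = 5)
```

Two calls with the same arguments return identical data, so a failing test sees the same input on every run.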

2. Local Functions with Cleanup

Handle side effects using withr:

test_that("function reads CSV correctly", {
  # Create temp file with cleanup
  tmp <- withr::local_tempfile(fileext = ".csv")
  write.csv(mtcars, tmp, row.names = FALSE)

  result <- read_and_process(tmp)
  expect_equal(nrow(result), 32)
})

3. Static Fixtures

Store data files in fixtures/ directory:

# Store in: tests/testthat/fixtures/sample_data.csv

test_that("function handles real data format", {
  path <- test_path("fixtures", "sample_data.csv")
  data <- readr::read_csv(path, show_col_types = FALSE)
  result <- process_data(data)
  expect_true(all(result$valid))
})

Common Testing Mistakes to Avoid

WRONG: Testing Implementation Details

# Don't test internal state
expect_equal(obj$internal_cache, expected_cache)

CORRECT: Test Behavior

# Test observable behavior
expect_equal(get_result(obj), expected_result)

WRONG: Brittle Tests

# Breaks on any output change
expect_equal(as.character(result), "Mean: 5.234567890")

CORRECT: Flexible Assertions

# Robust to formatting changes
expect_equal(result$mean, 5.23, tolerance = 0.01)

WRONG: Dependent Tests

test_that("creates data", { global_data <<- create() })
test_that("uses data", { process(global_data) })  # Depends on previous!

CORRECT: Independent Tests

test_that("creates and uses data", {
  data <- create()
  result <- process(data)
  expect_true(is_valid(result))
})

WRONG: Modifying Tests to Pass

# When a test fails, don't change the test (unless it's wrong)
test_that("function returns 42", {
  expect_equal(my_function(), 42)  # Test fails
})

# DON'T DO THIS:
test_that("function returns 41", {
  expect_equal(my_function(), 41)  # Changed to pass - WRONG!
})

CORRECT: Fix the Implementation

# Fix the code to match expected behavior
test_that("function returns 42", {
  expect_equal(my_function(), 42)  # Test fails
})

# Fix my_function() implementation instead

When Tests Fail

Do NOT modify tests to make them pass (unless the test is wrong)
Fix the implementation to match expected behavior
Add more tests if the failure reveals missing coverage
Update snapshots only if the change is intentional

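# Review and accept snapshot changes (the argument names a snapshot file,
# i.e. the test file name without "test-" and ".R", not a single test)
testthat::snapshot_review("file_name")
testthat::snapshot_accept("file_name")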

Coverage Verification

# Run coverage report
covr::package_coverage()

# Interactive HTML report
covr::report()

# Check specific thresholds
cov <- covr::package_coverage()
pct <- covr::percent_coverage(cov)
if (pct < 80) {
  stop("Coverage below 80%: ", round(pct, 1), "%")
}

# Measure coverage from tests, vignettes, and examples together.
# package_coverage() has no threshold arguments; enforce minimums with
# percent_coverage() as shown above.
covr::package_coverage(type = "all")

Debugging & Development

Running Tests at Different Scales

# Micro: Interactive development
devtools::load_all()
expect_equal(my_function(1), 1)  # Direct expectation

# Mezzo: Single file
testthat::test_file("tests/testthat/test-validation.R")
devtools::test_active_file()  # File currently open in the editor

# Macro: Full suite
devtools::test()
devtools::check()  # Full package validation

Test Reporters

# Find slow tests
devtools::test(reporter = "slow")

# Progress reporter (verbose)
devtools::test(reporter = "progress")

# Test execution order independence
devtools::test(shuffle = TRUE)

Continuous Testing

# Watch mode - auto-run on file changes
testthat::auto_test_package()

Parallel Execution (Edition 3)

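Edition 3 supports parallel test execution for faster runs on multi-core systems. Enable it by adding Config/testthat/parallel: true to DESCRIPTION. Test files then run in separate worker processes, so they must not depend on state created by other files.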

Running Tests

# All tests
devtools::test()

# All tests (keyboard shortcut)
# RStudio: Ctrl/Cmd+Shift+T

# With coverage
covr::package_coverage()

# Specific file
testthat::test_file("tests/testthat/test-validation.R")

# Watch mode
testthat::auto_test_package()

# Verbose output
devtools::test(reporter = "progress")

# Find slow tests
devtools::test(reporter = "slow")

# Test independence
devtools::test(shuffle = TRUE)

# Full package check
devtools::check()

Success Metrics

80%+ code coverage achieved
All tests passing
No skipped tests
Fast execution (< 30s for unit tests)
Tests catch bugs before production
Confident refactoring enabled
Tests run independently in any order
Clear, descriptive test names
Each test validates one concept

Remember: Tests are not optional. They are the safety net that enables confident refactoring, rapid development, and production reliability. Write them FIRST.
