R Guide
Applies to: R 4.1+, Statistical Computing, Data Analysis, R Packages, Shiny Apps
Core Principles
- Tidyverse First: Use tidyverse conventions for data manipulation, visualization, and functional programming; fall back to base R only when performance demands it
- Vectorize Everything: Prefer vectorized operations and purrr::map() over explicit for loops; R is optimized for vector operations
- Reproducibility: Every analysis must be reproducible -- use renv for dependency management, set.seed() for stochastic operations, and R Markdown/Quarto for literate programming
- Functional Style: Write pure functions with no side effects; avoid modifying global state or relying on .GlobalEnv
- Explicit Over Implicit: No reliance on partial matching, implicit type coercion, or positional argument passing for non-trivial functions
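The "Explicit Over Implicit" principle is easiest to see with partial matching. The following is a minimal base R sketch; the list config and the function scale_values are hypothetical names used only for illustration.

```r
# Hypothetical configuration list, for illustration only.
config <- list(threshold = 0.5)

# `$` silently partial-matches list names: "thresh" resolves to "threshold".
partial_hit <- config$thresh

# `[[` matches exactly by default, so the shortened name yields NULL.
exact_miss <- config[["thresh"]]

# Function arguments partial-match as well; spelling names out in full
# avoids surprises if an argument with a similar prefix is added later.
scale_values <- function(values, multiplier = 1) values * multiplier
scaled <- scale_values(values = c(1, 2, 3), multiplier = 2)
```

Setting options(warnPartialMatchDollar = TRUE) during development is one way to surface accidental partial matches on `$`.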
Guardrails
Version & Dependencies
- Target R 4.1+ (native pipe |>, lambda shorthand \(x))
- Manage dependencies with renv -- always commit renv.lock
- For packages, declare all dependencies in DESCRIPTION (Imports:, Suggests:)
- Pin CRAN snapshot dates in renv for full reproducibility
- Audit new dependencies: check CRAN status, reverse dependencies, license (GPL compatibility)
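As a quick illustration of the two R 4.1+ syntax features named above, here is a small base R sketch. The variable names are arbitrary.

```r
# Native pipe: the left-hand side becomes the first argument of the call.
total <- c(1, 4, 9) |> sqrt() |> sum()

# Lambda shorthand: \(x) is equivalent to function(x).
squares <- vapply(1:3, \(x) x^2, numeric(1))
```

Both forms are part of the base language, so they need no package beyond R 4.1 itself.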
Code Style
- Follow the tidyverse style guide
- Run styler::style_pkg() and lintr::lint_package() before every commit
- Naming: snake_case for functions/variables, PascalCase for R6/S4 classes
- Max line length: 80 characters
- Use <- for assignment (not = outside function arguments)
- Explicit library() at top of scripts; never use require()
- Always use TRUE/FALSE (never T/F -- they can be overwritten)
- No attach() or setwd() -- use here::here() for project-relative paths
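The TRUE/FALSE rule exists because T and F are ordinary variables rather than reserved words. A minimal base R sketch of the hazard, with illustrative variable names:

```r
# T and F are ordinary variables bound to TRUE and FALSE in base R,
# so any script can mask them with a new value.
T <- 0

# TRUE and FALSE are reserved words and cannot be reassigned.
flag_from_shortcut <- isTRUE(T)
flag_from_keyword <- isTRUE(TRUE)

# Remove the masking variable so later code sees the base definition again.
rm(T)
```

After the reassignment, flag_from_shortcut is FALSE while flag_from_keyword is still TRUE.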
Vectorization
- Prefer vectorized operations: x * 2 not for (i in seq_along(x)) x[i] * 2
- Use dplyr::mutate()/dplyr::summarise() for column-wise transformations
- Use purrr::map() family for list iteration (map_dbl(), map_chr(), map_dfr())
- Use dplyr::across() for applying functions to multiple columns
- Reserve for loops for side effects only (writing files, API calls)
- Use vapply() over sapply() when base R is required (explicit return type)
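The vapply() recommendation matters most on edge cases such as empty input, where sapply() changes its return type. A short base R sketch with illustrative variable names:

```r
empty_input <- list()

# sapply() guesses the return type; on empty input it returns an empty list.
loose <- sapply(empty_input, length)

# vapply() is told the type up front and returns integer(0) instead.
strict <- vapply(empty_input, length, integer(1))

# On non-empty input the result is an integer vector, as declared.
sizes <- vapply(list(1:3, letters), length, integer(1))
```

Code downstream of strict can rely on an integer vector regardless of input length, which is not true of loose.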
Error Handling
- Use rlang::abort()/cli::cli_abort() over stop() for structured conditions
- Validate inputs at the start of every exported function
- Use stopifnot() or rlang::arg_match() for argument validation
- Never use try() -- always tryCatch() or purrr::safely()
validate_dataframe <- function(df, required_cols) {
  if (!is.data.frame(df)) {
    cli::cli_abort("{.arg df} must be a data frame, not {.obj_type_friendly {df}}.")
  }
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    cli::cli_abort(
      "Missing required column{?s}: {.field {missing_cols}}.",
      class = "validation_error"
    )
  }
  invisible(df)
}
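On the calling side, a classed condition can be handled selectively with tryCatch(). The sketch below uses base R only so that it runs without extra packages. The helper fail_validation is hypothetical and stands in for any function that signals an error carrying the class "validation_error".

```r
# Hypothetical helper that signals a classed error using base R only.
# It stands in for a function that aborts with class "validation_error".
fail_validation <- function(msg) {
  stop(structure(
    class = c("validation_error", "error", "condition"),
    list(message = msg, call = NULL)
  ))
}

# tryCatch() dispatches on the condition class, so only validation
# failures are converted into a fallback value; other errors propagate.
outcome <- tryCatch(
  fail_validation("Missing required column: revenue."),
  validation_error = function(cnd) {
    paste("Recovered:", conditionMessage(cnd))
  }
)
```

Unlike try(), this leaves unrelated errors untouched, which keeps genuine bugs visible.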
Reproducibility
- Always use set.seed() before stochastic operations; document the seed
- Use renv::snapshot() after adding or updating packages
- Never use absolute paths -- use here::here() for project-relative paths
- Use R Markdown (.Rmd) or Quarto (.qmd) for analysis reports
- Include sessioninfo::session_info() at the end of reports
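The effect of set.seed() can be checked directly. A minimal base R sketch; the seed value and variable names are arbitrary choices for this illustration.

```r
# Record the seed alongside the analysis so the draw can be repeated.
analysis_seed <- 2024  # arbitrary value chosen for this illustration

set.seed(analysis_seed)
first_draw <- rnorm(5)

# Resetting the same seed reproduces the identical sequence.
set.seed(analysis_seed)
second_draw <- rnorm(5)

# Without resetting, the generator continues and the draws differ.
third_draw <- rnorm(5)
```

Here first_draw and second_draw are identical, while third_draw is not.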
Project Structure
mypackage/
├── R/                        # Source files
│   ├── data-clean.R
│   └── utils.R
├── tests/
│   ├── testthat.R            # Runner
│   └── testthat/
│       └── test-data-clean.R
├── man/                      # roxygen2
├── vignettes/
├── data-raw/                 # Data scripts
├── DESCRIPTION
├── NAMESPACE                 # roxygen2
├── renv.lock
└── README.md

myanalysis/
├── R/                        # Reusable functions
├── analysis/                 # Rmd/Quarto (numbered)
│   ├── 01-exploration.Rmd
│   └── 02-modeling.qmd
├── data/
│   ├── raw/                  # Immutable input
│   └── processed/            # Generated output
├── output/                   # Figures, reports
├── tests/testthat/
├── renv.lock
└── README.md
- Use roxygen2 for all docs; never edit man/ or NAMESPACE by hand
- Raw data is immutable -- store in data/raw/, process into data/processed/
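To make the roxygen2 rule concrete, here is a sketch of a documented function as it might appear in a file under R/. The function revenue_in_millions is a hypothetical example, not part of any existing package.

```r
#' Convert revenue to millions
#'
#' @param revenue Numeric vector of revenue values in base currency units.
#' @return Numeric vector of the same length, expressed in millions.
#' @export
#' @examples
#' revenue_in_millions(c(2.5e6, 4e6))
revenue_in_millions <- function(revenue) {
  # Validate input up front, as recommended for exported functions.
  stopifnot(is.numeric(revenue))
  revenue / 1e6
}
```

Running devtools::document() on a package containing this file would regenerate the corresponding man/ page and the export entry in NAMESPACE, so neither needs manual editing.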
Key Patterns
Tidyverse Pipe Chains
# Prefer native pipe |> (R 4.1+) over magrittr %>%
result <- raw_data |>
  dplyr::filter(year >= 2020, !is.na(revenue)) |>
  # Order rows so lag() refers to the previous year within each region
  dplyr::arrange(region, year) |>
  dplyr::mutate(
    revenue_m = revenue / 1e6,
    growth = (revenue - dplyr::lag(revenue)) / dplyr::lag(revenue),
    .by = region
  ) |>
  dplyr::summarise(
    mean_revenue = mean(revenue_m, na.rm = TRUE),
    .by = region
  )
Tidy Evaluation
# Use {{ }} (embrace) for column names passed as arguments
summarise_by <- function(df, group_col, value_col) {
  df |>
    dplyr::summarise(
      mean_val = mean({{ value_col }}, na.rm = TRUE),
      n = dplyr::n(),
      .by = {{ group_col }}
    )
}

# Use .data pronoun for string column references
filter_column <- function(df, col_name, threshold) {
  df |> dplyr::filter(.data[[col_name]] > threshold)
}

# Use across() for multiple columns
standardize_numeric <- function(df) {
  df |>
    dplyr::mutate(dplyr::across(
      where(is.numeric),
      \(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    ))
}
ggplot2 Grammar of Graphics
plot_distribution <- function(df, x_col, fill_col = NULL) {
  ggplot2::ggplot(df, ggplot2::aes(x = {{ x_col }})) +
    ggplot2::geom_histogram(
      ggplot2::aes(fill = {{ fill_col }}),
      bins = 30,
      alpha = 0.7
    ) +
    ggplot2::labs(title = "Distribution", x = NULL, y = "Count") +
    ggplot2::theme_minimal(base_size = 14)
}
Functional Programming with purrr
# Type-stable map variants -- read and combine CSV files
# (map_dfr() is superseded as of purrr 1.0.0; map() followed by
# list_rbind() is the current equivalent)
results <- purrr::map_dfr(file_paths, \(path) {
  readr::read_csv(path, show_col_types = FALSE) |>
    dplyr::mutate(source_file = basename(path))
})

# Safe execution -- capture errors without stopping
safe_read <- purrr::safely(readr::read_csv)
reads <- purrr::map(file_paths, safe_read)
# Keep only the reads that did not error, then extract their results
successes <- purrr::map(purrr::keep(reads, \(x) is.null(x$error)), "result")
Testing
Standards
- Use testthat 3rd edition (Config/testthat/edition: 3 in DESCRIPTION)
- Test files: test-*.R (mirror source: data-clean.R -> test-data-clean.R)
- Test names describe behavior: test_that("filter_active removes inactive users", ...)
- Coverage target: >80% for business logic, >60% overall (measured with covr)
- Use snapshot tests (expect_snapshot()) for complex output (plots, printed tables)
- No test interdependencies -- each test_that() block is self-contained
- Use withr::local_*() for temporary state changes (env vars, options, files)
testthat Examples
test_that("summarise_by computes correct group means", {
  df <- tibble::tibble(
    region = c("east", "east", "west", "west"),
    revenue = c(100, 200, 300, 400)
  )
  result <- summarise_by(df, region, revenue)
  expect_equal(nrow(result), 2)
  expect_equal(result$mean_val[result$region == "east"], 150)
})

test_that("validate_dataframe errors on missing columns", {
  df <- tibble::tibble(a = 1, b = 2)
  expect_error(validate_dataframe(df, c("a", "c")), class = "validation_error")
})
Tooling
Essential Commands
Rscript -e 'styler::style_pkg()' # Format package code
Rscript -e 'lintr::lint_package()' # Lint package
Rscript -e 'devtools::test()' # Run tests
Rscript -e 'covr::package_coverage()' # Coverage report
Rscript -e 'devtools::check()' # Full R CMD check
Rscript -e 'renv::snapshot()' # Lock dependencies
Rscript -e 'devtools::document()' # Rebuild roxygen2 docs
quarto render analysis/report.qmd # Render Quarto document
References
For detailed patterns and examples, see:
- references/patterns.md -- dplyr pipelines, ggplot2 recipes, purrr functional patterns