writing-tidyverse-r
Writing Tidyverse R
This skill covers modern tidyverse patterns for R 4.3+ and dplyr 1.1+, style guidelines, and migration from legacy patterns.
Core Principles
- Use modern tidyverse patterns - Prioritize dplyr 1.1+ features, native pipe, and current APIs
- Write readable code first - Optimize only when necessary
- Follow tidyverse style guide - Consistent naming, spacing, and structure
Pipe Usage
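Always use the native pipe |> instead of the magrittr pipe %>%.
R 4.3+ provides everything needed: the native pipe itself (R 4.1), the _ placeholder for named arguments (R 4.2), and extraction from the placeholder (R 4.3). See pipe-examples.md for usage patterns.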
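A minimal sketch of the native pipe features mentioned above, using only base R and the built-in mtcars data. The object names (squares, roots, fit, slope) are illustrative, not part of the original skill.

```r
# Native pipe: the left-hand side becomes the first argument of the call.
squares <- c(1, 4, 9)
roots <- squares |> sqrt() |> sum()   # same as sum(sqrt(squares))

# The `_` placeholder (R 4.2+) passes the left-hand side to a named argument.
fit <- mtcars |> lm(mpg ~ wt, data = _)

# Since R 4.3, `_` can also start an extraction chain.
slope <- mtcars |>
  lm(mpg ~ wt, data = _) |>
  _$coefficients[["wt"]]
```

The placeholder must be passed to a named argument, which is why the formula comes first and data = _ second.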
Join Syntax (dplyr 1.1+)
Use join_by() instead of character vectors for joins
Modern join syntax supports:
- Equality joins: join_by(company == id)
- Inequality joins: join_by(company == id, year >= since)
- Rolling joins: join_by(company == id, closest(year >= since))
See join-examples.md for complete patterns.
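The three join types can be sketched with the column names used above. The tables transactions and companies and their values are hypothetical, made up for illustration.

```r
library(dplyr)

# Hypothetical data: one transaction per company, several "since" rows per id.
transactions <- tibble(
  company = c("A", "B"),
  year    = c(2021, 2020)
)
companies <- tibble(
  id    = c("A", "A", "B"),
  since = c(2018, 2020, 2021)
)

# Equality join: match on company == id only.
eq <- inner_join(transactions, companies, by = join_by(company == id))

# Inequality join: additionally require year >= since.
ineq <- inner_join(
  transactions, companies,
  by = join_by(company == id, year >= since)
)

# Rolling join: among rows with year >= since, keep only the closest one.
rolling <- inner_join(
  transactions, companies,
  by = join_by(company == id, closest(year >= since))
)
```

With this data the equality join keeps every companies row, the inequality join drops company B (its only since value is later than the transaction year), and the rolling join keeps a single row for company A.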
Multiple Match Handling
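Use the multiple and unmatched arguments for quality control:
- multiple = "error" - Expect 1:1 matches
- multiple = "all" - Allow multiple matches explicitly
- unmatched = "error" - Ensure all rows match
From dplyr 1.1.1, multiple = "error" is deprecated in favor of the relationship argument, for example relationship = "one-to-one".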
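A short sketch of these checks on hypothetical orders and customers tables. It deliberately avoids multiple = "error", which newer dplyr releases deprecate, and shows only unmatched = "error" and multiple = "all".

```r
library(dplyr)

# Hypothetical tables: every order should match a customer.
orders <- tibble(order_id = 1:2, customer = c("a", "b"))
customers <- tibble(customer = c("a", "b"), region = c("north", "south"))

# unmatched = "error": the join fails instead of silently dropping rows.
checked <- inner_join(
  orders, customers,
  by = join_by(customer), unmatched = "error"
)

# An order with an unknown customer now raises an error, caught here.
bad_orders <- tibble(order_id = 3L, customer = "z")
failed <- tryCatch(
  inner_join(bad_orders, customers, by = join_by(customer), unmatched = "error"),
  error = function(e) "unmatched"
)

# multiple = "all": duplicated keys in y are expected and kept.
dup_customers <- tibble(
  customer = c("a", "a", "b"),
  region   = c("north", "east", "south")
)
expanded <- inner_join(
  orders, dup_customers,
  by = join_by(customer), multiple = "all"
)
```

An inner join is used because unmatched = "error" then checks both inputs; with a left join only rows that would be dropped from y are checked.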
Data Masking vs Tidy Selection
Understand the difference:
- Data masking functions: arrange(), filter(), mutate(), summarise()
- Tidy selection functions: select(), relocate(), across()
Key patterns:
- Use {{ }} (embrace) for function arguments
- Use .data[[ ]] for character vectors
- Use across() for multiple columns
See data-masking-examples.md for patterns.
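The three key patterns can be sketched as follows. The helpers summarise_mean() and filter_above() are hypothetical names introduced for illustration, and mtcars is used as stand-in data.

```r
library(dplyr)

# Hypothetical helper: group_col and value_col are unquoted column names
# supplied by the caller, so they are embraced with {{ }}.
summarise_mean <- function(data, group_col, value_col) {
  data |>
    summarise(mean_value = mean({{ value_col }}), .by = {{ group_col }})
}

# Hypothetical helper: col_name is a string, so it goes through .data[[ ]].
filter_above <- function(data, col_name, threshold) {
  data |>
    filter(.data[[col_name]] > threshold)
}

# across() applies one function to a tidy selection of columns.
rounded <- mtcars |>
  mutate(across(c(mpg, wt), round))

by_cyl <- summarise_mean(mtcars, cyl, mpg)
heavy <- filter_above(mtcars, "wt", 5)
```

Embracing works for both data-masking arguments (mean()) and tidy-selection arguments (.by), which is why one operator covers both uses in summarise_mean().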
Modern Grouping and Column Operations
Use .by for per-operation grouping (dplyr 1.1+)
This replaces the old group_by() |> ... |> ungroup() pattern.
Additional modern operations:
- pick() - Column selection inside data-masking functions
- across() - Apply functions to multiple columns
- reframe() - Multi-row summaries
See grouping-examples.md for complete examples.
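A brief sketch of .by, pick(), and reframe() on the built-in mtcars data. The result names are illustrative only.

```r
library(dplyr)

# Per-operation grouping: the result is ungrouped, so no ungroup() is needed.
mean_by_cyl <- mtcars |>
  summarise(mean_mpg = mean(mpg), .by = cyl)

# pick() selects columns with tidy selection inside a data-masking verb.
with_totals <- mtcars |>
  mutate(row_total = rowSums(pick(mpg, wt)))

# reframe() may return any number of rows per group (here two: min and max).
mpg_range <- mtcars |>
  reframe(mpg = range(mpg), .by = cyl)
```

Unlike group_by(), .by keeps groups in order of first appearance rather than sorting them.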
String Manipulation with stringr
Use stringr over base R string functions
Benefits:
- Consistent str_ prefix
- String-first argument order
- Pipe-friendly and vectorized
See stringr-examples.md for common patterns and base R equivalents.
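To illustrate the argument-order point, here is a small side-by-side sketch. The fruit_names vector is made-up sample data.

```r
library(stringr)

fruit_names <- c("apple pie", "banana split", "cherry tart")

# Base R: pattern first, data second.
has_an_base <- grepl("an", fruit_names)
dashed_base <- gsub(" ", "-", fruit_names)

# stringr: data first, so the calls pipe naturally.
has_an <- fruit_names |> str_detect("an")
dashed <- fruit_names |> str_replace_all(" ", "-")

# Several str_ functions chained in one pipeline.
shouted <- fruit_names |>
  str_to_upper() |>
  str_sub(1, 5)
```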
Style Guide Essentials
Object Names
- Use snake_case for all names
- Variable names = nouns, function names = verbs
- Avoid dots except for S3 methods
Good: day_one, calculate_mean, user_data
Avoid: DayOne, calculate.mean, userData
Spacing and Layout
See style-examples.md for proper spacing and pipe formatting.
Naming and Arguments
- Use snake_case for variables and functions
- Prefix non-standard arguments with . (e.g., .data, .by)
Anti-Patterns to Avoid
Legacy Patterns
| Avoid | Use Instead |
|---|---|
| `%>%` | `\|>` |
| `by = c("a" = "b")` | `by = join_by(a == b)` |
| `sapply()` | `map_*()` |
| `group_by() \|> ... \|> ungroup()` | `.by` argument |
Performance Anti-Patterns
- Don't grow objects in loops - Pre-allocate or use purrr
- Don't use sapply() - Type-unstable, use map_*() instead
See anti-patterns.md for examples of what to avoid and correct alternatives.
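Both performance anti-patterns can be sketched in a few lines. The values vector and result names are illustrative, and the anonymous-function shorthand \(v) assumes R 4.1 or later.

```r
library(purrr)

values <- 1:5

# Anti-pattern: growing a vector inside a loop copies it on every iteration.
grown <- c()
for (v in values) grown <- c(grown, v^2)

# Alternative 1: pre-allocate the result, then fill it in place.
filled <- numeric(length(values))
for (i in seq_along(values)) filled[i] <- values[i]^2

# Alternative 2: a typed purrr map returns a double vector or fails.
mapped <- map_dbl(values, \(v) v^2)

# sapply() changes its return type with the input: empty input gives a list.
empty_sapply <- sapply(integer(0), \(v) v^2)
empty_map <- map_dbl(integer(0), \(v) v^2)
```

The empty-input case is where the type instability usually surfaces: downstream code that expects a numeric vector receives list() from sapply() but numeric(0) from map_dbl().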
Migration Reference
Base R to Modern Tidyverse
| Base R | Modern Tidyverse |
|---|---|
| `subset(data, condition)` | `filter(data, condition)` |
| `data[order(data$x), ]` | `arrange(data, x)` |
| `aggregate(x ~ y, data, mean)` | `summarise(data, mean(x), .by = y)` |
| `sapply(x, f)` | `map(x, f)` |
| `grepl("pattern", text)` | `str_detect(text, "pattern")` |
| `gsub("old", "new", text)` | `str_replace_all(text, "old", "new")` |
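A few of these rows, sketched side by side on the built-in mtcars data. Variable names are illustrative.

```r
library(dplyr)

# Base R
base_subset <- subset(mtcars, mpg > 30)
base_sorted <- mtcars[order(mtcars$mpg), ]
base_means  <- aggregate(mpg ~ cyl, mtcars, mean)

# Tidyverse equivalents
tidy_subset <- filter(mtcars, mpg > 30)
tidy_sorted <- arrange(mtcars, mpg)
tidy_means  <- summarise(mtcars, mpg = mean(mpg), .by = cyl)
```

The results hold the same values, but row order can differ: aggregate() sorts by the grouping variable, while summarise() with .by keeps groups in order of first appearance.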
Old to New Tidyverse Patterns
| Old Pattern | New Pattern |
|---|---|
| `data %>% function()` | `data \|> function()` |
| `group_by(x) \|> summarise()` | `summarise(.by = x)` |
| `by = c("a" = "b")` | `by = join_by(a == b)` |
| `gather()`/`spread()` | `pivot_longer()`/`pivot_wider()` |
| `map_dfr(x, f)` | `map(x, f) \|> list_rbind()` |
| `separate(col, into = ...)` | `separate_wider_delim()` |
See migration-examples.md for complete migration patterns.
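The reshaping and mapping replacements can be sketched as below, assuming tidyr 1.3+ and purrr 1.0+ are installed. The wide table and its column names are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical wide table with one column per year.
wide <- tibble(id = c("a-1", "b-2"), y2020 = c(10, 20), y2021 = c(11, 21))

# pivot_longer() replaces gather().
long <- wide |>
  pivot_longer(starts_with("y"), names_to = "year", values_to = "value")

# separate_wider_delim() replaces separate().
split_id <- wide |>
  separate_wider_delim(id, delim = "-", names = c("letter", "number"))

# map() |> list_rbind() replaces map_dfr().
stacked <- map(1:3, \(i) tibble(run = i, square = i^2)) |>
  list_rbind()
```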
source: Sarah Johnson's gist https://gist.github.com/sj-io/3828d64d0969f2a0f05297e59e6c15ad