CEDA R Style Guide
R coding standards for cedanl repositories. Based on the Tidyverse style guide with CEDA-specific conventions.
Ecosystem
Tidyverse dialect
Write R in the tidyverse dialect. Tidyverse packages share a consistent design philosophy for data manipulation, which makes code predictable and composable.
- Use
|>(base pipe) for pipelines - Prefer tidyverse functions over base R equivalents when they improve readability (e.g.,
str_detect()overgrepl(),read_csv()overread.csv()) - Use tidymodels for modeling workflows
- Use ggplot2 for visualization
Preferred packages
| Domain | Package | Notes |
|---|---|---|
| Data manipulation | dplyr, tidyr | Core pipeline |
| String operations | stringr | Consistent str_ prefix |
| Date/time | lubridate | Readable date manipulation |
| Reading data | readr, arrow | read_csv(), read_parquet() |
| Modeling | tidymodels (parsnip, recipes, workflows, yardstick) | Unified modeling interface |
| Plotting | ggplot2 | With ragg for rendering |
| Tables | flextable, gtsummary | For reports |
| Reports | quarto | .qmd templates |
| Interactive | shiny | App interface |
| CLI messages | cli | For user-facing messages |
| Error handling | rlang | rlang::abort() over stop() |
| File paths | fs | Cross-platform path handling |
| Iteration | purrr | Functional iteration (map(), walk()) |
| Data cleaning | janitor | clean_names(), tabyl() |
See Principles §11 for general dependency selection criteria.
Package Structure
Every R repo is an R package. See Project Structure for full directory layouts per repo type.
Function files
- One primary function per file, with helpers below it
- File name matches the main function:
transform_data.Rcontainstransform_data() - User-facing (exported) functions at the top of the file
- Helper functions below, ordered hierarchically
main.R
The entry point that orchestrates the pipeline:
## Load the package
devtools::load_all()
## Configuration
opleidingsnaam <- "Informatica"
opleidingsvorm <- "VT"
## Run pipeline
metadata <- read_metadata()
programs <- transform_data(metadata, opleidingsnaam, opleidingsvorm)
results <- run_analysis(programs)
render_report(results)
main.R is NOT part of the package — it's a script that uses the package.
Shiny app (interactive mode)
The Shiny app lives inside the package at inst/app/. This follows the standard CRAN pattern used by packages like esquisse, radiant, and DALEX.
# R/run_app.R
#' Launch the interactive application
#'
#' @param ... Arguments passed to [shiny::runApp()].
#' @export
run_app <- function(...) {
if (!requireNamespace("shiny", quietly = TRUE)) {
rlang::abort("Package {.pkg shiny} needed. Install: install.packages('shiny')")
}
app_dir <- system.file("app", package = "nfwa")
shiny::runApp(app_dir, ...)
}
Key conventions:
- App code in inst/app/app.R — wraps package functions with a UI
- Launch function in R/run_app.R — uses system.file() to locate the app
- shiny in Suggests: (not Imports:) — package works without Shiny installed
- config.yml in inst/app/ for local data paths
- App contains NO business logic — only UI and calls to package functions
Syntax
Assignment and pipes
# Good
students <- students_raw |>
filter(INS_Studiejaar == 2024) |>
select(INS_Studentnummer, INS_Opleidingsnaam)
# Bad
students = students_raw %>%
filter(INS_Studiejaar == 2024) %>%
select(INS_Studentnummer, INS_Opleidingsnaam)
- Use
<-for assignment, not= - Use
|>(base pipe), not%>%(magrittr pipe) - Left-hand assignment only (
x <- value, notvalue -> x) - Put the source object and first pipe on the same line
Spacing
# Good
height <- (feet * 12) + inches
df$z
x <- 1:10
# Bad
height<-feet*12+inches
df $ z
x <- 1 : 10
- Spaces around binary operators:
==,<-,+,-,*,/,|> - No spaces around unary operators:
:,::,$,@,[,[[ - Extra alignment spaces are fine for readability with
<-
Braces and code blocks
# Good
if (debug) {
show(x)
}
# Bad
if(debug){
show(x)
}
{at end of line, after a space}on its own line- Space before
(in control flow (if (,for (), but not in function calls (mean(x))
Function definitions
# Good: explicit return, named arguments
calculate_retention <- function(df,
year = 2024,
min_credits = 0,
include_masters = FALSE) {
df <- df |>
filter(INS_Studiejaar == year)
return(df)
}
# Bad: implicit return, no named arguments
f <- function(d, y, m, i) {
d |> filter(INS_Studiejaar == y)
}
- Use explicit
return()statements - Name arguments explicitly when calling functions with more than 2 arguments
- Put each argument on its own line when the signature is long
- Sensible defaults for optional arguments
Line length and structure
- Maximum line length: 100 characters
- Avoid blank lines within a pipe chain
- One blank line between code blocks
- Two blank lines before a new section
Naming
- Functions in
snake_case, English - Start with a verb:
transform_data(),create_plot(),add_ses(),get_fairness_conclusions() - Exception: Shiny modules use
camelCase(Shiny convention) - Variable names descriptive, immediately understandable, snake case
- Column names: see Data Conventions (preserve source names)
Documentation
Roxygen2 for all exported functions
#' Transform raw enrollment data for analysis
#'
#' Combines enrollment and grade data, adds SES and APCG indicators,
#' creates missing-value indicators, and imputes missing numerics.
#'
#' @param metadata Named list from [read_metadata()].
#' @param opleidingsnaam Character. Name of the program.
#' @param opleidingsvorm Character. Program form ("VT", "DT", or "DU").
#'
#' @return A data frame with transformed and imputed data.
#'
#' @importFrom dplyr mutate across select
#' @export
transform_data <- function(metadata, opleidingsnaam, opleidingsvorm) {
- First line: short description (one sentence)
- Body: what the function does, in more detail
@param: every parameter documented@return: what comes back@importFrom: explicit namespace imports@export: for user-facing functions
Comments in scripts
# Good: explains WHY
# Filter to first-year students only — retention is only meaningful for year 1
df <- df |>
filter(INS_Studiejaar == 1)
# Bad: explains WHAT (obvious from the code)
# Filter the dataframe
df <- df |>
filter(INS_Studiejaar == 1)
- Comments are directive and explain the why
- Use
##(double hash) for comments, not# - Avoid commented-out code — delete it (git remembers)
- Use
## TODO:for temporary code that needs attention
Error Handling
# Good: cli for messages, rlang for errors
cli::cli_alert_info("Processing {.val {nrow(df)}} students")
if (nrow(df) == 0) rlang::abort("No students found for {opleidingsnaam}")
# Bad: base R
message(paste("Processing", nrow(df), "students"))
if (nrow(df) == 0) stop("No students found")
- Use
clipackage for informational messages - Use
rlang::abort()/rlang::warn()/rlang::inform()for conditions - Guard clauses at function start for input validation
Testing
test_that("transform_data returns expected columns", {
result <- transform_data(test_metadata, "Test", "VT")
expect_s3_class(result, "data.frame")
expect_true("retentie" %in% names(result))
})
- Use testthat (>= 3.0)
- Test file mirrors source file:
R/transform_data.R→tests/testthat/test-transform_data.R - Test expected outputs, edge cases, and error conditions
- Include small test fixtures in
tests/testthat/fixtures/
Formatting
Use air for automatic code formatting. Air is R's equivalent of ruff (Python) — an opinionated formatter that enforces consistent style.
# Format all R files
air format .
# Check without modifying
air format --check .
Air handles indentation, spacing, line breaks, and brace placement. Focus on writing clear code and let air handle the formatting.
Dependency Management
- Use
renvfor reproducible environments - After adding packages:
renv::snapshot() - On clone:
renv::restore() - Keep
renv.lockin version control - Don't commit the
renv/library directory