CEDA R Style Guide

R coding standards for cedanl repositories. Based on the Tidyverse style guide with CEDA-specific conventions.

Ecosystem

Tidyverse dialect

Write R in the tidyverse dialect. Tidyverse packages share a consistent design philosophy for data manipulation, which makes code predictable and composable.

Use |> (base pipe, available since R 4.1.0) for pipelines
Prefer tidyverse functions over base R equivalents when they improve readability (e.g., str_detect() over grepl(), read_csv() over read.csv())
Use tidymodels for modeling workflows
Use ggplot2 for visualization
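
As a minimal sketch of the dialect described above, the snippet below combines the base pipe with dplyr and stringr. The data and the column names opleidingsnaam and n_students are made up for illustration and are not part of any CEDA dataset.

```r
library(dplyr)
library(stringr)

## Hypothetical example data; column names are illustrative assumptions
programs <- tibble(
  opleidingsnaam = c("Informatica", "Bedrijfskunde", "Technische Informatica"),
  n_students = c(120, 300, 80)
)

## str_detect() instead of grepl(), chained with the base pipe
informatica_programs <- programs |>
  filter(str_detect(opleidingsnaam, "Informatica")) |>
  arrange(desc(n_students))
```

The base R equivalent would use grepl() with the arguments in the opposite order (pattern first), which is one reason the stringr version tends to read more naturally in a pipeline.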

Preferred packages

Domain	Package	Notes
Data manipulation	dplyr, tidyr	Core pipeline
String operations	stringr	Consistent `str_` prefix
Date/time	lubridate	Readable date manipulation
Reading data	readr, arrow	`read_csv()`, `read_parquet()`
Modeling	tidymodels (parsnip, recipes, workflows, yardstick)	Unified modeling interface
Plotting	ggplot2	With ragg for rendering
Tables	flextable, gtsummary	For reports
Reports	quarto	`.qmd` templates
Interactive	shiny	App interface
CLI messages	cli	For user-facing messages
Error handling	rlang	`rlang::abort()` over `stop()`
File paths	fs	Cross-platform path handling
Iteration	purrr	Functional iteration (`map()`, `walk()`)
Data cleaning	janitor	`clean_names()`, `tabyl()`
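
To illustrate two of the rows above, here is a small sketch that builds file paths with fs and iterates with purrr. The directory layout and file names are hypothetical.

```r
library(fs)
library(purrr)

## Hypothetical academic years and file naming scheme
years <- c(2022, 2023, 2024)

## purrr::map_chr() returns a character vector, one element per input
file_names <- map_chr(years, \(year) paste0("enrollment_", year, ".csv"))

## fs::path() joins components with "/" on every platform
enrollment_files <- path("data", "raw", file_names)
```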

See Principles §11 for general dependency selection criteria.

Package Structure

Every R repo is an R package. See Project Structure for full directory layouts per repo type.

Function files

One primary function per file, with helpers below it
File name matches the main function: transform_data.R contains transform_data()
User-facing (exported) functions at the top of the file
Helper functions below, ordered hierarchically
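
A sketch of how such a file could be laid out, using base R only. The file name, function names, and columns (student, credits) are hypothetical and serve only to show the ordering.

```r
## R/summarise_credits.R (hypothetical file)

#' Summarise credits per student
#'
#' @param df A data frame with columns `student` and `credits`.
#'
#' @return A data frame with one row per student.
#'
#' @export
summarise_credits <- function(df) {
  df <- drop_missing_credits(df)
  totals <- sum_credits_per_student(df)

  return(totals)
}

## Helper: remove rows without a credit value
drop_missing_credits <- function(df) {
  df <- df[!is.na(df$credits), , drop = FALSE]

  return(df)
}

## Helper: total credits per student
sum_credits_per_student <- function(df) {
  totals <- aggregate(credits ~ student, data = df, FUN = sum)

  return(totals)
}
```

The exported function sits at the top and reads like a table of contents; the helpers follow in the order in which they are called.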

main.R

The entry point that orchestrates the pipeline:

## Load the package
devtools::load_all()

## Configuration
opleidingsnaam <- "Informatica"
opleidingsvorm <- "VT"

## Run pipeline
metadata <- read_metadata()
programs <- transform_data(metadata, opleidingsnaam, opleidingsvorm)
results <- run_analysis(programs)
render_report(results)

main.R is NOT part of the package — it's a script that uses the package.

Shiny app (interactive mode)

The Shiny app lives inside the package at inst/app/. This follows the standard CRAN pattern used by packages like esquisse, radiant, and DALEX.

# R/run_app.R
#' Launch the interactive application
#'
#' @param ... Arguments passed to [shiny::runApp()].
#' @export
run_app <- function(...) {
  if (!requireNamespace("shiny", quietly = TRUE)) {
    rlang::abort("Package 'shiny' is needed. Install it with install.packages('shiny').")
  }
  app_dir <- system.file("app", package = "nfwa")
  shiny::runApp(app_dir, ...)
}

Key conventions:

App code in inst/app/app.R: wraps package functions with a UI
Launch function in R/run_app.R: uses system.file() to locate the app
shiny in Suggests: (not Imports:), so the package works without Shiny installed
config.yml in inst/app/ for local data paths
App contains NO business logic: only UI and calls to package functions

Syntax

Assignment and pipes

# Good
students <- students_raw |>
  filter(INS_Studiejaar == 2024) |>
  select(INS_Studentnummer, INS_Opleidingsnaam)

# Bad
students = students_raw %>%
  filter(INS_Studiejaar == 2024) %>%
  select(INS_Studentnummer, INS_Opleidingsnaam)

Use <- for assignment, not =
Use |> (base pipe), not %>% (magrittr pipe)
Left-hand assignment only (x <- value, not value -> x)
Put the source object and first pipe on the same line

Spacing

# Good
height <- (feet * 12) + inches
df$z
x <- 1:10

# Bad
height<-feet*12+inches
df $ z
x <- 1 : 10

Spaces around binary operators: ==, <-, +, -, *, /, |>
No spaces around high-precedence operators: :, ::, $, @, [, [[
Extra alignment spaces are fine for readability with <-

Braces and code blocks

# Good
if (debug) {
  show(x)
}

# Bad
if(debug){
  show(x)
}

{ at end of line, after a space
} on its own line
Space before ( in control flow statements such as if (...) and for (...), but not in function calls such as mean(x)

Function definitions

# Good: explicit return, descriptive argument names, defaults for optional arguments
calculate_retention <- function(df,
                                year = 2024,
                                min_credits = 0,
                                include_masters = FALSE) {
  df <- df |>
    filter(INS_Studiejaar == year)

  return(df)
}

# Bad: implicit return, cryptic argument names, no defaults
f <- function(d, y, m, i) {
  d |> filter(INS_Studiejaar == y)
}

Use explicit return() statements
Name arguments explicitly when calling functions with more than 2 arguments
Put each argument on its own line when the signature is long
Sensible defaults for optional arguments
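
The rule about naming arguments at the call site could look like the sketch below. The function body, the example data, and the columns credits and is_master are assumptions added to make the snippet runnable; only the signature follows the guide.

```r
library(dplyr)

## Hypothetical data; `credits` and `is_master` are illustrative columns
students <- tibble(
  INS_Studiejaar = c(2023, 2023, 2024),
  credits = c(45, 10, 60),
  is_master = c(FALSE, FALSE, TRUE)
)

calculate_retention <- function(df,
                                year = 2024,
                                min_credits = 0,
                                include_masters = FALSE) {
  df <- df |>
    filter(INS_Studiejaar == year, credits >= min_credits)

  if (!include_masters) {
    df <- df |>
      filter(!is_master)
  }

  return(df)
}

## Good: more than two arguments, so each one is named at the call site
retention_2023 <- calculate_retention(
  students,
  year = 2023,
  min_credits = 30,
  include_masters = FALSE
)
```

A positional call such as calculate_retention(students, 2023, 30, FALSE) runs identically, but the reader has to look up the signature to know what 30 and FALSE mean.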

Line length and structure

Maximum line length: 100 characters
Avoid blank lines within a pipe chain
One blank line between code blocks
Two blank lines before a new section

Naming

Functions in snake_case, English
Start with a verb: transform_data(), create_plot(), add_ses(), get_fairness_conclusions()
Exception: Shiny modules use camelCase (Shiny convention)
Variable names descriptive, immediately understandable, snake_case
Column names: see Data Conventions (preserve source names)

Documentation

Roxygen2 for all exported functions

#' Transform raw enrollment data for analysis
#'
#' Combines enrollment and grade data, adds SES and APCG indicators,
#' creates missing-value indicators, and imputes missing numerics.
#'
#' @param metadata Named list from [read_metadata()].
#' @param opleidingsnaam Character. Name of the program.
#' @param opleidingsvorm Character. Program form ("VT", "DT", or "DU").
#'
#' @return A data frame with transformed and imputed data.
#'
#' @importFrom dplyr mutate across select
#' @export
transform_data <- function(metadata, opleidingsnaam, opleidingsvorm) {
  ## ... function body omitted ...
}

First line: short description (one sentence)
Body: what the function does, in more detail
@param: every parameter documented
@return: what comes back
@importFrom: explicit namespace imports
@export: for user-facing functions

Comments in scripts

# Good: explains WHY
## Filter to first-year students only: retention is only meaningful for year 1
df <- df |>
  filter(INS_Studiejaar == 1)

# Bad: explains WHAT (obvious from the code)
## Filter the dataframe
df <- df |>
  filter(INS_Studiejaar == 1)

Comments are directive and explain the why
Use ## (double hash) for comments, not #
Avoid commented-out code — delete it (git remembers)
Use ## TODO: for temporary code that needs attention

Error Handling

# Good: cli for messages, rlang for errors
cli::cli_alert_info("Processing {.val {nrow(df)}} students")
if (nrow(df) == 0) rlang::abort(paste0("No students found for ", opleidingsnaam))

# Bad: base R
message(paste("Processing", nrow(df), "students"))
if (nrow(df) == 0) stop("No students found")

Use cli package for informational messages
Use rlang::abort() / rlang::warn() / rlang::inform() for conditions
Guard clauses at function start for input validation
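
Guard clauses at the start of a function might be sketched as follows. The function name get_program_students() and the specific checks are hypothetical; the calls to rlang::abort() and cli::cli_alert_info() follow the conventions above.

```r
## Hypothetical function illustrating guard clauses
get_program_students <- function(df, opleidingsnaam) {
  ## Guard clauses: fail early, before any work is done
  if (!is.data.frame(df)) {
    rlang::abort("`df` must be a data frame.")
  }
  if (!is.character(opleidingsnaam) || length(opleidingsnaam) != 1) {
    rlang::abort("`opleidingsnaam` must be a single string.")
  }
  if (!"INS_Opleidingsnaam" %in% names(df)) {
    rlang::abort("`df` must contain the column `INS_Opleidingsnaam`.")
  }

  result <- df[df$INS_Opleidingsnaam == opleidingsnaam, , drop = FALSE]

  if (nrow(result) == 0) {
    ## rlang::abort() does not interpolate {braces}; build the message explicitly
    rlang::abort(paste0("No students found for ", opleidingsnaam, "."))
  }

  cli::cli_alert_info("Processing {.val {nrow(result)}} students")

  return(result)
}
```

Each check is a separate if block with its own message, so a caller sees exactly which input was wrong instead of a generic failure deeper in the pipeline.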

Testing

test_that("transform_data returns expected columns", {
  result <- transform_data(test_metadata, "Test", "VT")
  expect_s3_class(result, "data.frame")
  expect_true("retentie" %in% names(result))
})

Use testthat (>= 3.0)
Test file mirrors source file: R/transform_data.R → tests/testthat/test-transform_data.R
Test expected outputs, edge cases, and error conditions
Include small test fixtures in tests/testthat/fixtures/
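
A test that covers all three categories (expected output, edge case, error condition) could be sketched like this. The function count_students() is a hypothetical stand-in defined inline so the snippet runs on its own; in a real repo it would live in R/.

```r
library(testthat)

## Hypothetical function under test
count_students <- function(df) {
  if (!is.data.frame(df)) {
    rlang::abort("`df` must be a data frame.")
  }

  return(nrow(df))
}

test_that("count_students handles expected input, edge cases, and errors", {
  students <- data.frame(INS_Studentnummer = c(1, 2, 3))

  ## Expected output
  expect_equal(count_students(students), 3)

  ## Edge case: empty data frame
  expect_equal(count_students(students[0, , drop = FALSE]), 0)

  ## Error condition: match on the condition class, not the message text
  expect_error(count_students("not a data frame"), class = "rlang_error")
})
```

Fixture files would typically be located with testthat::test_path("fixtures", ...) so that the test finds them both interactively and under R CMD check.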

Formatting

Use air for automatic code formatting. Air is R's equivalent of ruff (Python) — an opinionated formatter that enforces consistent style.

# Format all R files
air format .

# Check without modifying
air format --check .

Air handles indentation, spacing, line breaks, and brace placement. Focus on writing clear code and let air handle the formatting.

Dependency Management

Use renv for reproducible environments
After adding packages: renv::snapshot()
On clone: renv::restore()
Keep renv.lock in version control
Don't commit the package library in renv/library/ (the renv/.gitignore that renv generates already excludes it); do commit renv/activate.R