The Random Side of R: A Mixed Model Adventure

This exemplar was developed at Imperial College London by Valentina Quintero Santofimio in collaboration with Saranjeet Kaur Bhogal and Miruna Serian from Research Software Engineering and John Pinney from Research Computing & Data Science at the Early Career Researcher Institute.

Brief description

Mixed models, also known as multilevel or hierarchical models, are statistical models that include both fixed effects and random effects. Traditional linear models assume that all observations are independent, but that is rarely true. For example, patients from the same hospital, students from the same classroom, or repeated measurements from the same individual share a structure that, if ignored, can lead to biased estimates, underestimated variability, and invalid conclusions. In this project, I plan to offer a practical introduction to mixed-effects models in R, focusing on their use for analysing data with grouped or hierarchical structures. I will cover when and why to use mixed models, explain the difference between fixed and random effects, and walk through examples using specific R packages.
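As a minimal illustration of the contrast described above, the sketch below fits an ordinary linear model and a random-intercept mixed model to the same data. It assumes the lme4 package is installed and uses the sleepstudy dataset that ships with lme4; that dataset is chosen here only for illustration and is not the exemplar's own data.

```r
library(lme4)

# sleepstudy ships with lme4: reaction times for 18 subjects over 10 days
data("sleepstudy", package = "lme4")

# Ordinary linear model: treats all 180 observations as independent
fit_lm <- lm(Reaction ~ Days, data = sleepstudy)

# Linear mixed model: adds a random intercept for each subject
fit_lmm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Fixed-effect estimates and standard errors from each fit
summary(fit_lm)$coefficients[, c("Estimate", "Std. Error")]
summary(fit_lmm)$coefficients[, c("Estimate", "Std. Error")]

# Variance components: between-subject and residual
VarCorr(fit_lmm)
```

Both fits report a slope for Days, but only the mixed model separates between-subject variance from residual variance, so the two fits attach different standard errors to the same effect.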

Learning Outcomes 🎓

After completing this exemplar, students will:

  • Understand the foundation of linear mixed-effects models (LMMs)
  • Know when and why to use mixed models over simple linear regression
  • Implement mixed models in R
  • Interpret fixed and random effects, variance components, and model fit statistics
  • Visualize and validate the fitted model
  • Extend to generalized mixed models (GLMMs) and crossed/nested designs
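The model structures named in these outcomes map onto lme4's formula syntax roughly as sketched below. The variable names (outcome, exposure, site, hospital, ward, patient, clinician) are hypothetical placeholders, not columns from the exemplar's dataset, and the formulas are only constructed here, not fitted.

```r
# Random intercept: each site gets its own baseline level
f_intercept <- outcome ~ exposure + (1 | site)

# Random intercept and slope: the effect of exposure also varies by site
f_slope <- outcome ~ exposure + (1 + exposure | site)

# Nested design: wards within hospitals
# (1 | hospital/ward) expands to (1 | hospital) + (1 | hospital:ward)
f_nested <- outcome ~ exposure + (1 | hospital/ward)

# Crossed design: every observation belongs to a patient AND a clinician,
# with neither grouping factor nested inside the other
f_crossed <- outcome ~ exposure + (1 | patient) + (1 | clinician)

# Variables each formula would need in the data frame
all.vars(f_nested)
all.vars(f_crossed)

# To fit, pass a formula to lme4::lmer(), or to lme4::glmer() with a
# family argument (e.g. family = binomial) for a generalized mixed model.
```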

Target Audience 🎯

The primary focus is on working with large, usually epidemiological, datasets. Mixed models are useful when working with repeated measures, nested data, or any setting where we want to account for variability across groups.

These models, however, can be applied to any large dataset, in any discipline, that has sufficient grouping or hierarchical structure, helping to ensure accurate interpretation and conclusions.

Requirements

Academic 📚

It would be very useful to take some courses offered by the Graduate School at Imperial College London, either as an introduction or a refresher:

Research Computing & Data Science Skills Courses

Software Tools 🛠️

In this project we will be using R and RStudio:

  • R (the statistical programming language): 4.5.1
  • RStudio (the IDE/GUI for R): 2025.09.2

We will also use the following R packages:

  • lme4 for fitting the models
  • lmerTest for significance tests (p-values) on the fixed effects of fitted models
  • performance, see, and parameters for diagnostics, fit checks, and parameter extraction
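One way to check that these packages are available before starting is sketched below. It uses only base R; the object names (pkgs, installed, missing_pkgs) are illustrative, and the install step is left commented out so that nothing is downloaded without your say-so.

```r
# Packages used in this exemplar
pkgs <- c("lme4", "lmerTest", "performance", "see", "parameters")

# TRUE for each package that can be loaded from the current library
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install the missing packages from CRAN:
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```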

Getting Started 🚀

  • Start by reading the ReCoDE main page.
  • Complete the Introduction section (video lecture, reading materials)
  • Continue with Data Curation (get your data ready for a mixed model analysis)
  • Conduct a Mixed model analysis I
  • Take your analysis to the next level by attempting the extension task Mixed model analysis II (Advanced exercise)

Roadmap 🗺️

Core 🧩

Extensions 🔌

Data 📊

For this exercise, a synthetic data set will be generated resembling large epidemiological data with different sites (clusters).
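A minimal sketch of how clustered data of this kind could be simulated in base R is shown below. Every specific here is an assumption made for illustration (20 sites, 50 participants per site, an age covariate, a blood-pressure-like outcome called sbp, and all the numeric values); the exemplar's actual data-generating code may differ.

```r
# Hypothetical synthetic cohort: 20 sites with 50 participants each
set.seed(2025)

n_sites    <- 20
n_per_site <- 50

# Site-level random intercepts (between-site SD of 2 is an assumed value)
site_effect <- rnorm(n_sites, mean = 0, sd = 2)

dat <- data.frame(
  site = factor(rep(seq_len(n_sites), each = n_per_site)),
  age  = round(runif(n_sites * n_per_site, min = 40, max = 80))
)

# Outcome = fixed intercept + fixed age slope + site intercept + residual noise
dat$sbp <- 110 + 0.4 * dat$age +
  site_effect[as.integer(dat$site)] +
  rnorm(nrow(dat), mean = 0, sd = 5)

# Inspect the structure and the number of participants per site
str(dat)
table(dat$site)
```

Because every participant in a site shares that site's intercept, observations within a site are correlated, which is the structure a random intercept for site is meant to capture.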

Estimated Time ⏳

Task                        Time
Pre-session material        3 hours
Data curation               2 hours
Analysis                    2-3 hours
Interpretation of results   2 hours

Licence 📄

This project is licensed under the BSD-3-Clause licence.