The Random Side of R: A Mixed Model Adventure
This exemplar was developed at Imperial College London by Valentina Quintero Santofimio in collaboration with Saranjeet Kaur Bhogal and Miruna Serian from Research Software Engineering and John Pinney from Research Computing & Data Science at the Early Career Researcher Institute.
Brief description
Mixed models, also known as multilevel or hierarchical models, are statistical models that include both fixed effects and random effects. Traditional linear models assume all observations are independent, but that's rarely true. For example, patients from the same hospital, students from the same classroom, or multiple measurements from the same individual tend to be more alike than observations from different groups. If this structure is ignored, it can lead to biased estimates, underestimated variability, and invalid conclusions. In this project, I offer a practical introduction to mixed-effects models in R, focusing on their use for analysing data with grouped or hierarchical structures. I will cover when and why to use mixed models, explain the difference between fixed and random effects, and walk through examples using specific R packages.
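As a sketch of the fixed versus random distinction, the simplest mixed model is a random-intercept model. The notation below is introduced here for illustration and is an assumption, not taken from the course materials: $y_{ij}$ is the outcome for observation $i$ in group $j$, $x_{ij}$ is a covariate, $\beta_0$ and $\beta_1$ are fixed effects shared by all groups, and $u_j$ is a group-specific random intercept.

$$
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij}, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
$$

Setting $\sigma_u^2 = 0$ recovers the ordinary linear model, so the random intercept is what captures the similarity between observations from the same group.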
Learning Outcomes
After completing this exemplar, students will:
- Understand the foundation of linear mixed-effects models (LMMs)
- Know when and why to use mixed models over simple linear regression
- Implement mixed models in R
- Interpret fixed and random effects, variance components, and model fit statistics
- Visualize and validate the fitted model
- Extend to generalized mixed models (GLMMs) and crossed/nested designs
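To give a flavour of what these outcomes look like in practice, here is a minimal sketch that fits an ordinary regression and a random-intercept model to the `sleepstudy` data shipped with lme4. The choice of dataset, the object names, and the intraclass correlation calculation are illustrative assumptions and not necessarily what the exemplar's notebooks use.

```r
library(lme4)

# Example data shipped with lme4: repeated reaction-time measurements
# on the same subjects across several days.
data("sleepstudy", package = "lme4")

# Ordinary linear regression: treats every row as an independent observation.
fit_lm <- lm(Reaction ~ Days, data = sleepstudy)

# Random-intercept mixed model: each subject gets its own baseline,
# while the effect of Days is a fixed effect shared by all subjects.
fit_lmm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Fixed-effect estimates (intercept and slope for Days).
fixef(fit_lmm)

# Variance components: between-subject variance and residual variance.
variance_parts <- as.data.frame(VarCorr(fit_lmm))

# Intraclass correlation: share of total variance due to subjects.
icc <- variance_parts$vcov[variance_parts$grp == "Subject"] /
  sum(variance_parts$vcov)
icc

# Other random-effect structures in lme4 formula syntax
# (hypothetical variable names, shown for reference only):
#   y ~ x + (1 + x | group)              random intercept and random slope
#   y ~ x + (1 | hospital/ward)          wards nested within hospitals
#   y ~ x + (1 | subject) + (1 | item)   crossed random effects
```

An intraclass correlation well above zero suggests that observations within a group are correlated, which is the situation where a mixed model is preferable to simple linear regression.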
Target Audience
This exemplar is aimed primarily at researchers working with large (usually epidemiological) datasets. Mixed models are useful when working with repeated measures, nested data, or simply when we want to account for variability across groups.
These models, however, can be applied to any large dataset in any discipline that has sufficient grouping or hierarchical structure, helping to ensure accurate interpretation and conclusions.
Requirements
Academic
It would be very useful to take some courses offered by the Graduate School at Imperial College London, either as an introduction or a refresher:
Research Computing & Data Science Skills Courses
Software Tools
In this project we will be using R and RStudio:
- R (the statistical programming language): 4.5.1
- RStudio (the IDE/GUI for R): 2025.09.2

We will also use the following R packages:
- lme4 for fitting the models
- lmerTest for p-values and significance tests on the fitted models
- performance, see, and parameters for diagnostics, fit checks, and parameter extraction
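One possible way to check that these packages are available before starting is sketched below. The variable names are illustrative, and the choice to install only in an interactive session is an assumption made here so that running the snippet from a script does not trigger downloads unexpectedly.

```r
# Packages listed above.
required_pkgs <- c("lme4", "lmerTest", "performance", "see", "parameters")

# TRUE for each package that can be loaded from the current library paths.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1),
                       quietly = TRUE)

missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Install from CRAN only when running interactively (e.g. in RStudio).
  if (interactive()) {
    install.packages(missing_pkgs)
  }
}
```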
Getting Started
- Start by reading the ReCoDE main page.
- Complete the Introduction section (video lecture, reading materials)
- Continue with Data Curation (get your data ready for a mixed model analysis)
- Conduct a Mixed model analysis I
- Take your analysis to the next level by attempting the extension task Mixed model analysis II (Advanced exercise)
Roadmap
Core
Extensions
Data
For this exercise, a synthetic data set will be generated resembling large epidemiological data with different sites (clusters).
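As a rough idea of what such a data set could look like, the base R sketch below simulates individuals clustered within sites, with a site-level random intercept. Every name and number in it (20 sites, 50 people per site, the `age` covariate, the effect sizes) is a hypothetical choice for illustration and is not the data-generating process used in the exemplar.

```r
set.seed(42)  # make the simulation reproducible

n_sites    <- 20  # number of clusters (hypothetical)
n_per_site <- 50  # individuals per site (hypothetical)

# Site-level random intercepts: each site deviates from the overall mean.
site_intercept <- rnorm(n_sites, mean = 0, sd = 2)

sim_data <- data.frame(
  site = factor(rep(seq_len(n_sites), each = n_per_site)),
  age  = round(runif(n_sites * n_per_site, min = 18, max = 80))
)

# Outcome = fixed intercept + fixed age effect + site effect + residual noise.
sim_data$outcome <- 10 +
  0.05 * sim_data$age +
  site_intercept[as.integer(sim_data$site)] +
  rnorm(nrow(sim_data), mean = 0, sd = 1)

# Quick look at the structure and at how site means differ.
str(sim_data)
head(aggregate(outcome ~ site, data = sim_data, FUN = mean))
```

Because the site effect is shared by everyone in the same site, outcomes within a site are correlated, which is the feature a mixed model is designed to account for.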
Estimated Time
| Task | Time |
|---|---|
| Pre-session material | 3 hours |
| Data curation | 2 hours |
| Analysis | 2-3 hours |
| Interpretation of results | 2 hours |
Licence
This project is licensed under the BSD-3-Clause license.