Extension Activity: Mixed Models with Student Data

Overview

In this activity, you will analyse a synthetic dataset of students nested within classrooms. Your tasks:

Load the dataset
Explore and clean variables
Fit fixed and mixed models
Visualise effects
Produce a results table

Each task includes hints and hidden solutions.

Dataset description

The dataset contains:
- class: classroom identifier (random intercept)
- student_id: unique ID
- hours_study: average weekly study hours
- SES: socioeconomic status (Low/Medium/High)
- gender: Male/Female
- score: exam score (outcome)

A classroom-level effect influences baseline scores, making the data suitable for mixed modelling.
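
To make the idea of a classroom-level effect concrete, here is a minimal sketch of how data with this structure could be simulated. It is not the code used to build practical.csv: the number of classes, the class size, the coefficients and the standard deviations are all illustrative assumptions, and `sim` and `class_effect` are hypothetical names.

```r
# Simulated stand-in with the same columns as the dataset described above.
# Every numeric value here is an assumption chosen for illustration only.
set.seed(42)
n_class <- 10   # hypothetical number of classrooms
n_per   <- 20   # hypothetical number of students per classroom
n       <- n_class * n_per

sim <- data.frame(
  class       = rep(seq_len(n_class), each = n_per),
  student_id  = seq_len(n),
  hours_study = round(runif(n, min = 0, max = 20), 1),
  SES         = sample(c("Low", "Medium", "High"), n, replace = TRUE),
  gender      = sample(c("Male", "Female"), n, replace = TRUE)
)

# One random shift per classroom, shared by every student in that class
class_effect <- rnorm(n_class, mean = 0, sd = 5)

# Look up an (assumed) SES effect by name, then drop the names again
ses_effect <- unname(c(Low = 0, Medium = 3, High = 6)[as.character(sim$SES)])

sim$score <- 50 +
  1.5 * sim$hours_study +
  ses_effect +
  class_effect[sim$class] +      # classroom-level shift in the baseline
  rnorm(n, mean = 0, sd = 8)     # student-level noise

head(sim)
```

Because `class_effect` is shared within a class, two students from the same classroom tend to have more similar scores than two students from different classrooms, which is the dependence a random intercept is meant to capture.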

1. Load the packages

# HINT: dplyr, ggplot2, lme4, broom.mixed, gt

Show solution

library(dplyr)
library(ggplot2)
library(lme4)
library(broom.mixed)
library(gt)

2. Import the dataset

Download the dataset here:

👉 Download practical.csv

# HINT:
# You can use read.csv()
# Remember to specify the folder in which the dataset is saved e.g., "data"
# Give a name to your data e.g., df

Show solution

df <- read.csv("data/practical.csv")

3. Explore the data

# HINTS:
# Have a look at the first few rows across all columns 
# Also try to use summary() and table() functions

Show solution

head(df)
summary(df)
table(df$class)
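
Beyond the functions in the solution, it may help to check for missing values and to look at scores by class before modelling. The sketch below uses a small simulated data frame (`toy`, a hypothetical stand-in with assumed values) so that it runs on its own; with the real data, replace `toy` with `df`.

```r
# Simulated stand-in data (assumed values) so the block runs on its own
set.seed(1)
n_class <- 10; n_per <- 20
toy <- data.frame(
  class       = factor(rep(seq_len(n_class), each = n_per)),
  hours_study = runif(n_class * n_per, 0, 20)
)
toy$score <- 50 + 1.5 * toy$hours_study +
  rep(rnorm(n_class, sd = 5), each = n_per) +   # classroom-level shift
  rnorm(nrow(toy), sd = 8)                      # student-level noise

# Number of missing values in each column
colSums(is.na(toy))

# Students per class and mean score per class
n_per_class    <- table(toy$class)
mean_per_class <- tapply(toy$score, toy$class, mean)
round(mean_per_class, 1)

# Visual check: do score distributions shift from class to class?
boxplot(score ~ class, data = toy, xlab = "Class", ylab = "Exam score")
```

Visible differences between class means are a first informal hint that observations are clustered.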

4. Clean and prepare variables

# HINT:
# Convert SES and gender to factors
# Ensure class is a factor

Show solution

df$SES <- factor(df$SES, levels=c("Low","Medium","High"))
df$gender <- factor(df$gender)
df$class <- factor(df$class)
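
The `levels` argument matters because R uses the first factor level as the reference category in `lm()` and `lmer()`. The short sketch below uses toy values (not the real data) to show the difference between the default ordering and the explicit one.

```r
ses_raw <- c("High", "Low", "Medium", "Low")   # toy values for illustration

ses_default <- factor(ses_raw)                                       # alphabetical levels
ses_ordered <- factor(ses_raw, levels = c("Low", "Medium", "High"))  # explicit order

levels(ses_default)   # "High" comes first, so it would be the reference
levels(ses_ordered)   # "Low" comes first, so it is the reference

# A value that matches none of the listed levels (e.g. a typo) becomes NA
factor(c("Low", "medium"), levels = c("Low", "Medium", "High"))
```

With "Low" as the reference, the SES coefficients are read as differences from the Low group. Running `summary()` on the cleaned column is a quick way to spot any NA values introduced by unexpected spellings.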

5. Fit a Fixed-Effects Model

Outcome: score

# HINT:
# mod_fixed <- lm(OUTCOME ~ EXPOSURE + COVARIATES, data=df)

Show solution

mod_fixed <- lm(score ~ hours_study + SES + gender, data=df)
summary(mod_fixed)

6. Fit a Mixed-Effects Model

(Random Intercept: Class)

# HINT:
# mod_mixed <- lmer(OUTCOME ~ EXPOSURE + COVARIATES + (1 | RANDOM_INTERCEPT), data=df)

Show solution

mod_mixed <- lmer(score ~ hours_study + SES + gender + (1 | class), data=df)
summary(mod_mixed)
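
One way to compare the two models is to put their estimates and standard errors side by side. The sketch below is one possible approach, not part of the original solution; it uses a simulated stand-in (`toy`, with assumed values and a single predictor) so that it runs on its own. With the real data, `fit_lm` and `fit_lmer` would be `mod_fixed` and `mod_mixed`.

```r
library(lme4)

# Simulated stand-in data (assumed values) so the block runs on its own
set.seed(1)
n_class <- 10; n_per <- 20
toy <- data.frame(
  class       = factor(rep(seq_len(n_class), each = n_per)),
  hours_study = runif(n_class * n_per, 0, 20)
)
toy$score <- 50 + 1.5 * toy$hours_study +
  rep(rnorm(n_class, sd = 5), each = n_per) +   # classroom-level shift
  rnorm(nrow(toy), sd = 8)                      # student-level noise

fit_lm   <- lm(score ~ hours_study, data = toy)
fit_lmer <- lmer(score ~ hours_study + (1 | class), data = toy)

# coef(summary(...)) returns a matrix with "Estimate" and "Std. Error" columns
# for both model types
cmp <- data.frame(
  term      = rownames(coef(summary(fit_lm))),
  est_lm    = coef(summary(fit_lm))[, "Estimate"],
  se_lm     = coef(summary(fit_lm))[, "Std. Error"],
  est_lmer  = coef(summary(fit_lmer))[, "Estimate"],
  se_lmer   = coef(summary(fit_lmer))[, "Std. Error"],
  row.names = NULL
)
cmp
```

Reading across a row shows how much an estimate and its standard error move once the classroom intercept is included.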

7. Visualise fixed effects

# HINT: try plot_model() from the sjPlot package (install sjPlot first if needed)

Show solution

sjPlot::plot_model(mod_mixed, type="est")

8. Visualise random effects

# HINT: you can use plot_model()

Show solution

sjPlot::plot_model(mod_mixed, type="re")
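
If sjPlot is not available, a similar plot can be built by hand from `ranef()` and ggplot2. This is an alternative sketch, not the worksheet's solution, and it again uses a simulated stand-in (`toy`, assumed values) so that it runs on its own. The intervals are approximate, based on the conditional standard deviations that lme4 reports.

```r
library(lme4)
library(ggplot2)

# Simulated stand-in data (assumed values) so the block runs on its own
set.seed(1)
n_class <- 10; n_per <- 20
toy <- data.frame(
  class       = factor(rep(seq_len(n_class), each = n_per)),
  hours_study = runif(n_class * n_per, 0, 20)
)
toy$score <- 50 + 1.5 * toy$hours_study +
  rep(rnorm(n_class, sd = 5), each = n_per) +
  rnorm(nrow(toy), sd = 8)

fit <- lmer(score ~ hours_study + (1 | class), data = toy)

# One row per class: conditional mean (condval) and conditional SD (condsd)
re <- as.data.frame(ranef(fit, condVar = TRUE))

p <- ggplot(re, aes(x = condval, y = reorder(grp, condval))) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(xmin = condval - 1.96 * condsd,
                      xmax = condval + 1.96 * condsd)) +
  labs(x = "Estimated deviation from the overall intercept", y = "Class")
p
```

Classes whose intervals sit clearly to one side of the dashed line have baseline scores above or below the overall average.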

9. Create a results table

# HINT: you can use tidy() and gt() functions here

Show solution

tidy(mod_mixed, effects="fixed") %>% gt()
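
A results table is usually easier to read with confidence intervals and rounded values. The sketch below shows one possible extension using `conf.int = TRUE` in `tidy()` and `fmt_number()` from gt. The column labels are a suggestion, and the data are a simulated stand-in (`toy`, assumed values) so that the block runs on its own.

```r
library(dplyr)
library(lme4)
library(broom.mixed)
library(gt)

# Simulated stand-in data (assumed values) so the block runs on its own
set.seed(1)
n_class <- 10; n_per <- 20
toy <- data.frame(
  class       = factor(rep(seq_len(n_class), each = n_per)),
  hours_study = runif(n_class * n_per, 0, 20)
)
toy$score <- 50 + 1.5 * toy$hours_study +
  rep(rnorm(n_class, sd = 5), each = n_per) +
  rnorm(nrow(toy), sd = 8)

fit <- lmer(score ~ hours_study + (1 | class), data = toy)

# Fixed effects with Wald confidence intervals (the default conf.method)
res <- tidy(fit, effects = "fixed", conf.int = TRUE) %>%
  select(term, estimate, std.error, conf.low, conf.high)

res %>%
  gt() %>%
  fmt_number(columns = c(estimate, std.error, conf.low, conf.high),
             decimals = 2) %>%
  cols_label(term = "Term", estimate = "Estimate", std.error = "SE",
             conf.low = "95% CI (low)", conf.high = "95% CI (high)")
```

With the real data, the same pipeline can be applied to `mod_mixed` in place of `fit`.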

Reflection

When comparing the fixed-effects and mixed-effects models, consider the questions below:

Effect Sizes
Do the estimated effects of study hours, SES, or gender change when adding the classroom random intercept?
Uncertainty
Do standard errors increase or decrease? Does this affect which predictors appear important?
Clustering
Is there evidence of meaningful variation between classes?
Interpretation
Why might the mixed model provide a more realistic representation of student exam scores?

Show example answers

1. Effect Sizes
The coefficients for study hours, SES, and gender remain similar in size, but may shift slightly once classroom effects are accounted for. This indicates that most of the predictor effects are robust, though the fixed model may over- or underestimate some associations.

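2. Uncertainty
Standard errors often increase in the mixed model because it accounts for the fact that observations within the same class are not independent. As a result, some predictors may become less statistically significant. For predictors that vary mainly within classes, standard errors can instead decrease, because the random intercept absorbs between-class variation that the fixed model treats as residual noise.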

3. Clustering
The random intercept variance shows that classes differ systematically in their baseline exam scores. This justifies the use of a mixed model.

4. Interpretation
The mixed model is preferable because it adjusts for classroom-level differences that could confound the relationships between student characteristics and exam scores. Ignoring this structure risks producing overly confident or biased estimates.
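
The clustering point in answer 3 can be quantified with the intraclass correlation coefficient (ICC): the share of the unexplained variance in scores that lies between classes. The sketch below computes it by hand from `VarCorr()`. It is one possible approach, shown on a simulated stand-in (`toy`, assumed values) so that it runs on its own; with the real data, use `mod_mixed` in place of `fit`.

```r
library(lme4)

# Simulated stand-in data (assumed values) so the block runs on its own
set.seed(1)
n_class <- 10; n_per <- 20
toy <- data.frame(
  class       = factor(rep(seq_len(n_class), each = n_per)),
  hours_study = runif(n_class * n_per, 0, 20)
)
toy$score <- 50 + 1.5 * toy$hours_study +
  rep(rnorm(n_class, sd = 5), each = n_per) +
  rnorm(nrow(toy), sd = 8)

fit <- lmer(score ~ hours_study + (1 | class), data = toy)

# Variance components: one row for the class intercept, one for the residual
vc        <- as.data.frame(VarCorr(fit))
var_class <- vc$vcov[vc$grp == "class"]
var_resid <- vc$vcov[vc$grp == "Residual"]

# ICC = between-class variance / (between-class + residual variance)
icc <- var_class / (var_class + var_resid)
icc
```

An ICC close to 0 suggests little clustering, while larger values mean that a substantial part of the remaining variation in scores is attributable to the classroom.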