Extension Activity: Mixed Models with Student Data

Overview

In this activity, you will analyse a synthetic dataset of students nested within classrooms. Your tasks:

  1. Load the dataset
  2. Explore and clean variables
  3. Fit fixed and mixed models
  4. Visualise effects
  5. Produce a results table

Each task includes hints and hidden solutions.


Dataset description

The dataset contains:

- class: classroom identifier (random intercept)
- student_id: unique ID
- hours_study: average weekly study hours
- SES: socioeconomic status (Low/Medium/High)
- gender: Male/Female
- score: exam score (outcome)

A classroom-level effect influences baseline scores, making the data suitable for mixed modelling.
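
To make the idea of a classroom-level effect concrete, here is a minimal sketch of how data with this structure could be simulated. The number of classes, the coefficients and the standard deviations are illustrative assumptions, not the values used to create practical.csv.

```r
# Hypothetical simulation: every size and coefficient here is an assumption
set.seed(42)

n_class <- 20                 # number of classrooms (assumed)
n_per   <- 25                 # students per classroom (assumed)
n       <- n_class * n_per

# One random intercept per classroom, shared by all students in that class
class_effect <- rnorm(n_class, mean = 0, sd = 5)

sim <- data.frame(
  class       = rep(seq_len(n_class), each = n_per),
  student_id  = seq_len(n),
  hours_study = round(runif(n, min = 0, max = 20), 1),
  SES         = sample(c("Low", "Medium", "High"), n, replace = TRUE),
  gender      = sample(c("Male", "Female"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Assumed SES shifts relative to "Low"
ses_shift <- unname(c(Low = 0, Medium = 3, High = 6)[sim$SES])

# score = fixed part + classroom intercept + student-level noise
sim$score <- 50 + 1.5 * sim$hours_study + ses_shift +
  class_effect[sim$class] + rnorm(n, mean = 0, sd = 8)

# Mean score per classroom: the shared intercept shifts whole classes up or down
head(aggregate(score ~ class, data = sim, FUN = mean))
```

Because every student in a class shares the same value of class_effect, scores within a class are correlated, which is what the random intercept in the mixed model is meant to capture.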


1. Load the packages

# HINT: dplyr, ggplot2, lme4, broom.mixed, gt
Show solution
library(dplyr)
library(ggplot2)
library(lme4)
library(broom.mixed)
library(gt)

2. Import the dataset

Download the dataset here:

👉 Download practical.csv

# HINT:
# You can use read.csv()
# Remember to specify the folder in which the dataset is saved e.g., "data"
# Give a name to your data e.g., df
Show solution
df <- read.csv("data/practical.csv")

3. Explore the data

# HINTS:
# Have a look at the first few rows across all columns 
# Also try to use summary() and table() functions
Show solution
head(df)
summary(df)
table(df$class)

4. Clean and prepare variables

# HINT:
# Convert SES and gender to factors
# Ensure class is a factor
Show solution
df$SES <- factor(df$SES, levels=c("Low","Medium","High"))
df$gender <- factor(df$gender)
df$class <- factor(df$class)
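
The explicit levels argument for SES matters because R uses the first level as the reference category in a model. The sketch below uses a small made-up vector (not taken from practical.csv) to show the difference between the default alphabetical ordering and the explicit one.

```r
# Toy values for illustration only
ses_raw <- c("Medium", "High", "Low", "Medium", "Low")

# Default ordering is alphabetical, so "High" would be the reference level
levels(factor(ses_raw))

# Explicit ordering makes "Low" the reference level
ses <- factor(ses_raw, levels = c("Low", "Medium", "High"))
levels(ses)

# Treatment-coded design matrix: each SES column is a contrast against "Low"
colnames(model.matrix(~ ses))
```

With the explicit ordering, the SES coefficients in the models that follow can be read as differences from the Low group.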

5. Fit a Fixed-Effects Model

Outcome: score

# HINT:
# mod_fixed <- lm(OUTCOME ~ EXPOSURE + COVARIATES, data=df)
Show solution
mod_fixed <- lm(score ~ hours_study + SES + gender, data=df)
summary(mod_fixed)

6. Fit a Mixed-Effects Model

(Random Intercept: Class)

# HINT:
# mod_mixed <- lmer(OUTCOME ~ EXPOSURE + COVARIATES + (1 | RANDOM_INTERCEPT), data=df)
Show solution
mod_mixed <- lmer(score ~ hours_study + SES + gender + (1 | class), data=df)
summary(mod_mixed)
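
One way to summarise the random-effects part of a model like this is the intraclass correlation (ICC): the share of the unexplained variance that lies between classes. The sketch below is self-contained and uses hypothetical simulated data with assumed parameter values; in the practical you could apply the same steps to mod_mixed.

```r
library(lme4)

# Hypothetical simulated data so the sketch runs on its own
set.seed(1)
n_class <- 30
n_per   <- 20
sim <- data.frame(
  class       = factor(rep(seq_len(n_class), each = n_per)),
  hours_study = runif(n_class * n_per, min = 0, max = 20)
)
u <- rnorm(n_class, sd = 5)   # classroom intercepts (assumed SD)
sim$score <- 50 + 1.5 * sim$hours_study + u[as.integer(sim$class)] +
  rnorm(nrow(sim), sd = 8)

fit <- lmer(score ~ hours_study + (1 | class), data = sim)

# Variance components: one row for class, one for the residual
vc <- as.data.frame(VarCorr(fit))
var_class <- vc$vcov[vc$grp == "class"]
var_resid <- vc$vcov[vc$grp == "Residual"]

# ICC: between-class variance as a proportion of the total unexplained variance
icc <- var_class / (var_class + var_resid)
icc
```

An ICC close to 0 suggests little clustering, while larger values indicate that students in the same class tend to have similar scores after accounting for the fixed effects.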

7. Visualise fixed effects

# HINT: try to use sjPlot R Package
Show solution
sjPlot::plot_model(mod_mixed, type="est")

8. Visualise random effects

# HINT: you can use plot_model()
Show solution
sjPlot::plot_model(mod_mixed, type="re")

9. Create a results table

# HINT: you can use tidy() and gt() functions here
Show solution
tidy(mod_mixed, effects="fixed") %>% gt()
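
If you want the table to report uncertainty as well as point estimates, tidy() can add confidence intervals and gt can round the numeric columns. This is a sketch on hypothetical simulated data with assumed parameter values; the same calls should work on mod_mixed.

```r
library(lme4)
library(broom.mixed)
library(gt)

# Hypothetical simulated data so the sketch runs on its own
set.seed(3)
n_class <- 30
n_per   <- 20
sim <- data.frame(
  class       = factor(rep(seq_len(n_class), each = n_per)),
  hours_study = runif(n_class * n_per, min = 0, max = 20)
)
u <- rnorm(n_class, sd = 5)
sim$score <- 50 + 1.5 * sim$hours_study + u[as.integer(sim$class)] +
  rnorm(nrow(sim), sd = 8)

fit <- lmer(score ~ hours_study + (1 | class), data = sim)

# Fixed effects with Wald 95% confidence intervals
tab <- tidy(fit, effects = "fixed", conf.int = TRUE)

# Round the numeric columns to two decimals for display
tbl <- fmt_number(
  gt(tab),
  columns = c(estimate, std.error, statistic, conf.low, conf.high),
  decimals = 2
)
tbl
```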

Reflection

When comparing the fixed-effects and mixed-effects models, consider the questions below:

  1. Effect Sizes
    Do the estimated effects of study hours, SES, or gender change when adding the classroom random intercept?

  2. Uncertainty
    Do standard errors increase or decrease? Does this affect which predictors appear important?

  3. Clustering
    Is there evidence of meaningful variation between classes?

  4. Interpretation
    Why might the mixed model provide a more realistic representation of student exam scores?


Show example answers

1. Effect Sizes
The coefficients for study hours, SES, and gender remain similar in size, but may shift slightly once classroom effects are accounted for. This indicates that most of the predictor effects are robust, though the fixed model may over- or underestimate some associations.

2. Uncertainty
Standard errors typically increase in the mixed model because it correctly acknowledges that observations within the same class are not independent. As a result, some predictors may become less statistically significant.

3. Clustering
The random intercept variance shows that classes differ systematically in their baseline exam scores. This justifies the use of a mixed model.

4. Interpretation
The mixed model is preferable because it adjusts for classroom-level differences that could confound the relationships between student characteristics and exam scores. Ignoring this structure risks producing overly confident or biased estimates.
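
To answer the effect-size and uncertainty questions, it can help to put the estimates and standard errors from both models side by side. The sketch below does this on hypothetical simulated data with assumed parameter values; fit_fixed and fit_mixed are illustrative names standing in for mod_fixed and mod_mixed. Which way the standard errors move depends on the data and on whether a predictor varies within or between classes.

```r
library(lme4)

# Hypothetical simulated data so the sketch runs on its own
set.seed(2)
n_class <- 30
n_per   <- 20
sim <- data.frame(
  class       = factor(rep(seq_len(n_class), each = n_per)),
  hours_study = runif(n_class * n_per, min = 0, max = 20)
)
u <- rnorm(n_class, sd = 5)
sim$score <- 50 + 1.5 * sim$hours_study + u[as.integer(sim$class)] +
  rnorm(nrow(sim), sd = 8)

fit_fixed <- lm(score ~ hours_study, data = sim)
fit_mixed <- lmer(score ~ hours_study + (1 | class), data = sim)

# Estimates and standard errors from both models in one table
comparison <- data.frame(
  term      = names(coef(fit_fixed)),
  est_fixed = unname(coef(fit_fixed)),
  se_fixed  = unname(sqrt(diag(vcov(fit_fixed)))),
  est_mixed = unname(fixef(fit_mixed)),
  se_mixed  = unname(sqrt(diag(as.matrix(vcov(fit_mixed)))))
)
comparison
```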