Multilevel Models Tutorial

Erin M. Buchanan

Goals

  • Understand what types of data are appropriate for multilevel models
  • Review the basic equations/terminology of regression versus multilevel models
  • Learn how to implement these models in R

Terminology Note

  • Multi-level model
  • Random effects model
  • Mixed model
  • Random coefficient model
  • Hierarchical model

Non-Hierarchical Data

  • Often thought of as “between-subjects” data
  • Each person/observation has one row of data, and each column represents one variable

Hierarchical Data

  • Hierarchical data contains data points “nested” within a grouping variable
    • A group of students’ exam scores (each person tested multiple times)
    • Prices of healthcare equipment over the last year (same equipment measured over time)
    • Happiness scores for individuals in multiple countries (each country contains multiple measurements of people)

Hierarchical Data

  • In theory, we could collapse the multiple measurements into one average score:
    • A group of students’ exam scores (each person tested multiple times)
      • Each student has one overall exam average
      • Each exam has an overall average
    • Prices of healthcare equipment over the last year (same equipment measured over time)
      • Average yearly price for each equipment piece
    • Happiness scores for individuals in multiple countries (each country contains multiple measurements of people)
      • Average happiness by country

Hierarchical Data

  • However, that’s lame:
    • Reduces power because multiple data points become one score
    • Ignores the measurement error of the multiple data points
    • May hide interesting relationships that exist when points are examined individually

Multilevel Models

  • Multilevel models (MLMs) allow you to retain power and analyze each data point while controlling for non-independence.
  • You can potentially uncover why heteroscedasticity occurs within the analysis
  • You can have missing data but still use other available data points
    • For example, if equipment prices are missing from May, you could model the data even without those numbers
  • Control for measurement error due to the study design or measurement tools

Regression Equation

  • A simple equation with one predictor:

\[Y_i = b_0 + b_1X_i + \epsilon_i\]

  • \(Y_i\): The dependent variable score for participant \(i\)
  • \(b_0\): The y-intercept, the average score for Y when X is zero
  • \(b_1\): The slope for the first predictor
    • For every one unit increase in X, Y changes by \(b_1\) units
  • \(X_i\): The score on the first predictor for participant \(i\)
  • \(\epsilon_i\): The residual or error for participant \(i\)
    • The actual \(Y_i\) minus the predicted \(\hat{Y}_i = b_0 + b_1X_i\)
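The equation maps directly onto lm() in R. A quick sketch with simulated data (so the exact estimates will vary around the true values):

```r
# simulate a predictor and an outcome with a known intercept and slope
set.seed(1234)
X <- rnorm(100, mean = 50, sd = 10)
Y <- 10 + 0.5 * X + rnorm(100, sd = 5) # b0 = 10, b1 = 0.5, plus error

simple.model <- lm(Y ~ X)
coef(simple.model)              # estimates of b0 and b1
head(residuals(simple.model))   # the epsilon_i values: actual Y minus predicted Y
```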

Modeling

  • MLM is still regression - and we can use linear, logistic, etc.
  • We generally use the same system as examining regression:
    • An overall model
    • Examine individual predictors
  • However, we need to add a new concept: random effects
    • Fixed coefficients: Intercepts/slopes are assumed to be the same across different contexts
    • Random coefficients: Intercepts/slopes are allowed to vary across different contexts

Random Coefficients

  • Random variable: a grouping or clustering variable
    • You set this variable up based on the structure of the data
    • Usually the participants or data points that are measured multiple times
  • A group of students’ exam scores (each person tested multiple times)
    • Students would be the random variable
  • Prices of healthcare equipment over the last year (same equipment measured over time)
    • Equipment is most likely, but could also be time depending on hypothesis
  • Happiness scores for individuals in multiple countries (each country contains multiple measurements of people)
    • Country

MLM Terminology

  • Level 1 versus Level 2, 3, …
  • Examples help: Predicting a language test score ~ gender_student + school_type
  • Dependent variable: language test score
  • Independent variables: gender of student and school type
  • Cluster variable: school number (multiple schools within school type)
  • Level 1: gender of student (each test score is matched to a gender)
  • Level 2: school type (multiple language scores matched to a school type)
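In nlme syntax, this example would look something like the call below (a sketch; the names language_score, gender_student, school_type, school_id, and school_data are hypothetical placeholders):

```r
library(nlme)

# fixed effects at Level 1 (gender_student) and Level 2 (school_type),
# with a random intercept for the cluster variable (school_id)
example.model <- lme(language_score ~ gender_student + school_type,
                     random = ~1 | school_id,
                     data = school_data) # hypothetical data frame
```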

MLM Regression Equation

  • Level 1

\[Y_{ij} = b_{0j} + b_{1j}X_{ij} + \epsilon_{ij}\]

  • \(Y_{ij}\): score on the dependent variable for participant \(i\) in Level 2 group \(j\)
  • \(b_{0j}\): Intercept for Level 2 group \(j\)
  • \(b_{1j}\): Slope for the Level 1 predictor in Level 2 group \(j\)
  • \(X_{ij}\): Level 1 predictor score for participant \(i\) in Level 2 group \(j\)
  • \(\epsilon_{ij}\): Error for participant \(i\) in Level 2 group \(j\)

MLM Regression Equation

  • Level 2

\[b_{0j} = \gamma_{00} + \gamma_{01}X_j + u_{0j}\]

  • \(X_j\): Level 2 predictor
  • \(\gamma_{00}\): Overall intercept: grand mean of the dependent variable
  • \(\gamma_{01}\): Effect of Level 2 predictor on level 1 intercept
  • \(u_{0j}\): Difference in Level 2 \(j\) from the overall intercept

\[b_{1j} = \gamma_{10} + \gamma_{11}X_j + u_{1j}\]

  • \(\gamma_{10}\): average slope of the Level 1 predictor
  • \(\gamma_{11}\): Effect of the Level 2 predictor on the level 1 slope
  • \(u_{1j}\): Difference in Level 2 \(j\) from the overall slope
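Substituting the Level 2 equations into the Level 1 equation gives the combined (mixed) model, which is what the software actually estimates:

\[Y_{ij} = \gamma_{00} + \gamma_{01}X_j + \gamma_{10}X_{ij} + \gamma_{11}X_jX_{ij} + u_{0j} + u_{1j}X_{ij} + \epsilon_{ij}\]

The \(\gamma\) terms are the fixed effects, while the \(u\) and \(\epsilon\) terms are the random effects.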

Random Coefficients

  • Random intercept: allows each of the random cluster variables to have a different intercept
    • Remember that the intercept is the average score for Y if X(s) are zero
    • Therefore, effectively this is the average of each cluster separately
    • The values shown in the output are the deviation from the overall non-random intercept

Random Coefficients

knitr::include_graphics("images/12_img_1.png")

Random Coefficients

  • Random slope: allows each of the random cluster variables to have a different slope for the target coefficient
    • You can just do one random slope (for just one X variable) or many of them
    • This output creates a different slope for each of the clusters
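One way to see what random coefficients capture is to fit a separate regression per cluster on simulated data (a sketch with made-up data; in practice the mixed model estimates these jointly with pooling, rather than separately):

```r
# simulate 5 clusters whose true intercepts and slopes differ
set.seed(5678)
sim <- do.call(rbind, lapply(1:5, function(cluster) {
  X <- rnorm(30)
  data.frame(cluster = cluster,
             X = X,
             # cluster-specific intercept and slope plus residual noise
             Y = rnorm(1, 5, 2) + rnorm(1, 1, 0.5) * X + rnorm(30))
}))

# per-cluster intercepts and slopes, analogous to what random
# intercepts + random slopes estimate (but without pooling)
by(sim, sim$cluster, function(d) coef(lm(Y ~ X, data = d)))
```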

Random Coefficients

knitr::include_graphics("images/12_img_2.png")

Random Coefficients

knitr::include_graphics("images/12_img_3.png")

What Should We Use?

  • Models are generally built from the simplest to the most complex
    • Null model or intercept only model
    • Random intercept model (adds to the above)
    • Fixed effects model (adds to the above)
    • Random slopes (adds to the above, if you want)
  • But sometimes people like to do it the other way
  • Compare with AIC

Packages

library(rio) # importing data
library(nlme) # other people love lme4
library(dplyr) # best data manipulation package
library(performance) # easy stats is great
library(parameters) # easy stats is great 
library(report) # easy stats is great 
# also recommend library(see)
library(MuMIn) # effect sizes 

Example 1: Priming

  • A good number of psychology projects use repeated measures data
  • This study is a large scale semantic priming study: https://osf.io/preprints/osf/q4fjy
  • The data includes the priming trials from Serbian
  • Dependent variable: Response latency
  • Independent variable: Word pair condition (Level 1)
  • Clustering variables: Participant, Item Trial

Example 1: Priming Data

DF <- import("data/sr_prime_trials.csv") %>% 
  # only variables we need
  select(observation, target_duration, word_combo, 
         keep_target, keep_participant, type) %>% 
  # remove trials and people who don't meet inclusion rules
  filter(keep_target == "keep") %>% 
  filter(keep_participant == "keep") %>% 
  na.omit()

head(DF)
     observation target_duration   word_combo keep_target keep_participant
1 5c86ebfb2-e609         575.544 авенијаручак        keep             keep
2 be68c9e68-9040         662.900 авенијаручак        keep             keep
3 c70a8bb1f-d793         488.900 авенијаручак        keep             keep
4 77c9da84f-a097         400.000 авенијаручак        keep             keep
5 c4244b8de-0681         609.000 авенијаручак        keep             keep
6 e9b8b9c54-a976         576.962 авенијаручак        keep             keep
       type
1 unrelated
2 unrelated
3 unrelated
4 unrelated
5 unrelated
6 unrelated

Example 1: Priming Data

# number of participants
length(unique(DF$observation))
[1] 681
# number of items 
length(unique(DF$word_combo))
[1] 2000
# average number of items per participant
DF %>% 
  group_by(observation) %>% 
  summarize(n_trials = n()) %>% 
  ungroup() %>% 
  summarize(mean_trial = mean(n_trials),
            sd_trial = sd(n_trials))
# A tibble: 1 × 2
  mean_trial sd_trial
       <dbl>    <dbl>
1       140.     21.5

Example 1: Intercept Only Model

  • gls is akin to lm using generalized least squares
intercept.model <- gls(target_duration ~ 1, 
                       data = DF, 
                       method = "ML", 
                       na.action = "na.omit")

summary(intercept.model)
Generalized least squares fit by maximum likelihood
  Model: target_duration ~ 1 
  Data: DF 
      AIC     BIC    logLik
  1362406 1362425 -681201.1

Coefficients:
               Value Std.Error t-value p-value
(Intercept) 725.0498 0.9731638 745.044       0

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-1.8659624 -0.6086932 -0.2800019  0.2603637  7.4749990 

Residual standard error: 300.8902 
Degrees of freedom: 95598 total; 95597 residual

Example 1: Random Intercept Model

  • nlme::lme versus lme4::lmer
    • In lme4, random effects are added directly to the formula:
    • target_duration ~ (1|observation) + (1|word_combo)
random.intercept.model <- lme(target_duration ~ 1, 
                              data = DF, 
                              method = "ML", 
                              na.action = "na.omit",
                              # this is the random intercept part 
                              random = list(~1|observation,
                                            ~1|word_combo))

summary(random.intercept.model)
Linear mixed-effects model fit by maximum likelihood
  Data: DF 
      AIC     BIC    logLik
  1343359 1343397 -671675.5

Random effects:
 Formula: ~1 | observation
        (Intercept)
StdDev:    135.5622

 Formula: ~1 | word_combo %in% observation
        (Intercept) Residual
StdDev:    141.7564 228.5829

Fixed effects:  target_duration ~ 1 
               Value Std.Error    DF  t-value p-value
(Intercept) 725.6796  5.273002 94343 137.6217       0

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-3.3912147 -0.4694673 -0.1845673  0.2214105  7.1559565 

Number of Observations: 95598
Number of Groups: 
                observation word_combo %in% observation 
                        681                       95024 
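For comparison, the lme4 version of this model keeps the random intercepts in the formula itself (a sketch, assuming lme4 is installed; note that lme4 would treat observation and word_combo as crossed random effects, while the nlme list() syntax above nests word_combo within observation, as the output shows):

```r
library(lme4)

# random intercepts for participants and items as crossed effects;
# REML = FALSE matches the maximum likelihood estimation used above
random.intercept.lmer <- lmer(target_duration ~ 1 +
                                (1 | observation) + (1 | word_combo),
                              data = DF, REML = FALSE)
```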

Example 1: Fixed Effects Model

fixed.model <- lme(target_duration ~ 1 + type,
                   data = DF, 
                   method = "ML", 
                   na.action = "na.omit",
                   random = list(~1|observation,
                                 ~1|word_combo))

summary(fixed.model)
Linear mixed-effects model fit by maximum likelihood
  Data: DF 
      AIC     BIC    logLik
  1342544 1342591 -671266.9

Random effects:
 Formula: ~1 | observation
        (Intercept)
StdDev:    135.6001

 Formula: ~1 | word_combo %in% observation
        (Intercept) Residual
StdDev:     139.747 228.4602

Fixed effects:  target_duration ~ 1 + type 
                 Value Std.Error    DF   t-value p-value
(Intercept)   700.9499  5.343982 94342 131.16623       0
typeunrelated  49.7186  1.735545 94342  28.64723       0
 Correlation: 
              (Intr)
typeunrelated -0.162

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-3.3414581 -0.4729406 -0.1829942  0.2230444  7.2947095 

Number of Observations: 95598
Number of Groups: 
                observation word_combo %in% observation 
                        681                       95024 

Example 1: Model Comparison

compare_performance(intercept.model, random.intercept.model,
                    fixed.model)
# Comparison of Model Performance Indices

Name                   | Model |   AIC (weights) |   BIC (weights) |    RMSE |   Sigma |  AICc (weights) |        R2
--------------------------------------------------------------------------------------------------------------------
intercept.model        |   gls | 1.4e+06 (<.001) | 1.4e+06 (<.001) | 300.890 | 300.890 |         (>.999) | 4.441e-16
random.intercept.model |   lme | 1.3e+06 (<.001) | 1.3e+06 (<.001) | 193.686 | 228.583 | 1.3e+06 (>.999) |          
fixed.model            |   lme | 1.3e+06 (>.999) | 1.3e+06 (>.999) | 194.311 | 228.460 | 1.3e+06 (>.999) |          
AIC(intercept.model)
[1] 1362406
AIC(random.intercept.model) # lower 
[1] 1343359
AIC(fixed.model) # lower
[1] 1342544

Example 1: Model Assumption Checks

# check_model(fixed.model)
# check_outliers(fixed.model)

# if data is large, check_model is slow 
residuals <- scale(residuals(fixed.model))
fitted <- scale(fitted.values(fixed.model))

hist(residuals)

{ qqnorm(residuals); abline(0,1) } 

{ plot(residuals, fitted); abline(h = 0); abline(v = 0) }

Example 1: Effect Size

  • R2m (marginal): variance explained by the fixed effects alone; R2c (conditional): variance explained by the fixed and random effects together
r.squaredGLMM(fixed.model)
             R2m       R2c
[1,] 0.006811159 0.4247235

Example 1: Examine Results

model_parameters(fixed.model, summary = TRUE)
# Fixed Effects

Parameter        | Coefficient |   SE |           95% CI | t(94342) |      p
----------------------------------------------------------------------------
(Intercept)      |      700.95 | 5.34 | [690.48, 711.42] |   131.17 | < .001
type [unrelated] |       49.72 | 1.74 | [ 46.32,  53.12] |    28.65 | < .001

# Random Effects

Parameter                   | Coefficient
-----------------------------------------
SD (Intercept: observation) |      135.60
SD (Intercept: word_combo)  |      139.75
SD (Residual)               |      228.46

Model: target_duration ~ 1 + type (95598 Observations)
Residual standard deviation: 228.460 (df = 94342)
# plot(model_parameters(fixed.model, summary = TRUE))
standardize_parameters(fixed.model)
# Standardization method: refit

Parameter        | Std. Coef. |         95% CI
----------------------------------------------
(Intercept)      |      -0.08 | [-0.11, -0.05]
type [unrelated] |       0.17 | [ 0.15,  0.18]
report(fixed.model)
We fitted a linear mixed model (estimated using ML and nlminb optimizer) to
predict target_duration with type (formula: target_duration ~ 1 + type). The
model included observation as random effects (formula: list(~1 | observation,
~1 | word_combo)). The model's intercept, corresponding to type = related, is
at 700.95 (95% CI [690.48, 711.42], t(94342) = 131.17, p < .001). Within this
model:

  - The effect of type [unrelated] is statistically significant and positive
(beta = 49.72, 95% CI [46.32, 53.12], t(94342) = 28.65, p < .001; Std. beta =
0.17, 95% CI [0.15, 0.18])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.

Example 2: Daily Diary Study

  • Participants were given daily surveys to measure their negative affect and stress across time
  • Dependent variable: Stress scores
  • Independent variable:
    • Level 1: Centered PANAS Negative Score (within)
    • Level 2: Mean PANAS Score across time (between)
  • Clustering variable(s): Participant, time

Example 2: Daily Diary Data

DF <- import("data/panas_data.csv") %>% 
  na.omit()

head(DF)
  progress_total panas_negative stress_total                           partno
1             23             15            1 dc990a2ce52ba0b2d2d766e43c5071d6
2             23             15            1 dc990a2ce52ba0b2d2d766e43c5071d6
3             25             14            1 dc990a2ce52ba0b2d2d766e43c5071d6
4             26             14            1 dc990a2ce52ba0b2d2d766e43c5071d6
5             25             16            1 dc990a2ce52ba0b2d2d766e43c5071d6
6             30             15            1 dc990a2ce52ba0b2d2d766e43c5071d6
  realtime
1     0.00
2     1.00
3     1.99
4     2.99
5     3.98
6     4.99

Example 2: Daily Diary Data

# number of participants
length(unique(DF$partno))
[1] 113
# average number of time points per participant
DF %>% 
  group_by(partno) %>% 
  summarize(n_times = n()) %>% 
  ungroup() %>% 
  summarize(mean_times = mean(n_times),
            sd_times = sd(n_times),
            min_times = min(n_times),
            max_times = max(n_times))
# A tibble: 1 × 4
  mean_times sd_times min_times max_times
       <dbl>    <dbl>     <int>     <int>
1       12.6     2.64         1        15

Example 2: Intercept Only Model

intercept.model <- gls(stress_total ~ 1, 
                       data = DF, 
                       method = "ML", 
                       na.action = "na.omit")

summary(intercept.model)
Generalized least squares fit by maximum likelihood
  Model: stress_total ~ 1 
  Data: DF 
       AIC      BIC    logLik
  4876.129 4886.646 -2436.065

Coefficients:
               Value  Std.Error  t-value p-value
(Intercept) 1.528169 0.03571252 42.79085       0

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-1.1359513 -0.3926099 -0.3926099 -0.3926099  7.7841455 

Residual standard error: 1.345277 
Degrees of freedom: 1420 total; 1419 residual

Example 2: Random Intercept Model

random.intercept.model <- lme(stress_total ~ 1, 
                              data = DF, 
                              method = "ML", 
                              na.action = "na.omit",
                              # this is the random intercept part 
                              random = list(~1|partno))

summary(random.intercept.model)
Linear mixed-effects model fit by maximum likelihood
  Data: DF 
       AIC      BIC    logLik
  4456.026 4471.802 -2225.013

Random effects:
 Formula: ~1 | partno
        (Intercept) Residual
StdDev:   0.8244337  1.06616

Fixed effects:  stress_total ~ 1 
              Value  Std.Error   DF  t-value p-value
(Intercept) 1.55362 0.08321221 1307 18.67058       0

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-4.30454994 -0.29479819 -0.11525706 -0.05541002  8.60744658 

Number of Observations: 1420
Number of Groups: 113 

Example 2: Fixed Effects Creation

negative.mean <- aggregate(DF$panas_negative, 
                        by = list(DF$partno), 
                        mean, na.rm = TRUE)
names(negative.mean) <- c("partno", "negative.mean")
DF <- merge(DF, negative.mean, by = "partno")
DF$negative.cnt <- DF$panas_negative - DF$negative.mean
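The same centering can be done in one dplyr pipeline, matching the style used elsewhere in these slides (an equivalent sketch, not an additional required step):

```r
DF <- DF %>% 
  group_by(partno) %>% 
  # between-person mean and within-person (centered) score
  mutate(negative.mean = mean(panas_negative, na.rm = TRUE),
         negative.cnt = panas_negative - negative.mean) %>% 
  ungroup()
```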

Example 2: Fixed Effects Model

fixed.model <- lme(stress_total ~ 1 + negative.mean + negative.cnt,
                   data = DF, 
                   method = "ML", 
                   na.action = "na.omit",
                   random = list(~1|partno))

summary(fixed.model)
Linear mixed-effects model fit by maximum likelihood
  Data: DF 
       AIC      BIC    logLik
  4316.113 4342.405 -2153.056

Random effects:
 Formula: ~1 | partno
        (Intercept) Residual
StdDev:   0.7572297 1.016058

Fixed effects:  stress_total ~ 1 + negative.mean + negative.cnt 
                  Value  Std.Error   DF   t-value p-value
(Intercept)   0.5313282 0.24342615 1306  2.182708  0.0292
negative.mean 0.0556071 0.01257117  111  4.423381  0.0000
negative.cnt  0.0657026 0.00572279 1306 11.480883  0.0000
 Correlation: 
              (Intr) ngtv.m
negative.mean -0.949       
negative.cnt   0.000  0.000

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-4.0022728 -0.3932228 -0.0945899  0.1009836  8.5236045 

Number of Observations: 1420
Number of Groups: 113 

Example 2: Random Slope Model

random.slope.model <- lme(stress_total ~ 1 + negative.mean + negative.cnt,
                   data = DF, 
                   method = "ML", 
                   na.action = "na.omit",
                   random = list(~realtime|partno))

summary(random.slope.model)
Linear mixed-effects model fit by maximum likelihood
  Data: DF 
       AIC      BIC    logLik
  4174.756 4211.565 -2080.378

Random effects:
 Formula: ~realtime | partno
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 1.3277461 (Intr)
realtime    0.1075518 -0.893
Residual    0.9187873       

Fixed effects:  stress_total ~ 1 + negative.mean + negative.cnt 
                  Value  Std.Error   DF   t-value p-value
(Intercept)   0.6321300 0.21288561 1306  2.969341   3e-03
negative.mean 0.0377227 0.01102278  111  3.422248   9e-04
negative.cnt  0.0603304 0.00550876 1306 10.951731   0e+00
 Correlation: 
              (Intr) ngtv.m
negative.mean -0.949       
negative.cnt   0.028 -0.016

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.02732595 -0.38036935 -0.05720288  0.12664612  8.92243129 

Number of Observations: 1420
Number of Groups: 113 

Example 2: Model Comparison

compare_performance(intercept.model, random.intercept.model,
                    fixed.model, random.slope.model)
# Comparison of Model Performance Indices

Name                   | Model |  AIC (weights) |  BIC (weights) |  RMSE | Sigma | AICc (weights) | R2 (cond.) | R2 (marg.) |   ICC |        R2
-----------------------------------------------------------------------------------------------------------------------------------------------
intercept.model        |   gls | 4876.1 (<.001) | 4886.6 (<.001) | 1.345 | 1.345 |        (>.999) |            |            |       | 3.331e-15
random.intercept.model |   lme | 4456.0 (<.001) | 4471.8 (<.001) | 1.029 | 1.066 | 4456.0 (>.999) |      0.374 |      0.000 | 0.374 |          
fixed.model            |   lme | 4316.1 (<.001) | 4342.4 (<.001) | 0.981 | 1.016 | 4316.2 (>.999) |      0.432 |      0.116 | 0.357 |          
random.slope.model     |   lme | 4174.8 (>.999) | 4211.6 (>.999) | 0.863 | 0.919 | 4174.8 (>.999) |      0.692 |      0.049 | 0.676 |          
AIC(intercept.model)
[1] 4876.129
AIC(random.intercept.model) # lower 
[1] 4456.026
AIC(fixed.model) # lower
[1] 4316.113
AIC(random.slope.model) # lower
[1] 4174.756

Example 2: Model Assumption Checks

check_model(random.slope.model)
check_outliers(random.slope.model)
OK: No outliers detected.
- Based on the following method and threshold:  ().
- For variable: (Whole model)

Example 2: Effect Size

r.squaredGLMM(random.slope.model)
           R2m       R2c
[1,] 0.0751046 0.5254546

Example 2: Examine Results

model_parameters(random.slope.model, summary = TRUE)
# Fixed Effects

Parameter     | Coefficient |       SE |       95% CI |     t |   df |      p
-----------------------------------------------------------------------------
(Intercept)   |        0.63 |     0.21 | [0.21, 1.05] |  2.97 | 1306 | 0.003 
negative mean |        0.04 |     0.01 | [0.02, 0.06] |  3.42 |  111 | < .001
negative cnt  |        0.06 | 5.51e-03 | [0.05, 0.07] | 10.95 | 1306 | < .001

# Random Effects

Parameter                        | Coefficient
----------------------------------------------
SD (Intercept: partno)           |        1.33
SD (realtime: partno)            |        0.11
Cor (Intercept~realtime: partno) |       -0.89
SD (Residual)                    |        0.92

Model: stress_total ~ 1 + negative.mean + negative.cnt (1420 Observations)
Residual standard deviation: 0.919 (df = 1306)
Conditional R2: 0.692; Marginal R2: 0.049
# plot(model_parameters(fixed.model, summary = TRUE))

report(random.slope.model)
We fitted a linear mixed model (estimated using ML and nlminb optimizer) to
predict stress_total with negative.mean and negative.cnt (formula: stress_total
~ 1 + negative.mean + negative.cnt). The model included realtime as random
effects (formula: list(~realtime | partno)). The model's total explanatory
power is substantial (conditional R2 = 0.69) and the part related to the fixed
effects alone (marginal R2) is of 0.05. The model's intercept, corresponding to
negative.mean = 0 and negative.cnt = 0, is at 0.63 (95% CI [0.21, 1.05],
t(1306) = 2.97, p = 0.003). Within this model:

  - The effect of negative mean is statistically significant and positive (beta =
0.04, 95% CI [0.02, 0.06], t(111) = 3.42, p < .001; Std. beta = 0.17, 95% CI
[0.07, 0.27])
  - The effect of negative cnt is statistically significant and positive (beta =
0.06, 95% CI [0.05, 0.07], t(1306) = 10.95, p < .001; Std. beta = 0.21, 95% CI
[0.17, 0.25])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.

Other Resources

  • Free book on MLM: https://www.learn-mlms.com/
  • Tutorial on simulating nested data: https://debruine.github.io/faux/articles/sim_mixed.html#adding-random-effects
  • Gelman paper: http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf

Summary

  • Multilevel models are flexible tools for structured/hierarchical data
  • The equations are regressions within regressions: Level 2 equations predict the Level 1 intercepts and slopes
  • R makes these easy to run, easystats is still awesome
  • Thank you for your time on the weekend!