lme4
The independence assumption; why it matters; examples of non-independence.
Independence means that each observation in your dataset is generated by a separate, unrelated causal process.
Standard statistical methods (t-tests, OLS regression, ANOVA) assume independence.
When this assumption is violated, treating non-independent observations as independent makes you overconfident in your results: standard errors are too small, and p-values are too optimistic.
You test whether a new drug increases happiness:
✓ Independence is reasonable here
Each person contributes exactly one data point.
You test whether people respond differently to metaphor type A vs. B:
✓ Independence is reasonable here
No repeated measures, no item effects to worry about.
Each observation is independent—no nesting structure.
Estimate Std. Error t value Pr(>|t|)
(Intercept) 723.6329 1.507002 480.180393 1.472707e-305
groupB -19.4609 2.131223 -9.131332 7.976194e-17
Interpretation:
- Intercept (groupA) = 723.63 (mean of group A)
- groupB coefficient = -19.46 (difference from A)
- Group B is about 20 units lower than Group A
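A between-subjects comparison like this can be fit with ordinary `lm()`. Here is a minimal sketch using simulated data (the variable names, sample size, and noise level are my assumptions, chosen to roughly mirror the output above):

```r
# Simulate a between-subjects design: one independent observation per person
# (these names and values are hypothetical, not the course's actual data)
set.seed(42)
n <- 100
group <- factor(rep(c("A", "B"), each = n))
# Group B ~20 units lower than Group A, as in the output above
y <- ifelse(group == "A", 723, 703) + rnorm(2 * n, sd = 15)
d <- data.frame(y = y, group = group)

# Each row is an independent observation, so OLS is appropriate
fit <- lm(y ~ group, data = d)
summary(fit)$coefficients
```

Because `group` is a factor with reference level A, the intercept is group A's mean and `groupB` is the A-to-B difference.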
Non-independence occurs when different observations are systematically related, i.e., they were produced by the same generative process.
Common sources of non-independence in behavioral research include repeated measures from the same participant and multiple responses to the same stimuli (items).
The sleepstudy dataset tracks reaction times for 18 subjects across successive days of sleep deprivation:
Reaction Days Subject
1 249.5600 0 308
2 258.7047 1 308
3 250.8006 2 308
💭 Check-in
Based on the structure of this study, what might be a source of non-independence?
Let’s pretend we don’t know about the nested structure:
💭 Check-in
What’s not pictured here?
Subjects differ in both their baseline reaction times and in how strongly sleep deprivation affects them. Some subjects are consistently faster; others are consistently slower.
Mixed effects models let us account for both types of variance.
Fixed vs. random effects; random intercepts; random slopes.
Mixed effects models combine fixed effects (population-level estimates) with random effects (group-level deviations).
Goal: Get better estimates of fixed effects by accounting for nested structure.
Determining which factors to include as fixed vs. random effects is not always straightforward. Here’s a rough guide, but we’ll discuss it again later too:
How do you decide?
Standard regression:
\[Y = \beta_0 + \beta_1 X + \epsilon\]
Mixed effects model with random intercepts:
\[Y = (\beta_0 + u_0) + \beta_1 X + \epsilon\]
Where \(u_0\) is a group-specific deviation from the overall intercept \(\beta_0\).
Random intercepts allow each group (e.g., subject) to have its own baseline:
Same slope, different intercepts.
Basic model with random intercepts:
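In lme4 syntax, this model can be written as follows (a minimal sketch; the model name `model_intercepts` is my choice):

```r
library(lme4)  # provides lmer() and the sleepstudy dataset

# Fixed effect of Days; a random intercept for each Subject
model_intercepts <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
```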
Breaking down the syntax:
- Reaction ~ Days: Fixed effect of Days on Reaction
- (1 | Subject): Random intercept for each Subject
  - 1 = intercept
  - | = "grouped by"
  - Subject = grouping variable

After fitting the model, you can extract the random effects:
(Intercept)
308 40.783710
309 -77.849554
310 -63.108567
330 4.406442
These are deviations from the overall intercept (fixed effect).
Each subject’s fitted intercept = fixed intercept + random deviation:
Fixed intercept:
(Intercept)
   251.4051
Subject 308's random deviation:
[1] 40.78371
Subject 308's fitted intercept:
(Intercept)
   292.1888
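This sum can be computed directly from the fitted model (a sketch; the model name `model_intercepts` is my choice):

```r
library(lme4)

# Random-intercepts model on the sleepstudy data
model_intercepts <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Subject 308's fitted intercept = fixed intercept + random deviation
fixef(model_intercepts)["(Intercept)"] +
  ranef(model_intercepts)$Subject["308", "(Intercept)"]
# ≈ 292.19, matching the output above
```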
Random slopes allow the effect of X to vary by group:
Model with random intercepts AND slopes:
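In lme4 syntax (a sketch; the model name `model_slopes` is my choice):

```r
library(lme4)

# Random intercept AND random slope for Days, per Subject
model_slopes <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
summary(model_slopes)$coefficients
```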
Estimate Std. Error t value
(Intercept) 251.40510 6.824597 36.838090
Days 10.46729 1.545790 6.771481
Breaking down the syntax:
- Reaction ~ Days: Fixed effect of Days
- (1 + Days | Subject): Random effects for Subject
  - 1 = random intercept
  - Days = random slope for Days
  - | = "grouped by"
  - Subject = grouping variable

After fitting the model, you can extract the random effects:
(Intercept) Days
308 2.258551 9.198976
309 -40.398738 -8.619681
310 -38.960409 -5.448856
330 23.690620 -4.814350
Now, we have deviations from the intercept and from the Days slope.
Each subject’s fitted slope = fixed slope + random deviation:
Fixed slope:
Days
10.46729
Subject 308's random deviation:
[1] 9.198976
Subject 308's fitted slope:
Days
19.66626
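As with the intercepts, this can be computed directly from the fitted model (a sketch; the model name `model_slopes` is my choice):

```r
library(lme4)

# Random intercepts and slopes on the sleepstudy data
model_slopes <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Subject 308's fitted slope = fixed slope + random deviation
fixef(model_slopes)["Days"] + ranef(model_slopes)$Subject["308", "Days"]
# ≈ 19.67, matching the output above
```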
Subject 308’s RT increases by ~19.67 ms per day (vs. population average of ~10.47 ms).
Without random slopes:
- Assumes the effect is the same for everyone
- Underestimates uncertainty in the fixed effect
- Can lead to false positives

With random slopes:
- Acknowledges that effects vary across individuals
- Provides more conservative (realistic) estimates
- Better generalization to new subjects
Rule of thumb: If a variable varies within your grouping variable, include it as a random slope.
You can include multiple sources of random effects:
Common in psycholinguistics:
- Random effects for subjects (people vary)
- Random effects for items (stimuli vary)
- Called "crossed random effects"
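A crossed random effects model can be sketched as follows. The dataset and all variable names (`rt`, `condition`, `subject`, `item`) are hypothetical here, simulated only to make the example runnable:

```r
library(lme4)

# Simulated data with crossed subjects and items (all names hypothetical)
set.seed(1)
d <- expand.grid(subject = factor(1:20), item = factor(1:10))
subj_eff <- rnorm(20, sd = 30)  # between-subject variability
item_eff <- rnorm(10, sd = 20)  # between-item variability
d$condition <- factor(sample(c("A", "B"), nrow(d), replace = TRUE))
d$rt <- 500 + subj_eff[d$subject] + item_eff[d$item] +
  ifelse(d$condition == "B", 15, 0) + rnorm(nrow(d), sd = 40)

# Random intercepts for both subjects and items: crossed random effects
model_crossed <- lmer(rt ~ condition + (1 | subject) + (1 | item), data = d)
```

Because every subject responds to every item, the two grouping factors are crossed rather than nested, and each gets its own random-effects term.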
Mixed effects models help when your data have a nested or crossed structure that makes observations non-independent (e.g., repeated measures within subjects, or multiple responses to the same items).
Key components:
- Fixed effects: Your hypotheses
- Random intercepts: Group-level baseline differences
- Random slopes: Group-level differences in effects
lme4
Building and evaluating models; model comparisons; best practices and common issues.
The lme4 package
Most common R package for mixed models:
Main functions:
- lmer(): Linear mixed effects models
- glmer(): Generalized linear mixed models (logistic, Poisson, etc.)
Let’s fit a model with random intercepts only:
Note: REML = FALSE uses maximum likelihood estimation, which we need for model comparison.
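A sketch of the call that produces the summary below (the model name `model_intercepts` is my choice; the formula matches the one shown in the output):

```r
library(lme4)

# Random intercepts only, fit with maximum likelihood (REML = FALSE)
model_intercepts <- lmer(Reaction ~ Days + (1 | Subject),
                         data = sleepstudy, REML = FALSE)
summary(model_intercepts)
```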
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Reaction ~ Days + (1 | Subject)
Data: sleepstudy
AIC BIC logLik -2*log(L) df.resid
1802.1 1814.9 -897.0 1794.1 176
Scaled residuals:
Min 1Q Median 3Q Max
-3.2347 -0.5544 0.0155 0.5257 4.2648
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 1296.9 36.01
Residual 954.5 30.90
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.4051 9.5062 26.45
Days 10.4673 0.8017 13.06
Correlation of Fixed Effects:
(Intr)
Days -0.380
Now let’s add random slopes for the Days effect:
This allows both the intercept AND the slope to vary by subject.
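A sketch of the corresponding call (the model name `model_full` anticipates its use in the model comparison below):

```r
library(lme4)

# Random intercepts AND slopes, fit with ML for later model comparison
model_full <- lmer(Reaction ~ Days + (1 + Days | Subject),
                   data = sleepstudy, REML = FALSE)
summary(model_full)
```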
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Reaction ~ Days + (1 + Days | Subject)
Data: sleepstudy
AIC BIC logLik -2*log(L) df.resid
1763.9 1783.1 -876.0 1751.9 174
Scaled residuals:
Min 1Q Median 3Q Max
-3.9416 -0.4656 0.0289 0.4636 5.1793
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 565.48 23.780
Days 32.68 5.717 0.08
Residual 654.95 25.592
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 6.632 37.907
Days 10.467 1.502 6.968
Correlation of Fixed Effects:
(Intr)
Days -0.138
Question: Does adding variable \(X\) improve the model over a model without \(X\)?
Approach: Likelihood ratio test
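In lme4, a likelihood ratio test is run by fitting both models with ML and passing them to `anova()` (a sketch; the model names match the comparison output below):

```r
library(lme4)

# Reduced model omits the fixed effect of Days; both models fit with ML
model_reduced <- lmer(Reaction ~ (1 + Days | Subject),
                      data = sleepstudy, REML = FALSE)
model_full <- lmer(Reaction ~ Days + (1 + Days | Subject),
                   data = sleepstudy, REML = FALSE)

# Likelihood ratio test comparing the two nested models
anova(model_reduced, model_full)
```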
Data: sleepstudy
Models:
model_reduced: Reaction ~ (1 + Days | Subject)
model_full: Reaction ~ Days + (1 + Days | Subject)
npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
model_reduced 5 1785.5 1801.4 -887.74 1775.5
model_full 6 1763.9 1783.1 -875.97 1751.9 23.537 1 1.226e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: Adding Days significantly improves model fit (χ²(1) = 23.54, p < .001).
Fixed effects (population-level):
(Intercept) Days
251.40510 10.46729
Random effects (subject-specific deviations):
(Intercept) Days
308 2.815789 9.075507
309 -40.047855 -8.644152
310 -38.432497 -5.513471
330 22.831765 -4.658665
Random intercepts model correlated variance in y; random slopes model correlated variance in y ~ x.

CSS 211 | UC San Diego