**Sean Trott**

*November 18, 2019*

This tutorial is intended as an accompaniment to my 11/18/2019 workshop, “Mixed effects models in R”. The goals of the lecture-portion of that workshop (and thus this tutorial) are to address the following questions:

- Why build mixed effects models?

- How can you decide what to model as a **fixed** vs. a **random** effect? And more broadly, which random effects do you need to account for?

- What should your data look like? How do you actually run a model in `lme4`?

- How do you interpret and report the results?

The examples given will likely focus on *linear* models in particular (where the critical contrast is between OLS regression and linear mixed models), but where possible I will describe extensions to other models^{1}.

In writing this tutorial, I drew heavily from a number of existing tutorials and resources. I strongly endorse all of them:

- Bates et al (2014): Fitting linear mixed-effects models using lme4

- Winter (2013): Linear models and linear mixed effects models in R with linguistic applications

- Barr et al (2013): Random effects structure for confirmatory hypothesis testing: Keep it maximal

- Bates et al (2015): Parsimonious mixed models

- Piccinini: Linear mixed effects models in R

- Clark: Shrinkage in mixed effects models

Also, this website has beautiful visualizations illustrating the concept of nested structure in datasets.

First, logistics. We’ll be working with two main libraries: `library(tidyverse)` and `library(lme4)`. If you don’t have them installed already, use `install.packages("tidyverse")` and `install.packages("lme4")` to install them.

The point of building statistical models is to quantify relationships between two or more variables. For example, how **correlated** are `A` and `B`?

This quantification is done in the service of some theoretical goal, e.g., to adjudicate between competing hypotheses. Thus, researchers often want to know some second-order information about this relationship, such as how **unexpected** it is. In the traditional approach of null hypothesis significance testing, this amounts to asking about the probability of your data under the null hypothesis, e.g., `p(d|h)`: a *p-value*^{2}.

Another popular approach is to use some kind of **goodness of fit** measure, i.e., how successfully does your model fit the data? Using a model selection procedure, various models can be fit to the data and selected on the basis of their explanatory power (using some heuristic to enforce model parsimony). This is especially useful when examining complex, multivariate relationships because it allows researchers to ask whether their variable of interest explains independent sources of variance from other variables, i.e., how much they improve a model without that variable. The goal of a model selection procedure is to settle on the simplest model that best explains your data^{3}.
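As a minimal sketch of this kind of model selection procedure (using simulated data; the variables `x1`, `x2`, and `y` here are hypothetical, invented purely for illustration), you can compare nested models with and without your variable of interest:

```
### Simulate data: y depends on x1, but not x2 (hypothetical variables)
set.seed(1)
n <- 200
x1 <- rnorm(n)          # variable of interest
x2 <- rnorm(n)          # control variable
y <- 2 * x1 + rnorm(n)  # outcome

### Fit nested models with and without x1
m0 <- lm(y ~ x2)
m1 <- lm(y ~ x2 + x1)

### Does x1 explain independent variance? Compare the fits:
anova(m0, m1)  # F-test for nested linear models
AIC(m0, m1)    # lower AIC = better fit, penalizing model complexity
```

Here, `anova()` performs a nested model comparison, and AIC penalizes the extra parameter, so a lower AIC for `m1` suggests that `x1` earns its place in the model.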

All statistical methods have a set of assumptions baked into them.

One assumption shared by many statistical methods is that each data point is **independent**, i.e., not dependent on or related to any other observation. For example, if you flip a coin, the probability of getting heads should be independent of other flips; in our theoretical model of coin-flipping, we assume that the generative process underlying a given coin flip is independent of the process underlying other coin flips. Similarly, a procedure like OLS regression assumes that each data point was generated by an independent causal process, and therefore, that your data points are all independent.

This is a useful assumption, and in many cases, is perfectly acceptable.

When is independence fair to assume? Broadly speaking: your data points are independent if you’ve ruled out the possibility of some nested causal process generating multiple observations in your dataset. This causal process could be a *person* (e.g., the same participant contributing multiple data points), an *item* (e.g., the same item being answered by different participants, where the item itself isn’t your variable of interest), or in certain contexts, even an *institution* (e.g., multiple participants coming from the same school).

**Example 1**: You want to know whether a new pharmaceutical drug increases self-reported happiness. You randomly assign people to take the drug vs. some placebo, and 1 month later, each participant answers a single question (e.g., “How happy are you (1-7)?”). You then use a t-test or OLS regression to compare the distribution of people assigned to the treatment group to the distribution of people assigned to the placebo group. Here, each participant contributed exactly 1 data point, so it’s pretty reasonable to assume each data point is independent.
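A sketch of that analysis on simulated data (the group means, standard deviations, and sample sizes below are made up for illustration):

```
### Simulated self-reported happiness, one observation per participant
set.seed(10)
drug    <- rnorm(50, mean = 5, sd = 1)    # treatment group (hypothetical values)
placebo <- rnorm(50, mean = 4.5, sd = 1)  # placebo group (hypothetical values)

### t-test comparing the two groups
t.test(drug, placebo)

### Equivalently, as OLS regression
d <- data.frame(happiness = c(drug, placebo),
                condition = rep(c("drug", "placebo"), each = 50))
summary(lm(happiness ~ condition, data = d))
```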

**Example 2**: You want to know whether people respond differently to one kind of metaphor than another. You assign 200 subjects to one of two conditions (`A` vs. `B`). Each subject is exposed to exactly 1 of 2 possible items (`A` vs. `B`). In other words, it’s a between-subjects design with only one item per condition. Again, it seems like these are independent observations; the only similarities across observations are exactly the ones you’re interested in (i.e., experimental condition). Thus, you can run a t-test or OLS regression to ask whether responses are different in the `A` vs. `B` condition.

Examples 1-2 seem to meet the independence assumption^{4}.

When this assumption is met, you can analyze your data with something like OLS regression; an example using simulated data is shown below. (Note that for the remainder of the tutorial, analyses will be discussed from the perspective of regression; this is a useful resource explaining how many common statistical tests actually reduce to linear regression.)

Let’s simulate some data. Just to make it obvious, we’ll start out by simulating two normal distributions with different means and identical standard deviations. As depicted below, these groups will have some overlap but still look pretty different.

```
### Load libraries
library(tidyverse)

### Set random seed
set.seed(42)

### Illustration of two different distributions, A vs. B
A = rnorm(100, mean = 725, sd = 15)
B = rnorm(100, mean = 705, sd = 15)

df = data.frame(A = A,
                B = B) %>%
  gather() %>%
  mutate(group = key)

df %>%
  ggplot(aes(x = value,
             fill = group)) +
  geom_density(alpha = .6) +
  labs(title = "Group A vs. Group B") +
  theme_minimal()
```
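Continuing with the simulated `df` above, the group difference can then be tested with OLS regression via `lm()`; each row of `df` is an independent simulated observation, so the independence assumption holds by construction:

```
### OLS regression predicting value from group
model = lm(value ~ group, data = df)
summary(model)
```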