A Review and Discussion of Residuals for Mixed Models

.title[
# A Review and Discussion of Residuals for Mixed Models
]
.author[
### Katherine Goode
]
.institute[
### Iowa State University, Department of Statistics
]
.date[
### NCCC-170 Meeting - June 20, 2019
]

---

.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 4px;
  display: block;
  left: 0;
  right: 0;
}

.remark-slide-number .progress-bar {
  height: 100%;
  background-color: #EB811B;
}

.tiny { font-size: 40% }
.small { font-size: 75% }

</style>

# Goals for this Talk

1. Motivate my interest in this topic

2. Provide an overview of residuals available for mixed models

3. Describe residual computations in R and SAS

4. Discussion questions

---

# Motivation

---

# Consulting

### Experiences

- Work with graduate students from many departments at Iowa State
- Frequently help with mixed models
- Clients often forget to check residual plots

### Produce Drop Study Example (based on a consulting project)

- 10 reps
- drop one head of lettuce and one apple per rep
- bacteria measured at 3 times points on each produce item (30 sec, 10 min, 1 hr)

---

### Model for Drop Study

<br>

<br>

```r
# Fits the model in R
drop_model <- lme4::lmer(bacteria ~ rep + produce*time + (1|produce:rep), 
                         data = drop_study)
```

---

# ggResidpanel .tiny[developed with Katie Rey]

```r
# Loads the package
library(ggResidpanel)

# Creates the default of panel of residual plots
resid_panel(drop_model)
```

---

```r
# Refits the model with a log transformation of the reponse
drop_model_log <- lme4::lmer(log(bacteria) ~ rep + produce*time + (1|produce:rep), 
                         data = drop_study)

# Creates a panel to compare the models
resid_compare(models = list(drop_model, drop_model_log), 
              plots = c("resid", "qq"))
```

---

# My Questions

- What residual types are available for mixed models?

- How are the residuals types computed?

- Which types are available in R and SAS?

- How to determine which residual type to use in practice?

---

# Residual Types

---

# Linear Mixed Model Review

Model formula:

`$$\textbf{Y}=\textbf{X}\boldsymbol{\beta}+\textbf{Z}\boldsymbol{\gamma}+\boldsymbol{\epsilon}$$`

<br>

Assumptions:

`$$\begin{bmatrix} \boldsymbol{\gamma} \\ \textbf{e} \end{bmatrix}
  \sim N
    \begin{pmatrix}
      \begin{bmatrix} \textbf{0} \\ 
      \textbf{0} 
      \end{bmatrix},
      \begin{bmatrix} 
      \textbf{G} & \textbf{0} \\ 
      \textbf{0} & \textbf{R} 
      \end{bmatrix}
    \end{pmatrix}$$`
    
<br>

Marginal and Conditional Distributions:

`$$\textbf{Y} \sim N(\textbf{X}\boldsymbol{\beta}, \ \textbf{ZGZ}'+\textbf{R})$$`
`$$\textbf{Y}|\boldsymbol{\gamma} \sim N(\textbf{X}\boldsymbol{\beta}+\textbf{Z}\boldsymbol{\gamma}, \ \textbf{R})$$`

---

# Raw Residuals

The raw residuals are computed as the observed response values minus the predicted response values. The marginal version does not account for the random effects while the conditional version does.

- **marginal** raw residuals
  `$$r^m_i = Y_i-\textbf{x}'_i\widehat{\boldsymbol{\beta}}$$`
  
- **conditional** raw residuals
  `$$r^c_i = Y_i-\textbf{x}'_i\boldsymbol{\widehat{\beta}}-\textbf{z}'_i\widehat{\boldsymbol{\gamma}}$$`

### Issues

- residuals may not be from populations with equal variances
- conditional residuals tend to be correlated
- conditional residuals are not well suited for detecting outliers

---

# Pearson Residuals

The Pearson residuals are computed as the raw residuals divided by the square root of the estimated variance of the response values.

- **marginal** Pearson residuals
  `$$r^{m,Pearson}_{i} = \frac{r^m_i}{\sqrt{\widehat{Var}[Y_i]}}$$`
  
- **conditional** Pearson residuals
  `$$r^{c,Pearson}_{i} = \frac{r^c_i}{\sqrt{\widehat{Var}[Y_i|\boldsymbol{\gamma}]}}$$`

### Issues

- still may not be from populations with equal variances
- only appropriate if the variability of `$\hat{\boldsymbol{\beta}}$` can be ignored

---

# Standardized Residuals

The standardized residuals are computed as the raw residuals divided by the square root of the variance of the raw residuals.

- **marginal** studentized residuals
  `$$r_i^{m,stand}=\frac{r_i^m}{\sqrt{Var[r_i^m]}}$$`
  
- **conditional** studentized residuals
  `$$r_i^{c,stand}=\frac{r_i^c}{\sqrt{Var[r_i^c]}}$$`
  
### Reasons for Standardization

- to deal with residuals being from populations with different variances
- "to normalize the residuals into a signal-to-noise ratio" (Stroup, SAS for Mixed Models)

---

# Studentized Residuals

The studentized residuals are computed as the standardized residuals with the variances of the raw residuals replaced by the estimated variances.

- **marginal** studentized residuals
  `$$r_i^{m,stud}=\frac{r_i^m}{\sqrt{\widehat{Var}[r_i^m]}}$$`
  
- **conditional** studentized residuals
  `$$r_i^{c,stud}=\frac{r_i^c}{\sqrt{\widehat{Var}[r_i^c]}}$$`

### Reasons for Studentization

- same as with standardization

---

# Other Types

Other types exist:

- externally studentized
- error recovery residuals

---

# Residuals in R and SAS

---

# SAS

### [proc mixed residuals](https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_mixed_sect027.htm)

- offer the key six types discussed so far and "scaled" residuals

<br>

---

# R

### lme4

- offers conditional "pearson" and "response" (raw)
- default is "response"
- see details from the `residuals.merMod` function below

<br>

---

### `lme4` Pearson Residuals vs `ggResidpanel` Pearson Residuals

```r
plot(drop_model_log,
     form = resid(., type = "pearson") ~ 
       fitted(.),
     abline = 0)
```

![](slides_files/figure-html/unnamed-chunk-9-1.png)
]

```r
resid_panel(drop_model_log,
            plot = "resid", 
            type = "pearson",
            title.text.size = 24, axis.text.size = 16)
```

![](slides_files/figure-html/unnamed-chunk-10-1.png)
]

---

### `lme4` Scaled Pearson Residuals vs `ggResidpanel` Pearson Residuals

```r
plot(drop_model_log,
     form = resid(., type = "pearson", 
                  scaled = TRUE) ~ 
       fitted(.), abline = 0)
```

![](slides_files/figure-html/unnamed-chunk-11-1.png)
]

```r
resid_panel(drop_model_log,
            plot = "resid", 
            type = "pearson", 
            title.text.size = 24, axis.text.size = 16)
```

![](slides_files/figure-html/unnamed-chunk-12-1.png)
]

---

### `lme4` Pearson Residuals vs `ggResidpanel` Response Residuals

```r
plot(drop_model_log,
     form = resid(., type = "pearson") ~ 
       fitted(.),
     abline = 0)
```

![](slides_files/figure-html/unnamed-chunk-13-1.png)
]

```r
resid_panel(drop_model_log,
            plot = "resid", 
            type = "response",
            title.text.size = 24, axis.text.size = 16)
```

![](slides_files/figure-html/unnamed-chunk-14-1.png)
]

---

# Discussion

---

# Marginal versus Conditional Residuals?

- Residual plots will look different when conditioning on random effects
- Should you always look at both? 
- Is just looking at conditional fine? If so, why?

.pull-left[
<img src="slides_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />
]

---

# Residuals in Practice?

- Which residual types do you use in practice?
    - raw
    - Pearson
    - studentized
    - other
- What motivates you to use this type of residual? 
- What information do you want to gain from using a specific residual type?