ggResidpanel provides a way to easily create and view diagnostic plots from models in R using ggplot2 graphics. The goal in creating the package was to allow a model to be passed to a function that returns a panel of diagnostic plots that can be viewed simultaneously. The panel allows the user to scan plots of interest to check for violations of model assumptions or lack of fit. The idea to portray the plots in a grid was motivated by the residual panel plots provided in SAS procedures. In addition to being able to view plots in a panel, ggResidpanel allows for the creation of panels with interactive plots and the ability to view plots from multiple models in the same panel. These operations can be obtained by applying one of the four functions listed below to a model.
- resid_panel: Creates a panel of diagnostic plots of the residuals from a model
- resid_interact: Creates an interactive panel of diagnostic plots of the residuals from a model
- resid_xpanel: Creates a panel of diagnostic plots of the predictor variables
- resid_compare: Creates a panel of diagnostic plots from multiple models
As of now, ggResidpanel allows these functions to work with models of type “lm”, “glm”, “lme”, “lmer”, “glmer”, and “lmerTest”. An additional function is included in the package that can be used with any model type and produces output similar to that of resid_panel.
- resid_auxpanel: Creates a panel of diagnostic plots for model types not included in the package
All functions in the package include the ability to select which plots to include in the panel, ways to adjust plot characteristics, and options to change the figure format. Each function has a section in this vignette with details on how to use the function and examples.
The package can be installed from CRAN, or the development version of the package can be installed from GitHub (if desired). The code below shows how to accomplish both of these tasks.
# Installs ggResidpanel from CRAN
install.packages("ggResidpanel")
# Installs the development version of ggResidpanel from GitHub
devtools::install_github("goodekat/ggResidpanel")
To use the package in R, load the library into your R session with the following code.
# Loads the library
library(ggResidpanel)
The functions in this vignette will be demonstrated using the trees data included in base R. The dataset contains information on the volume, girth, and height of 31 black cherry trees. The first six rows of the data are shown below.
# Displays the first six rows of the dataset
head(trees)
A linear model is fit below to determine if there is a linear relationship between the volume of the tree and its height and girth. This model will be used for examples throughout this vignette.
# Fits a linear model with a response variable of volume and predictor
# variables of height and girth
tree_model <- lm(Volume ~ Height + Girth, data = trees)
resid_panel
The function resid_panel is applied to a model and returns a panel of diagnostic plots. It currently accepts the following models.
- lm from base R
- glm from base R
- lmer or glmer functions from either the lme4 package or fit with the lmerTest package loaded
- lme function from the nlme package
The first argument in resid_panel is the model option. The most basic use of resid_panel is to include only the model in the function. The code below shows the figure that is created if tree_model is input into resid_panel with no other options specified. This produces a panel of four plots: a residual plot, a normal quantile plot, an index plot, and a histogram of the residuals.
# Creates the default panel of plots based on the tree_model
resid_panel(tree_model)
The plots option in resid_panel allows the user to select the plots to include in the panel. There are three ways a user can do this.
- Request an individual plot by name
- Supply a vector of plot names
- Request one of the prespecified panels
Explanations and examples for each of these options are included in the next three sections.
An individual plot can be created by including the option of plots = "name of plot" in the resid_panel function. The name of the plot must be in quotations. There are currently nine plots included in the package with resid_panel. Their names in the package are as follows.
- boxplot: Boxplot of the Residuals
- cookd: Cook’s Distance Plot
- hist: Histogram of the Residuals
- index: Index Plot of the Residuals
- lev: Residual-Leverage Plot
- ls: Location-Scale Plot
- qq: Normal Quantile Plot
- resid: Residual Plot
- yvp: Response vs. Predicted Plot
All plots are available to be used with “lm” and “glm” models, but cookd, lev, and ls are not available to be used with “lmer”, “glmer”, “lmerTest”, and “lme” models. The details and examples of each plot are included below.
boxplot: Boxplot of the Residuals
The option of plots = "boxplot" creates a boxplot of the residuals.
This can be used to visualize the distribution of the residuals from the model. It may help to identify outliers or determine if the distribution of the residuals is skewed.
# Creates a boxplot of the residuals
resid_panel(tree_model, plots = "boxplot")
cookd: Cook’s Distance Plot
The option of plots = "cookd" creates a plot of the Cook’s distance values versus the observation numbers. It is only available for “lm” and “glm” models. The blue dashed horizontal line is placed at 4/n, where n is the number of observations used to fit the model (Rawlings, Pantula, and Dickey 1998).
This plot can be used to check for points with high leverage. Points above the dashed blue line are considered to be high leverage points, and points that have Cook’s D values that are much larger than the rest are of particular interest.
# Creates a Cook's D plot
resid_panel(tree_model, plots = "cookd")
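The 4/n reference line can also be reproduced by hand. The following is a minimal sketch in base R, not the package's internal code: it assumes the plotted values agree with those returned by cooks.distance from the stats package, and the object names are chosen here for illustration.

```r
# Refit the example model so the block is self-contained
tree_model <- lm(Volume ~ Height + Girth, data = trees)

# Cook's distance for each observation (stats package, base R)
cooks_d <- cooks.distance(tree_model)

# Location of the dashed reference line: 4 / n
n <- nobs(tree_model)
threshold <- 4 / n

# Observation numbers whose Cook's D exceeds the reference line
flagged <- which(cooks_d > threshold)
flagged
```

Comparing the flagged observation numbers with the points above the dashed line in the plot is a quick way to confirm which rows of the data deserve a closer look.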
hist: Histogram of the Residuals
The option of plots = "hist" creates a histogram of the residuals. The blue line is a normal density curve with a mean of zero and a standard deviation equal to the standard deviation of the residuals.
resid_panel includes a bins option to specify the number of bins in the histogram. By default, bins = 30, which matches the default number of bins in the ggplot2 geom_histogram function.
This is another plot that can be used to visualize the distribution of the residuals. In particular, the normal density curve allows for the comparison of the residuals to a normal distribution.
# Creates a histogram of the residuals
resid_panel(tree_model, plots = "hist")
# Creates a histogram with 20 bins
resid_panel(tree_model, plots = "hist", bins = 20)
index: Index Plot of the Residuals
The option of plots = "index" creates a plot of the residuals versus the observation numbers. A solid blue horizontal line through 0 is included for reference.
resid_panel includes a smoother indicator option. If set to TRUE, a loess smoother will be included on the index plot as a red solid line. If set to FALSE, it will not be included. By default, smoother = FALSE. (This option also affects the lev, ls, and resid plots.)
This plot can be used to look for patterns in the residuals in regards to the order of the data used to fit the model. Often the data are ordered in a meaningful way such as by time of observation. This plot can help to check if there is any relationship between the residuals and the order of the data. If a trend is found in this plot, it may suggest that a variable has been excluded from the model that would help to explain the variation in the response variable.
# Creates an index plot of the residuals
resid_panel(tree_model, plots = "index")
# Creates an index plot with a smoother added
resid_panel(tree_model, plots = "index", smoother = TRUE)
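To make the role of the smoother concrete, the sketch below builds a rough base R version of the index plot. It is only an approximation under stated assumptions: it uses loess with its default settings, which may differ from the settings ggResidpanel uses internally, and the variable names are introduced here for illustration.

```r
# Refit the example model so the block is self-contained
tree_model <- lm(Volume ~ Height + Girth, data = trees)

# Raw residuals and their observation numbers
res <- resid(tree_model)
obs <- seq_along(res)

# Loess fit of the residuals against observation number (default span)
smooth_fit <- loess(res ~ obs)
smooth_vals <- predict(smooth_fit)

# Index plot with a reference line at 0 and the smoother overlaid
plot(obs, res, xlab = "Observation Number", ylab = "Residuals")
abline(h = 0, col = "blue")
lines(obs, smooth_vals, col = "red")
```

A smoother that drifts away from the horizontal line at 0 points to the kind of order-related trend described above.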
lev: Residual-Leverage Plot
The option of plots = "lev" creates a plot of the standardized residuals versus the leverage values. This plot is only available for “lm” and “glm” models. A horizontal line through 0 and a vertical line through 0 are included as black dashed lines to mimic the residual-leverage plot created by the plot.lm function from base R. The red dashed lines are Cook’s distance contour lines for Cook’s D values of 0.5 and 1. These values were chosen based on the default options used in plot.lm.
The smoother option in resid_panel also affects the residual-leverage plot. If set to TRUE, a loess smoother will be included on the residual-leverage plot as a red solid line. If set to FALSE, it will not be included. By default, smoother = FALSE. (This option also affects the index, ls, and resid plots.)
The Cook’s D contour lines are computed using the fact that Cook’s distance can be written as a function of the leverage and the standardized residual. For observation \(i\), let \(D_i\) represent the Cook’s distance, \(r_i\) represent the standardized residual, and \(h_i\) represent the leverage value. Finally, let \(p\) be the rank of the model. Cook’s distance can be computed as \[D_i = \frac{r^2_i}{p}\left(\frac{h_i}{1-h_i}\right).\] (Seber and Lee 2003). Thus, given a specified value of Cook’s D, a leverage value (\(h_i\)), and the rank of the model (\(p\)), it is possible to solve for the value of the standardized residual (\(r_i\)). The value of \(D_i=1\) is used since a data point with a value of Cook’s D larger than 1 is often considered to be a point with high leverage (Chatterjee and Hadi 2012).
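Rearranging the expression above makes the contour calculation explicit. As a worked step, write \(D\) for the chosen Cook’s D value and drop the observation subscript (notation introduced here for illustration). The standardized residual on the contour at leverage \(h\) is then

\[r = \pm\sqrt{\frac{D\,p\,(1-h)}{h}}.\]

The \(\pm\) gives the upper and lower branches of each contour. Because \((1-h)/h\) decreases as \(h\) increases, both branches move toward zero at higher leverage values.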
This plot can be used to look for trends in the residuals based on the leverage values and to identify points with high leverage. Points that fall outside of the Cook’s D contour lines may be of interest. Points that fall outside of either contour line with Cook’s D set to 1 are considered to be high leverage points. As seen in the plot below, not all contour lines may appear when the plot is created if they fall far outside of the range of the observed leverage values.
# Creates a residual-leverage plot
resid_panel(tree_model, plots = "lev")
# Creates a residual-leverage plot with a smoother added
resid_panel(tree_model, plots = "lev", smoother = TRUE)
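The relationship between Cook’s distance, leverage, and the standardized residual can be checked numerically. The sketch below uses only base R functions (hatvalues, rstandard, and cooks.distance). It is not the package's internal code, and contour_resid is a hypothetical helper written for this example.

```r
# Refit the example model so the block is self-contained
tree_model <- lm(Volume ~ Height + Girth, data = trees)

h <- hatvalues(tree_model)   # leverage values
r <- rstandard(tree_model)   # standardized residuals
p <- tree_model$rank         # rank of the model

# Cook's distance rebuilt from the leverage and standardized residual
d_manual <- (r^2 / p) * (h / (1 - h))

# Compare with the values reported by base R
all.equal(d_manual, cooks.distance(tree_model))

# Hypothetical helper: positive standardized residual on the contour
# for Cook's D value d at leverage h (the lower branch is its negative)
contour_resid <- function(d, h, p) sqrt(d * p * (1 - h) / h)

# Upper branches of the 0.5 and 1 contours over the observed leverage range
h_grid <- seq(min(h), max(h), length.out = 50)
upper_half <- contour_resid(0.5, h_grid, p)
upper_one <- contour_resid(1, h_grid, p)
```

If the contour values at the observed leverages are larger than the range of the standardized residuals, the corresponding contour line falls outside the plotting region, which is why some contours may not appear.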
ls: Location-Scale Plot
The option of plots = "ls" creates a location-scale plot of the residuals. This plot is only available for “lm” and “glm” models. It plots the square root of the absolute value of the standardized residuals on the y-axis and the predicted values on the x-axis. The predicted values are plotted on the original scale for “glm” and “glmer” models.
The smoother option in resid_panel affects the appearance of the location-scale plot. If set to TRUE, a loess smoother will be included on the location-scale plot as a red solid line. If set to FALSE, it will not be included. By default, smoother = FALSE. (This option also affects the index, lev, and resid plots.)
The location-scale plot can be used to check for patterns in the residuals in relationship to the predicted values. For example, homogeneity of the residuals can be diagnosed by determining whether the residuals show equal spread along the range of the predicted values. In the ideal situation, the loess curve would be a straight line with points evenly dispersed around it for the whole range of the predicted values.
# Creates a location-scale plot of the residuals
resid_panel(tree_model, plots = "ls")
# Creates a location-scale plot of the residuals with a smoother added
resid_panel(tree_model, plots = "ls", smoother = TRUE)
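The quantities shown on the two axes of the location-scale plot can be computed directly. This is a small base R sketch, assuming the standardized residuals correspond to those returned by rstandard; the data frame and column names are chosen here for illustration.

```r
# Refit the example model so the block is self-contained
tree_model <- lm(Volume ~ Height + Girth, data = trees)

# x-axis: predicted values; y-axis: square root of the absolute
# value of the standardized residuals
ls_data <- data.frame(
  predicted = fitted(tree_model),
  sqrt_abs_std_resid = sqrt(abs(rstandard(tree_model)))
)

head(ls_data)
```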
qq: Normal Quantile Plot
The option of plots = "qq" creates a normal quantile plot of residuals using the R package qqplotr. The sample quantiles are plotted on the y-axis, and the theoretical normal quantiles are plotted on the x-axis. See the qqplotr documentation for details on the computation of the sample and theoretical quantiles. The blue line is a 1-1 line shown for reference.
resid_panel includes two options to adjust the normal quantile plot based on the qqplotr package. The option of qqline indicates whether to include the 1-1 line on the qq-plot. By default, qqline = TRUE. The option of qqbands indicates whether to include 95% confidence bands on the qq-plot. By default, qqbands = FALSE.
This plot is meant to be used to check if the residuals approximately follow a normal distribution. If the points follow the 1-1 line, it suggests that the residuals are approximately normally distributed.
# Creates a normal quantile plot of the residuals
resid_panel(tree_model, plots = "qq")
# Creates a qq-plot with the 1-1 line removed
resid_panel(tree_model, plots = "qq", qqline = FALSE)
# Creates a qq-plot with the confidence bands added
resid_panel(tree_model, plots = "qq", qqbands = TRUE)
resid: Residual Plot
The option of plots = "resid" creates a plot of the residuals versus the predicted values. The predicted values are plotted on the original scale for “glm” and “glmer” models. A solid blue horizontal line through 0 is included for reference.
The smoother option in the resid_panel function can add a smoother to this plot. If set to TRUE, a loess smoother will be included on the residual plot as a red solid line. If set to FALSE, it will not be included. By default, smoother = FALSE. (This option also affects the index, lev, and ls plots.)
The residual plot can be used to check for homogeneity (constant variance) of the residuals and possible violations of the linearity assumption. The assumption of homogeneity is met if the residuals have approximately the same variance throughout the entire range of the predicted values. The assumption of linearity may be violated if there is some trend in the residuals so that the residuals are not evenly distributed below and above 0 throughout the range of the predicted values.
# Creates a residual plot
resid_panel(tree_model, plots = "resid")
# Creates a plot with the smoother added
resid_panel(tree_model, plots = "resid", smoother = TRUE)
yvp: Response vs. Predicted Plot
The option of plots = "yvp" creates a plot of the observed response variable values versus the predicted values from the model. Both response variable and predicted values are plotted on the original scale for “glm” and “glmer” models. The blue solid line is a 1-1 line, which is included for reference.
This plot provides a visualization to assess how similar the model predictions of the response variable are in relationship to the observed response variable values. The model is producing predictions similar to the observed values if the points follow the 1-1 line.
# Creates a plot of observed values vs. fitted values
resid_panel(tree_model, plots = "yvp")
The plots option in resid_panel also allows the user to specify a vector of plots to include in a panel. To do this, the user can include an option of the form plots = c("name of plot 1", "name of plot 2", ...) in the function. The names of the plots must be in quotations. Any of the individual plots listed in the previous section can be included in the vector if they are available for the type of model being input to resid_panel. The vector may contain any number of plot names greater than 0. Note that if a plot name is included multiple times in the vector, a plot will be created for each instance of the name. Some examples of user specified panels are included below.
# Creates a panel with a user specified vector of two plot names
resid_panel(tree_model, plots = c("resid", "qq"))
# Creates a panel from a vector of four plot names
resid_panel(tree_model, plots = c("hist", "ls", "cookd", "lev"))
# An example to show what happens if a plot name is included multiple times in the vector
resid_panel(tree_model, plots = c("resid", "resid"))
ggResidpanel includes prespecified panel options for resid_panel that the authors of the package thought users would find helpful. These can be used by including the option of plots = "specified panel name" in resid_panel. The four panel options included in the package are as follows.
- all: Panel of All Plots
- default: Default Panel
- R: Base R Inspired Panel of Plots
- SAS: SAS Inspired Panel of Plots
The options of all, default, and SAS can be used for all model types, but the option of R can only be used for models of type “lm” or “glm”. The details of these panels and examples are included below.
all: Panel of All Plots
The option of plots = "all" creates a panel of all plots included in the package that are available for the type of model input into resid_panel. Note that “cookd”, “ls”, and “lev” are not available for “lmer”, “lmerTest”, “glmer”, and “lme” models.
# Creates a panel of all plots available for an "lm" model
resid_panel(tree_model, plots = "all")
default: Default Panel
Unsurprisingly, the option of plots = "default" is the default panel displayed when resid_panel is applied to a model with no plots option specified. This creates a panel with a residual plot, a normal quantile plot of the residuals, an index plot of the residuals, and a histogram of the residuals.
# Creates the default panel of plots
resid_panel(tree_model)
# Creates the default panel with the option explicitly specified
resid_panel(tree_model, plots = "default")
R: Base R Inspired Panel of Plots
The option of plots = "R" is designed to mimic the diagnostic plots created by applying the base R function plot to an “lm” or “glm” model. As a result, it can only be used with an “lm” or “glm” model. It creates a panel with a residual plot, a normal quantile plot of the residuals, a location-scale plot, and a residual-leverage plot.
# Creates the R panel of plots with the smoother option set to TRUE
resid_panel(tree_model, plots = "R", smoother = TRUE)
# These are the lm diagnostic plots from base R which the "R" panel is designed to mimic
par(mfrow = c(2,2)) # Creates a 2x2 grid of plots
plot(tree_model) # Creates the base R diagnostic plots
par(mfrow = c(1,1)) # Returns the settings to a 1x1 grid of plots
SAS: SAS Inspired Panel of Plots
The option of plots = "SAS" creates a panel with a residual plot, a normal quantile plot of the residuals, a histogram of the residuals, and a boxplot of the residuals. This was modeled after the residpanel option in proc mixed from SAS.
# Creates the SAS panel of plots
resid_panel(tree_model, plots = "SAS")
The option of type in resid_panel allows for the selection of the type of residuals to use in the panel. The residual types that can be requested depend on the type of model that is input into resid_panel. These are listed below. If a type is not specified, the default residual type for the model type is used. If a type other than raw residuals is used, the label of the axis on which the residuals are plotted will specify the type of residual used. For example, if the raw residuals are used, the axis label will be “Residuals”, but if Pearson residuals are used, the axis label will be “Pearson Residuals”.
“lm” model residual options:
- pearson: Pearson residuals
- response: raw residuals (default for “lm”)
- standardized: standardized raw residuals
“glm” model residual options:
- pearson: Pearson residuals
- deviance: deviance residuals (default for “glm”)
- response: raw residuals
- stand.deviance: standardized deviance residuals
- stand.pearson: standardized Pearson residuals
“lmer”, “lmerTest”, and “lme” model residual options:
- pearson: Pearson residuals (default for “lmer” and “lmerTest”)
- response: raw residuals
“glmer” model residual options:
- pearson: Pearson residuals
- deviance: deviance residuals (default for “glmer”)
- response: raw residuals
Note that the plots of ls and lev only accept standardized residuals.
# Example requesting standardized residuals with a panel of plots from an lm model
resid_panel(tree_model, type = "standardized")
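For readers who want to see the values behind these residual types, the sketch below computes several of them with base R functions. Whether ggResidpanel computes each type in exactly this way is an assumption, and tree_glm is a hypothetical Gamma model fit here only to illustrate the “glm” options.

```r
# Refit the example model so the block is self-contained
tree_model <- lm(Volume ~ Height + Girth, data = trees)

# "lm" residual types computed with base R
raw_resid <- residuals(tree_model, type = "response")
pearson_resid <- residuals(tree_model, type = "pearson")
std_resid <- rstandard(tree_model)

# For an unweighted lm, the Pearson and raw residuals coincide
all.equal(raw_resid, pearson_resid)

# Hypothetical glm used only to illustrate the "glm" residual types
tree_glm <- glm(Volume ~ Height + Girth, data = trees,
                family = Gamma(link = "log"))
dev_resid <- residuals(tree_glm, type = "deviance")
std_dev_resid <- rstandard(tree_glm, type = "deviance")
std_pearson_resid <- rstandard(tree_glm, type = "pearson")
```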
All of the functions in ggResidpanel have the following options for adjusting the format of the panel.
- axis.text.size and title.text.size: Plot Text Size
- nrow: Number of Rows in the Panel
- scale: Scale of Graph
- theme: Panel Theme
- title.opt: Title Option
The details of how to use these options and examples of displaying the changes are shown below.
axis.text.size and title.text.size: Plot Text Size
There are two options that can be used to adjust the text size in the panel. The option of axis.text.size specifies the size of the text for the axis labels. By default, axis.text.size = 10. The option of title.text.size specifies the size of the text for the titles. By default, title.text.size = 12. Both of these options adjust the text size in all of the plots in the panel.
# Creates the default panel with larger text sizes for both the axis labels and the title
resid_panel(tree_model, axis.text.size = 14, title.text.size = 16)
nrow: Number of Rows in the Panel
The option of nrow allows the user to specify the number of rows in the panel. This works for both user specified and package prespecified panels.
# Creates the default panel of four plots with all of the plots in a row instead of a 2x2 grid
resid_panel(tree_model, nrow = 1)
# Creates a panel with a residual plot and a qq-plot with two rows
resid_panel(tree_model, plots = c("resid", "qq"), nrow = 2)
scale: Scale of Graph
The option of scale adjusts the size of the graphs in the panel. It takes values in the interval (0, 1]. The user may find this helpful if some of the text from one graph overlaps another graph or if the graphs appear too close together in a panel.
# Creates a panel of plots with the default scale
resid_panel(tree_model, plots = c("lev", "yvp", "ls", "cookd"))
# Creates the same panel of plots shown above with a scale of 0.9
resid_panel(tree_model, plots = c("lev", "yvp", "ls", "cookd"), scale = 0.9)
theme: Panel Theme
The option of theme adjusts the ggplot2 theme to be used when creating the plots. The current options available in ggResidpanel are “bw”, “classic”, and “grey” (or “gray”). The default is “bw”. The theme is applied to all plots in the panel.
# Creates the default panel with the classic theme
resid_panel(tree_model, theme = "classic")
# Creates the default panel with the grey theme
resid_panel(tree_model, theme = "grey")
title.opt: Title Option
The option of title.opt indicates whether or not to include a title on the plots in the panel. It can be set to TRUE or FALSE, and the default is TRUE.
# Creates the default panel with the titles removed
resid_panel(tree_model, title.opt = FALSE)
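The formatting options can be combined in a single call. The sketch below uses only options described in this section. The particular values are arbitrary choices for illustration, and the requireNamespace guard is added here so that the block runs without error when ggResidpanel is not installed.

```r
# Refit the example model so the block is self-contained
tree_model <- lm(Volume ~ Height + Girth, data = trees)

# Only draw the panel if ggResidpanel is available
if (requireNamespace("ggResidpanel", quietly = TRUE)) {
  ggResidpanel::resid_panel(
    tree_model,
    plots = c("resid", "qq"),  # user specified panel of two plots
    nrow = 1,                  # place both plots in one row
    scale = 0.9,               # shrink the graphs slightly
    theme = "classic",         # ggplot2 theme for all plots
    axis.text.size = 12,       # axis label text size
    title.text.size = 14,      # title text size
    title.opt = TRUE           # keep the plot titles
  )
}
```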
resid_interact
The function resid_interact makes use of the plotly package to create interactive panels of diagnostic plots. The interactivity allows users to hover their cursor over plots to easily access information about the points that are included in the dataset. This function was included with the intention of helping users identify outliers and points of interest.
resid_interact functions in a similar way to resid_panel. It has (mostly) the same input options, and it includes the same plots and prespecified panels. Similar to resid_panel, resid_interact currently requires a model of type “lm”, “glm”, “lmer”, “glmer”, “lmerTest”, or “lme”.
The first argument in resid_interact is the model option. The most basic way to use resid_interact is to input a model into the function as seen below. It creates an interactive version of the default panel from resid_panel. When the cursor is hovered over a point, a tooltip appears. It contains the \(x\) and \(y\) coordinates of the point. The remaining lines that appear after “Data:” are the observed values for the data point for all of the variables included in the model and the observation number. Different types of plots have slightly varying types of interactivity. All of the functionality is described below in the section on individual plots.
# Creates an interactive version of the default panel from resid_panel
resid_interact(tree_model)
resid_interact offers the same three ways in which a user can create a panel of plots:
- Request an individual plot by name
- Supply a vector of plot names
- Request one of the prespecified panels
Additional details on these methods are described in the next three sections.
In the same way an individual plot can be created in resid_panel, an individual interactive plot can be created in resid_interact by including the option of plots = "name of plot". The name of the plot must be in quotations. The same nine plots included with resid_panel are included with resid_interact. Their names in the package are as follows.
- boxplot: Boxplot of the Residuals
- cookd: Cook’s Distance Plot
- hist: Histogram of the Residuals
- index: Index Plot of the Residuals
- lev: Residual-Leverage Plot
- ls: Location-Scale Plot
- qq: Normal Quantile Plot
- resid: Residual Plot
- yvp: Response vs. Predicted Plot
Again, all plots are available to be used with “lm” and “glm” models, but cookd, lev, and ls are not available to be used with “lmer”, “glmer”, “lmerTest”, and “lme” models. Instructional details for creating each of the plots are described below. For information on how the plots are created, return to the section on individual plots with resid_panel.
boxplot: Boxplot of the Residuals
The option plots = "boxplot" creates an interactive boxplot. If the cursor is hovered over the vertical line created by the whiskers in the plot, a tooltip appears at the location of each of the residual points. (Note that the points are not visible on the plot.) The tooltip lists the value of the residual, followed by the variables included in the model with their observed values and the observation number of the point. If the cursor is hovered anywhere else on the boxplot, tooltips will appear showing the location and value of the minimum, maximum, median, and first and third quartiles of the residuals.
# Creates an interactive boxplot of the residuals
resid_interact(tree_model, plots = "boxplot")
cookd: Cook’s Distance Plot
The option plots = "cookd" creates an interactive Cook’s distance plot. When the cursor is hovered over a point, a tooltip is displayed. It lists the Cook’s D value for that point as well as the observed data values and observation number. This plot is only available for “lm” and “glm” models.
# Creates an interactive Cook's D plot
resid_interact(tree_model, plots = "cookd")
hist: Histogram of the Residuals
The option of plots = "hist" creates an interactive histogram of the residuals. The cursor can be hovered over the bars in the histogram to display a tooltip that lists the density value at the top of the bar and the number of observations in the bar (“count”). If the cursor is hovered over the bottom of the plot where the density is equal to 0, a tooltip appears that shows the location of the data points and lists the observed variable values and the observation number of the data point.
resid_interact includes the same bins option as resid_panel for the histogram. It allows the user to specify the number of bins to use when creating the histogram. Again, by default, bins = 30.
# Creates an interactive histogram of the residuals
resid_interact(tree_model, plots = "hist")
# Creates an interactive histogram with 20 bins
resid_interact(tree_model, plots = "hist", bins = 20)
index: Index Plot of the Residuals
The option of plots = "index" creates an interactive index plot of the residuals. The cursor can be hovered over a data point to see a tooltip that contains the observation number of the point, the value of the residual, and the observed variable values.
resid_interact also includes the smoother option for the index plot, which indicates whether or not to include a loess smoother on the index plot. Set smoother = TRUE to have the smoother appear. The default is smoother = FALSE. This option also affects the residual-leverage, location-scale, and residual plots if included in the same panel.
# Creates an interactive index plot of the residuals
resid_interact(tree_model, plots = "index")
# Creates an interactive index plot with a smoother
resid_interact(tree_model, plots = "index", smoother = TRUE)
lev: Residual-Leverage Plot
The option plots = "lev" creates an interactive residual-leverage plot. The cursor can be hovered over a data point to see a tooltip that contains the leverage, the standardized residual, the observed variable values, and the observation number of the data point. This plot is only available for “lm” and “glm” models.
The smoother option in resid_interact is also available for the residual-leverage plot. It indicates whether or not to include a loess smoother on the plot. Set smoother = TRUE to have the smoother appear. The default is smoother = FALSE. This option also affects the index, location-scale, and residual plots if included in the same panel.
# Creates an interactive residual-leverage plot
resid_interact(tree_model, plots = "lev")
# Creates an interactive residual-leverage plot with a smoother
resid_interact(tree_model, plots = "lev", smoother = TRUE)
ls: Location-Scale Plot
The option plots = "ls" creates an interactive location-scale plot. The cursor can be hovered over a point to see a tooltip that contains the prediction, the square root of the absolute value of the standardized residual, the observed values of the variables in the model, and the observation number of the point. This plot is only available for “lm” and “glm” models.
The smoother option in resid_interact is also available for the location-scale plot. It indicates whether or not to include a loess smoother on the plot. Set smoother = TRUE to have the smoother appear. The default is smoother = FALSE. This option also affects the index, residual-leverage, and residual plots if included in the same panel.
# Creates an interactive location-scale plot
resid_interact(tree_model, plots = "ls")
# Creates an interactive location-scale plot with a smoother
resid_interact(tree_model, plots = "ls", smoother = TRUE)
qq: Normal Quantile Plot
The option plots = "qq" creates an interactive normal quantile plot. When the cursor is hovered over a point, a tooltip appears that displays the theoretical quantile, the sample quantile, the observed data values, and the observation number of the point.
resid_interact contains the same option of qqline that is included with resid_panel. It indicates whether to include a 1-1 line on the qq-plot. If TRUE is specified, the line is included. By default, qqline = TRUE. The option of qqbands that is included with resid_panel has not been implemented in plotly, so it is not available as an option with resid_interact.
# Creates an interactive normal quantile plot
resid_interact(tree_model, plots = "qq")
# Creates a normal quantile plot with the 1-1 line removed
resid_interact(tree_model, plots = "qq", qqline = FALSE)
resid: Residual Plot
The option of plots = "resid" creates an interactive residual plot. When the cursor is hovered over a point, a tooltip appears that contains the prediction, the residual, the observed variable values, and the observation number associated with the point.
The smoother option in resid_interact is also available for the residual plot. It indicates whether or not to include a loess smoother on the plot. Set smoother = TRUE to have the smoother appear. The default is smoother = FALSE. This option also affects the index, residual-leverage, and location-scale plots if included in the same panel.
# Creates an interactive residual plot
resid_interact(tree_model, plots = "resid")
# Creates the residual plot with a smoother
resid_interact(tree_model, plots = "resid", smoother = TRUE)
yvp: Response vs. Predicted Plot
The option of plots = "yvp" creates an interactive plot of the response variable versus the predicted value. If the cursor is hovered over a point, a tooltip appears that contains the predicted value, the response variable value, the observed variable values, and the observation number.
# Creates an interactive response vs. predicted value plot
resid_interact(tree_model, plots = "yvp")
resid_interact allows for user specified panels. Just as with resid_panel, this can be done by including a vector of plot names in the form of plots = c("name of plot 1", "name of plot 2", ...). Again, any of the individual plots listed in the previous section that are available for the type of model input into the function can be included in the vector. Some examples of user specified panels are included below.
# Creates a panel with a user specified vector of two plot names
resid_interact(tree_model, plots = c("resid", "qq"))
# Creates a panel from a vector of four plot names
resid_interact(tree_model, plots = c("hist", "ls", "cookd", "lev"))
resid_interact offers the same four prespecified panels as resid_panel:
- all: Panel of All Plots
- default: Default Panel
- R: Base R Inspired Panel of Plots
- SAS: SAS Inspired Panel of Plots
These can be used by including the option of plots = "specified panel name". As with resid_panel, the options of all, default, and SAS can be used for all model types, but the option of R can only be used for models of type “lm” or “glm”. The details of these panels and examples are included below.
all: Panel of All Plots
The option of plots = "all" creates an interactive version of the panel of all plots included in the package that are available for the type of model input into resid_interact.
# Creates an interactive panel of all plots available for an "lm" model
resid_interact(tree_model, plots = "all")
default
: Default Panel
The option of plots = "default"
creates an interactive version of the default panel of plots. That is, this is the panel that is created if no plots
option is specified in resid_interact
.
# Creates an interactive default panel with the option explicitly specified
resid_interact(tree_model, plots = "default")
R
: Base R Inspired Panel of Plots
The option of plots = "R"
creates an interactive version of the R panel of plots. This option can only be used with an “lm” or “glm” model.
# Creates an interactive version of the R panel of plots with the smoother option set to TRUE
resid_interact(tree_model, plots = "R", smoother = TRUE)
SAS
: SAS Inspired Panel of Plots
The option of plots = "SAS"
creates an interactive version of the SAS panel.
# Creates an interactive version of the SAS panel of plots
resid_interact(tree_model, plots = "SAS")
resid_interact
includes the option of type
to specify the type of residual to use in the panel. This option works in the same way as the type
option in resid_panel
. See the section on residual types under resid_panel
for details on how to use this option and the available residual types for each of the model types.
resid_interact
has the same formatting options as all of the functions in ggResidpanel. See the section on formatting options under the resid_panel
documentation for the details on how to use the formatting options.
resid_xpanel
When working with linear models, it is often helpful to assess the model by viewing the residuals versus the predictor variables. Additionally, it can be helpful to view the response variable versus the predictor variables to understand the relationships between the variables. The function resid_xpanel
can be applied to a model to create a panel of plots of the residuals or the response variable (as specified by the user) versus the predictor (\(X\)) variables in the model. Interactions between predictor variables are not currently included in the panel. resid_xpanel
currently accepts the model types of “lm”, “glm”, “lmer”, “glmer”, “lmerTest”, and “lme”.
The first argument in resid_xpanel
is the model
option. The most basic use of resid_xpanel
is to input a model into the function as shown below. This creates a panel of scatterplots of the residuals from the model versus the predictor variables in the model. Two predictor variables were included in the tree_model
(height and girth). As a result, the code below produces a panel with two plots.
# Creates a panel of residuals versus the predictor variables in the tree_model
resid_xpanel(tree_model)
For predictor variables that are factors, the levels on the x-axis appear in the order in which the factor levels are defined in the data frame. This can be adjusted by reordering the levels of the factor before the model is fit.
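As a rough sketch of reordering factor levels before fitting, the example below uses the ToothGrowth data that ships with R, which is not one of this vignette's examples. The object names (tooth_data, tooth_model) and the chosen level order are hypothetical.

```r
library(ggResidpanel)

# ToothGrowth ships with R; supp is a factor with levels "OJ" and "VC"
tooth_data <- ToothGrowth

# Reorders the levels so that "VC" comes first (an illustrative choice)
tooth_data$supp <- factor(tooth_data$supp, levels = c("VC", "OJ"))

# Fits the model after reordering so the new level order is carried into the model
tooth_model <- lm(len ~ supp + dose, data = tooth_data)

# Creates the panel; the plot for supp should follow the reordered levels
resid_xpanel(tooth_model)
```

Reordering the factor in the data frame, rather than after fitting, keeps the model and the plots consistent with each other.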
resid_xpanel
includes the option of yvar
. This allows the user to specify whether the residuals or the response variable from the model should be plotted on the y-axes of the plots in the panel. If yvar = "residual"
, then the residuals will be plotted on the y-axes. A solid blue horizontal line will also be included at 0 for reference. If yvar = "response"
, then the response variable will be plotted on the y-axes. By default, yvar
is set to "residual"
.
# Creates an xpanel with the y-axis option of residual explicitly specified
resid_xpanel(tree_model, yvar = "residual")
# Creates an xpanel with the y-axis option set to response
resid_xpanel(tree_model, yvar = "response")
resid_xpanel
includes the option to overlay loess smoothers on all of the plots in the panel. This is done by including the option smoother = TRUE
in the function. By default, smoother = FALSE
.
# Creates an xpanel with residuals on the y-axis and with loess smoothers on all of the plots
resid_xpanel(tree_model, smoother = TRUE)
# Creates an xpanel with the response variable on the y-axis and with loess smoothers on all of the plots
resid_xpanel(tree_model, yvar = "response", smoother = TRUE)
resid_xpanel
includes the option of type
to specify the type of residual to use in the panel. This option works in the same way as the type
option in resid_panel
. See the section on residual types under resid_panel
for details on how to use this option and the available residual types for each of the model types.
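For instance, a minimal sketch with a Poisson regression fit to the warpbreaks data that ships with R is shown below. The model warp_model is hypothetical, and "pearson" is assumed to be among the residual types listed for “glm” models in the resid_panel section.

```r
library(ggResidpanel)

# Fits a hypothetical Poisson regression to a base R data set
warp_model <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Creates an xpanel of the Pearson residuals versus each predictor (wool and tension)
resid_xpanel(warp_model, type = "pearson")
```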
resid_xpanel
has the same formatting options as all of the functions in ggResidpanel. See the section on formatting options under the resid_panel
documentation for the details on how to use the formatting options.
resid_compare
The function resid_compare
was created to allow diagnostic plots to be compared across models. This is particularly helpful when a model assumption is not met and the model is adjusted, for example by transforming the response, in an attempt to satisfy the assumption. resid_compare
takes a list of models and creates a panel with the residual diagnostic plots from each model. This allows for a side-by-side comparison of the diagnostic plots to determine whether the adjustments are effective. resid_compare
currently accepts models of type “lm”, “glm”, “lmer”, “glmer”, “lmerTest”, and “lme”.
The first argument in resid_compare
is the models
option, which requires a list of models. The most basic use of resid_compare
is to input only a list of models into the function.
The diagnostic plots from the tree_model
suggest possible issues with linearity and homogeneity of variance. The code below fits two new models that attempt to address these issues. The three models tree_model
, tree_model_log
, and tree_model_squared
are input into resid_compare
as a list. This creates a 4x3 panel of plots. The columns represent the models, and the rows are different plot types. Each plot type is created for all three models. By default, the four plots in the default resid_panel
appear. resid_compare
includes options to change the plots and adjust the appearance of the figure.
# Fits a model with the log transformed volume variable as the response
tree_model_log <- lm(log(Volume) ~ Height + Girth, data = trees)
# Fits a model with an added squared term of girth
tree_model_squared <- lm(Volume ~ Height + Girth + I(Girth^2), data = trees)
# Creates a panel with the four default diagnostic plots for all three models
resid_compare(models = list(tree_model, tree_model_log, tree_model_squared))
The plots
option can be used to specify which plots to include in the panel. It uses the same three methods of selecting plots as resid_panel
and resid_interact
, and each of these methods includes the same plotting options as in the other functions. An overview of the available options is listed below. For more details on how to specify these options, how the plots are made, and additional graphing options associated with the plots, see the plots section under resid_panel
.
plots = c("name of plot 1", "name of plot 2", ...)
Here are a few examples specifying different plot options with resid_compare
.
# Creates a panel comparing residual plots between two models with smoothers
resid_compare(models = list(tree_model, tree_model_log),
plots = "resid", smoother = TRUE)
# Creates a panel comparing residual plots, qq-plots, and Cook's D plots between two models with confidence bands added to the qq-plot
resid_compare(models = list(tree_model, tree_model_log),
plots = c("resid", "qq", "cookd"),
qqbands = TRUE)
# Creates a SAS panel of plots for both models using 20 bins to create the histograms
resid_compare(models = list(tree_model, tree_model_log),
plots = "SAS",
bins = 20)
resid_compare
includes the option of type
to specify the type of residual to use in the plot. This option works in the same way as the type
option in resid_panel
. See the section on residual types under resid_panel
for details on how to use this option and the available residual types for each of the types of models.
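One situation where the type option may help is when the responses are on different scales, as with Volume and log(Volume). The sketch below assumes that tree_model was fit as lm(Volume ~ Height + Girth, data = trees) and that "standardized" is one of the residual types listed for “lm” models in the resid_panel section.

```r
library(ggResidpanel)

# Assumed definition of tree_model (refit here so the example is self-contained)
tree_model <- lm(Volume ~ Height + Girth, data = trees)

# Fits the model with the log transformed volume variable as the response
tree_model_log <- lm(log(Volume) ~ Height + Girth, data = trees)

# Compares residual plots and qq-plots using standardized residuals,
# which puts the residuals from both models on a comparable scale
resid_compare(models = list(tree_model, tree_model_log),
              plots = c("resid", "qq"),
              type = "standardized")
```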
resid_compare
has the same formatting options as all of the functions in ggResidpanel. See the section on formatting options under the resid_panel
documentation for the details on how to use the formatting options.
resid_auxpanel
The function resid_auxpanel
is included in the package to create diagnostic plot panels similar to those from resid_panel
when working with a model type that is not currently supported by resid_panel
. The term “auxpanel” stands for “auxiliary panel” since this function is meant to provide additional support for users if they want to make use of the graphics in ggResidpanel but are not using a model of type “lm”, “glm”, “lmer”, “glmer”, “lmerTest”, or “lme”.
resid_auxpanel
works differently than the other functions in ggResidpanel. The other functions require a model to be input. resid_auxpanel
requires the residuals (residuals
) and the predicted values (predicted
) to be input. This allows users to extract the residuals and predicted values from any model type and create a panel of diagnostic plots. It could also be used if the user is working with a residual type not included in ggResidpanel.
The code below fits a random forest model to the trees
data. It specifies the volume as the response variable and the girth and height as the predictor variables. The predicted values are extracted from the model, and then the residuals are computed. These values are input into resid_auxpanel
. The plots
option is used to request a “resid” plot and an “index” plot, and the smoother
option is set to TRUE
. This produces a panel with residual and index plots.
# Fits a random forest model to the trees data
rf_model <- randomForest::randomForest(x = trees[,1:2], y = trees[,3])
# Obtains the predictions from the model on the observed data
rf_pred <- predict(rf_model, trees[,1:2])
# Obtains the residuals from the model
rf_resid <- trees[,3] - rf_pred
# Creates a panel with residual and index plots
resid_auxpanel(residuals = rf_resid,
predicted = rf_pred,
plots = c("resid", "index"),
smoother = TRUE)
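The same pattern should carry over to other unsupported model types. As a sketch that needs no additional packages, the example below uses a loess fit from base R. The object names (loess_model, loess_pred, loess_resid) are hypothetical.

```r
library(ggResidpanel)

# Fits a loess curve of volume on girth (a model type not supported by resid_panel)
loess_model <- loess(Volume ~ Girth, data = trees)

# Obtains the fitted values on the observed data
loess_pred <- predict(loess_model)

# Obtains the residuals as observed minus fitted
loess_resid <- trees$Volume - loess_pred

# Creates the default auxiliary panel from the residuals and predicted values
resid_auxpanel(residuals = loess_resid, predicted = loess_pred)
```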
resid_auxpanel
has a plots
option like the other functions in ggResidpanel. However, it does not include as many plot types. Since it uses the specified residuals and predicted values, it does not contain all of the information that a model would contain. Some of the plots available with other functions require additional information from the model. As a result, these plots are not available with resid_auxpanel
. The plots that are available are created in the same way and have the same plotting options as with the other functions.
The plotting options that are available with resid_auxpanel
are listed below. See the documentation under resid_panel
for details on the options.
plots = c("name of plot 1", "name of plot 2", ...)
resid_auxpanel
does not include the option of type
to specify the type of residual to use in the plot. Since the model is not input to resid_auxpanel
, it does not have the information needed to compute other types of residuals. To display a different type of residual, the user must compute the residuals separately and pass them to the function.
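As an illustration, the sketch below computes externally studentized residuals with rstudent() from base R and passes them to resid_auxpanel. It assumes that tree_model was fit as lm(Volume ~ Height + Girth, data = trees) and that this residual type is not offered through the type option for “lm” models.

```r
library(ggResidpanel)

# Assumed definition of tree_model (refit here so the example is self-contained)
tree_model <- lm(Volume ~ Height + Girth, data = trees)

# Computes externally studentized residuals and fitted values outside of ggResidpanel
tree_rstudent <- rstudent(tree_model)
tree_fitted <- fitted(tree_model)

# Creates a panel with a residual plot and a qq-plot of the studentized residuals
resid_auxpanel(residuals = tree_rstudent,
               predicted = tree_fitted,
               plots = c("resid", "qq"))
```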
resid_auxpanel
has the same formatting options as all of the functions in ggResidpanel. See the section on formatting options under the resid_panel
documentation for the details on how to use the formatting options.
Chatterjee, S. & Hadi, A. S. Regression Analysis by Example. (Wiley, 2012).
Rawlings, J. O., Pantula, S. G. & Dickey, D. A. Applied Regression Analysis: A Research Tool. (Springer, 1998).
Seber, G. A. F. & Lee, A. J. Linear Regression Analysis, 2nd Edition. (John Wiley & Sons, 2003).