Main Functions
lime.data.frame
Inputs
- `x`: The training data used to fit the model to be explained.
- `model`: The model to explain.
- `preprocess = NULL`: Function to transform a character vector to the format expected by the model.
- `bin_continuous = TRUE`: If set to `TRUE`, the continuous variables will be binned when making the explanations. If they are not binned, then perturbations will be obtained either by simulating from a kernel density estimate or from a normal distribution, depending on how `use_density` is set.
- `n_bins = 4`: The number of bins to use if `bin_continuous = TRUE`.
- `quantile_bins = TRUE`: Should quantile bins be used if `bin_continuous = TRUE`? (Otherwise, equally spaced bins will be used.)
- `use_density = TRUE`: This option is only considered if `bin_continuous = FALSE`. In that situation, if `use_density = TRUE`, the continuous data will be sampled using kernel density estimation. Otherwise, the continuous features are assumed to follow a normal distribution, and samples are drawn from a normal distribution with mean and standard deviation set to the sample mean and standard deviation of the feature.
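To make the `use_density` distinction concrete, here is a minimal sketch of the two sampling strategies for a single continuous feature (illustrative only, not lime's internal code; the feature vector and sample sizes are made up):

```r
# Hypothetical continuous feature
set.seed(1)
feature <- rnorm(100, mean = 10, sd = 2)

# use_density = TRUE: sample from a kernel density estimate of the feature
d <- density(feature)
perturb_kde <- sample(d$x, size = 5, replace = TRUE, prob = d$y)

# use_density = FALSE: sample from a normal distribution with the
# feature's sample mean and standard deviation
perturb_norm <- rnorm(5, mean = mean(feature), sd = sd(feature))
```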
Procedure
If `preprocess` is `NULL`, then set it to the identity function. Either way, check to make sure that it is a function.

```r
if (is.null(preprocess)) preprocess <- function(x) x
assert_that(is.function(preprocess))
```
Create an explainer object.

```r
explainer <- c(as.list(environment()), list(...))
explainer$x <- NULL
```
Determine the type of each of the features (e.g. integer, numeric, etc.). If any of the variables are constant, a warning is issued. If a feature type is unknown, an error is thrown.

```r
explainer$feature_type <- setNames(sapply(x, function(f) {
  if (is.integer(f)) {
    if (length(unique(f)) == 1) "constant" else "integer"
  } else if (is.numeric(f)) {
    if (length(unique(f)) == 1) "constant" else "numeric"
  } else if (is.character(f)) {
    "character"
  } else if (is.factor(f)) {
    "factor"
  } else if (is.logical(f)) {
    "logical"
  } else if (inherits(f, "Date") || inherits(f, "POSIXt")) {
    "date_time"
  } else {
    stop("Unknown feature type", call. = FALSE)
  }
}), names(x))

if (any(explainer$feature_type == "constant")) {
  warning("Data contains numeric columns with zero variance", call. = FALSE)
}
```
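As an illustration, applying this logic to a hypothetical toy data frame would classify the columns as follows:

```r
df <- data.frame(a = 1:3,                      # integer
                 b = c(1.5, 2.5, 3.5),         # numeric
                 c = factor(c("x", "y", "x")), # factor
                 d = c(TRUE, FALSE, TRUE),     # logical
                 e = c(5, 5, 5))               # numeric with zero variance
# The resulting feature types would be:
#         a          b          c          d          e
# "integer"  "numeric"   "factor"  "logical" "constant"
# and column e would trigger the zero-variance warning.
```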
Create bins for numeric and integer type variables. If `quantile_bins = TRUE`, then `n_bins` bins are created using quantiles. Otherwise, `n_bins` equally spaced bins are created. If `quantile_bins = TRUE` but fewer than 3 unique cut points remain after removing duplicated quantiles, then a warning is produced ("does not contain enough variance to use quantile binning. Using standard binning instead.") and `n_bins` equally spaced bins are returned.

```r
explainer$bin_cuts <- setNames(lapply(seq_along(x), function(i) {
  if (explainer$feature_type[i] %in% c("numeric", "integer")) {
    if (quantile_bins) {
      bins <- quantile(x[[i]], seq(0, 1, length.out = n_bins + 1), na.rm = TRUE)
      bins <- bins[!duplicated(bins)]
      if (length(bins) < 3) {
        warning(names(x)[i], " does not contain enough variance to use quantile binning. Using standard binning instead.",
                call. = FALSE)
        d_range <- range(x[[i]], na.rm = TRUE)
        bins <- seq(d_range[1], d_range[2], length.out = n_bins + 1)
      }
      bins
    } else {
      d_range <- range(x[[i]], na.rm = TRUE)
      seq(d_range[1], d_range[2], length.out = n_bins + 1)
    }
  }
}), names(x))
```
- Determine the “distribution” for each feature. The method used depends on the type of the variable. Note that in the `switch()` call below, `integer` falls through to the `numeric` case, and `character` and `logical` fall through to the `factor` case (an empty `switch()` alternative matches the next non-empty one; see the small demonstration after the code).
  - Integer and numeric: If `bin_continuous = TRUE`, determine the proportion of observations in each bin. Otherwise, if `use_density = TRUE`, a kernel density approximation is used to estimate the distribution; if not, the mean and standard deviation of the feature are computed.
  - Character, logical, and factor: Determine the proportion of observations in each category.
  - Date/time: `NA` is stored (the unnamed default of the `switch()`).

```r
explainer$feature_distribution <- setNames(lapply(seq_along(x), function(i) {
  switch(explainer$feature_type[i],
    integer = ,
    numeric = if (bin_continuous) {
      table(cut(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE,
                include.lowest = TRUE)) / nrow(x)
    } else if (use_density) {
      density(x[[i]])
    } else {
      c(mean = mean(x[[i]], na.rm = TRUE), sd = sd(x[[i]], na.rm = TRUE))
    },
    character = ,
    logical = ,
    factor = table(x[[i]]) / nrow(x),
    NA)
}), names(x))
```
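The fall-through behavior can be verified with a small standalone example:

```r
# An empty switch() alternative matches the next non-empty one
pick <- function(type) {
  switch(type,
         integer = ,
         numeric = "treated as a continuous feature",
         character = ,
         logical = ,
         factor = "treated as a categorical feature",
         NA)                       # unnamed default (e.g. for "date_time")
}
pick("integer")    # "treated as a continuous feature"
pick("logical")    # "treated as a categorical feature"
pick("date_time")  # NA
```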
Assign the class attributes to the explainer object, and return it.

```r
structure(explainer, class = c("data_frame_explainer", "explainer", "list"))
```
Function
```r
# Print the function (and check to see if there have been any changes)
func_check("lime.data.frame", "0.5.0")
## A single object matching 'lime.data.frame' was found
## It was found in the following places
## registered S3 method for lime from namespace lime
## namespace:lime
## with value
##
## function (x, model, preprocess = NULL, bin_continuous = TRUE,
## n_bins = 4, quantile_bins = TRUE, use_density = TRUE, ...)
## {
## if (is.null(preprocess))
## preprocess <- function(x) x
## assert_that(is.function(preprocess))
## explainer <- c(as.list(environment()), list(...))
## explainer$x <- NULL
## explainer$feature_type <- setNames(sapply(x, function(f) {
## if (is.integer(f)) {
## if (length(unique(f)) == 1)
## "constant"
## else "integer"
## }
## else if (is.numeric(f)) {
## if (length(unique(f)) == 1)
## "constant"
## else "numeric"
## }
## else if (is.character(f)) {
## "character"
## }
## else if (is.factor(f)) {
## "factor"
## }
## else if (is.logical(f)) {
## "logical"
## }
## else if (inherits(f, "Date") || inherits(f, "POSIXt")) {
## "date_time"
## }
## else {
## stop("Unknown feature type", call. = FALSE)
## }
## }), names(x))
## if (any(explainer$feature_type == "constant")) {
## warning("Data contains numeric columns with zero variance",
## call. = FALSE)
## }
## explainer$bin_cuts <- setNames(lapply(seq_along(x), function(i) {
## if (explainer$feature_type[i] %in% c("numeric", "integer")) {
## if (quantile_bins) {
## bins <- quantile(x[[i]], seq(0, 1, length.out = n_bins +
## 1), na.rm = TRUE)
## bins <- bins[!duplicated(bins)]
## if (length(bins) < 3) {
## warning(names(x)[i], " does not contain enough variance to use quantile binning. Using standard binning instead.",
## call. = FALSE)
## d_range <- range(x[[i]], na.rm = TRUE)
## bins <- seq(d_range[1], d_range[2], length.out = n_bins +
## 1)
## }
## bins
## }
## else {
## d_range <- range(x[[i]], na.rm = TRUE)
## seq(d_range[1], d_range[2], length.out = n_bins +
## 1)
## }
## }
## }), names(x))
## explainer$feature_distribution <- setNames(lapply(seq_along(x),
## function(i) {
## switch(explainer$feature_type[i], integer = , numeric = if (bin_continuous) {
## table(cut(x[[i]], unique(explainer$bin_cuts[[i]]),
## labels = FALSE, include.lowest = TRUE))/nrow(x)
## } else if (use_density) {
## density(x[[i]])
## } else {
## c(mean = mean(x[[i]], na.rm = TRUE), sd = sd(x[[i]],
## na.rm = TRUE))
## }, character = , logical = , factor = table(x[[i]])/nrow(x),
## NA)
## }), names(x))
## structure(explainer, class = c("data_frame_explainer", "explainer",
## "list"))
## }
## <bytecode: 0x7f863cfa6ab0>
## <environment: namespace:lime>
```
explain.data.frame
Inputs
- `x`: The new observations to explain (with the same format as used when creating the explainer).
- `explainer`: The object output from the `lime` function.
- `labels = NULL`: The specific labels (classes) to explain in case the model is a classifier. For classifiers, either this or `n_labels` must be given.
- `n_labels = NULL`: The number of labels to explain. If this is given for classifiers, the top `n_labels` classes will be explained.
- `n_features`: The number of features to use for each explanation (i.e. the number of features to select during feature selection).
- `n_permutations = 5000`: The number of perturbations generated for each observation to be explained.
- `feature_select = 'auto'`: The feature selection method for choosing the number of features specified:
  - `auto`: If `n_features` \(\le 6\), `forward_selection` is used. Otherwise, `highest_weights` is used.
  - `none`: Use all features for the explanation. Not advised unless you have very few features.
  - `forward_selection`: Add one feature at a time until `n_features` is reached, based on the quality of a ridge regression model.
  - `highest_weights`: The `n_features` with the highest absolute weights in a ridge regression fit of the complex model outcome are chosen.
  - `lasso_path`: Fit a lasso model and choose the `n_features` whose lars paths are the last to converge to zero.
  - `tree`: Fit a tree to select `n_features` (which needs to be a power of 2). This requires a recent version of XGBoost.
- `dist_fun = 'gower'`: The distance function to use for calculating the distance from the observation to the permutations. If `dist_fun = 'gower'` (the default), `gower::gower_dist()` will be used. Otherwise, the value is forwarded to `stats::dist()`.
- `kernel_width = NULL`: The width of the exponential kernel used to convert the distance to a similarity when `dist_fun != 'gower'`.
- `gower_pow = 1`: A modifier for the Gower distance. The calculated distance will be raised to the power of this value.
Procedure
Check to make sure that `explainer` is an object output from the `lime` function.

```r
assertthat::assert_that(is.data_frame_explainer(explainer))
```
Determine the type of the model contained within the explainer object (using the `model_type` function), and determine the type of output that should be produced by `explain` based on the model type (using the `output_type` function).

```r
m_type <- model_type(explainer)
o_type <- output_type(explainer)
```
If the model type is regression, check that both `labels` and `n_labels` are `NULL`; if not, issue a warning saying that `labels` and `n_labels` are ignored when explaining regression models. In either case, `n_labels` is then set to 1 and `labels` to `NULL`.

```r
if (m_type == "regression") {
  if (!is.null(labels) || !is.null(n_labels)) {
    warning("\"labels\" and \"n_labels\" arguments are ignored when explaining regression models")
  }
  n_labels <- 1
  labels <- NULL
}
```
Check to make sure that exactly one of `labels` and `n_labels` is specified and the other is `NULL`. If not, an error is thrown saying that you must choose between one or the other.

```r
assert_that(is.null(labels) + is.null(n_labels) == 1,
            msg = "You need to choose between labels and n_labels parameters.")
```
Return an error if `n_features` or `n_permutations` is not a count (a single positive integer).

```r
assert_that(is.count(n_features))
assert_that(is.count(n_permutations))
```
If the kernel width has not been specified, then compute it as 0.75 times the square root of the number of columns of `x`. Then create a kernel function, using `exp_kernel`, that will be used to weight the perturbations based on the specified kernel width. The formula used by `exp_kernel` is \[f(x, w)=\sqrt{\exp\left(\frac{-x^2}{w^2}\right)}\] where \(w\) is the kernel width.

```r
if (is.null(kernel_width)) {
  kernel_width <- sqrt(ncol(x)) * 0.75
}
kernel <- exp_kernel(kernel_width)
```
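A minimal sketch of a kernel of this form (the function name `make_kernel` is illustrative; lime builds an equivalent closure with `exp_kernel`):

```r
make_kernel <- function(width) {
  function(d) sqrt(exp(-(d^2) / (width^2)))
}
k <- make_kernel(sqrt(9) * 0.75)  # e.g. a data frame with 9 columns
k(0)     # 1: a zero distance maps to full weight
k(2.25)  # smaller weight as the distance grows
```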
Make use of the `permute_cases` method for data frames in the lime package. This is the function that creates the perturbations: `n_permutations` perturbed cases are created for each observation in `x`.

```r
case_perm <- permute_cases(x, n_permutations, explainer$feature_distribution,
                           explainer$bin_continuous, explainer$bin_cuts,
                           explainer$use_density)
```
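For a single numeric feature with `bin_continuous = TRUE`, the idea behind the perturbations can be sketched as follows (hypothetical bin cuts and proportions; not lime's exact code): a bin is drawn according to the training bin proportions, and a value is then drawn uniformly within that bin.

```r
bin_cuts  <- c(0, 1, 2, 3, 4)       # cut points learned from the training data
bin_probs <- c(0.1, 0.4, 0.3, 0.2)  # proportion of training data in each bin
bin <- sample(seq_along(bin_probs), size = 5000, replace = TRUE, prob = bin_probs)
perturbed <- runif(5000, min = bin_cuts[bin], max = bin_cuts[bin + 1])
```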
Obtain predictions for all of the permuted data points from the original model using the `predict_model` function, then use the `set_labels` function to attach the class labels to the predictions.

```r
case_res <- predict_model(explainer$model, explainer$preprocess(case_perm),
                          type = o_type, ...)
case_res <- set_labels(case_res, explainer$model)
```
Create indices for each of the perturbations and divide them into groups based on which observation to be explained they correspond to.

```r
case_ind <- split(seq_len(nrow(case_perm)), rep(seq_len(nrow(x)), each = n_permutations))
```
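A small standalone example of this indexing scheme, with 2 observations and 3 permutations each:

```r
split(seq_len(6), rep(seq_len(2), each = 3))
## $`1`
## [1] 1 2 3
##
## $`2`
## [1] 4 5 6
```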
Fill in the results object, looping over the observations to be explained.

```r
res <- lapply(seq_along(case_ind), function(ind) {
```
Create a vector of indices for one of the observations.

```r
i <- case_ind[[ind]]
```
If the distance function is specified to be “gower”, then compute the distance between the case of interest and each observation in the permuted dataset. These distances are raised to the power of `gower_pow` and then subtracted from one, so that permutations closer to the case of interest receive a higher “weight”.

```r
if (dist_fun == "gower") {
  sim <- 1 - (gower_dist(case_perm[i[1], , drop = FALSE],
                         case_perm[i, , drop = FALSE]))^gower_pow
}
```
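A toy example of the Gower similarity computation (assumes the gower package; data made up for illustration):

```r
library(gower)
df <- data.frame(a = c(1, 1, 10), b = factor(c("x", "x", "y")))
# Distance from the first row to every row (first element is 0 by construction)
d <- gower_dist(df[1, , drop = FALSE], df)
d        # 0 0 1
1 - d^1  # similarities with gower_pow = 1: identical rows get weight 1
```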
Turn the factor and character variables into numeric variables and, if `bin_continuous = TRUE`, encode each continuous value by whether the permutation falls into the same bin as the case being explained. This uses the `numerify` function.

```r
perms <- numerify(case_perm[i, ], explainer$feature_type,
                  explainer$bin_continuous, explainer$bin_cuts)
```
If the distance function is not set to “gower”, then the distance between the case of interest and the permutations is computed with `stats::dist()` and converted to a similarity using the kernel function. This also makes use of the `feature_scale` function.

```r
if (dist_fun != "gower") {
  sim <- kernel(c(0, dist(feature_scale(perms, explainer$feature_distribution,
                                        explainer$feature_type,
                                        explainer$bin_continuous),
                          method = dist_fun)[seq_len(n_permutations - 1)]))
}
```
Perform feature selection and fit a regression model to the permutations, using the selected variables, via the `model_permutations` function.

```r
res <- model_permutations(as.matrix(perms), case_res[i, , drop = FALSE], sim,
                          labels, n_labels, n_features, feature_select)
```
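The core idea of this step can be sketched with a weighted linear fit on made-up values (lime's `model_permutations` applies the chosen feature selection method first and fits a ridge-penalized model; everything below is illustrative):

```r
toy <- data.frame(f1 = c(1, 0, 1, 0),
                  f2 = c(1, 1, 0, 0),
                  y  = c(0.9, 0.4, 0.7, 0.1))  # hypothetical model output
toy_sim <- c(1.0, 0.8, 0.75, 0.5)              # similarity weights
fit <- lm(y ~ f1 + f2, data = toy, weights = toy_sim)
coef(fit)  # analogous to the feature weights reported in the explanation
```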
Finish filling in the rest of the results object with information about the case, its predictions, and the feature bins. This makes use of the `describe_feature` function.

```r
res$feature_value <- unlist(case_perm[i[1], res$feature])
res$feature_desc <- describe_feature(res$feature, case_perm[i[1], ],
                                     explainer$feature_type,
                                     explainer$bin_continuous,
                                     explainer$bin_cuts)
guess <- which.max(abs(case_res[i[1], ]))
res$case <- rownames(x)[ind]
res$label_prob <- unname(as.matrix(case_res[i[1], ]))[match(res$label,
                                                            colnames(case_res))]
res$data <- list(as.list(case_perm[i[1], ]))
res$prediction <- list(as.list(case_res[i[1], ]))
res$model_type <- m_type
res
```
Combine the per-case results, reorder the columns, and, for regression models, drop the classification-specific columns.

```r
res <- do.call(rbind, res)
res <- res[, c('model_type', 'case', 'label', 'label_prob', 'model_r2',
               'model_intercept', 'model_prediction', 'feature',
               'feature_value', 'feature_weight', 'feature_desc',
               'data', 'prediction')]
if (m_type == 'regression') {
  res$label <- NULL
  res$label_prob <- NULL
  res$prediction <- unlist(res$prediction)
}
```
Return the results object as a tibble.

```r
as_tibble(res)
```
Function
```r
# Print the function (and check to see if there have been any changes)
func_check("explain.data.frame", "0.5.0")
## A single object matching 'explain.data.frame' was found
## It was found in the following places
## namespace:lime
## with value
##
## function (x, explainer, labels = NULL, n_labels = NULL, n_features,
## n_permutations = 5000, feature_select = "auto", dist_fun = "gower",
## kernel_width = NULL, gower_pow = 1, ...)
## {
## assert_that(is.data_frame_explainer(explainer))
## m_type <- model_type(explainer)
## o_type <- output_type(explainer)
## if (m_type == "regression") {
## if (!is.null(labels) || !is.null(n_labels)) {
## warning("\"labels\" and \"n_labels\" arguments are ignored when explaining regression models")
## }
## n_labels <- 1
## labels <- NULL
## }
## assert_that(is.null(labels) + is.null(n_labels) == 1, msg = "You need to choose between labels and n_labels parameters.")
## assert_that(is.count(n_features))
## assert_that(is.count(n_permutations))
## if (is.null(kernel_width)) {
## kernel_width <- sqrt(ncol(x)) * 0.75
## }
## kernel <- exp_kernel(kernel_width)
## case_perm <- permute_cases(x, n_permutations, explainer$feature_distribution,
## explainer$bin_continuous, explainer$bin_cuts, explainer$use_density)
## case_res <- predict_model(explainer$model, explainer$preprocess(case_perm),
## type = o_type, ...)
## case_res <- set_labels(case_res, explainer$model)
## case_ind <- split(seq_len(nrow(case_perm)), rep(seq_len(nrow(x)),
## each = n_permutations))
## res <- lapply(seq_along(case_ind), function(ind) {
## i <- case_ind[[ind]]
## if (dist_fun == "gower") {
## sim <- 1 - (gower_dist(case_perm[i[1], , drop = FALSE],
## case_perm[i, , drop = FALSE]))^gower_pow
## }
## perms <- numerify(case_perm[i, ], explainer$feature_type,
## explainer$bin_continuous, explainer$bin_cuts)
## if (dist_fun != "gower") {
## sim <- kernel(c(0, dist(feature_scale(perms, explainer$feature_distribution,
## explainer$feature_type, explainer$bin_continuous),
## method = dist_fun)[seq_len(n_permutations - 1)]))
## }
## res <- model_permutations(as.matrix(perms), case_res[i,
## , drop = FALSE], sim, labels, n_labels, n_features,
## feature_select)
## res$feature_value <- unlist(case_perm[i[1], res$feature])
## res$feature_desc <- describe_feature(res$feature, case_perm[i[1],
## ], explainer$feature_type, explainer$bin_continuous,
## explainer$bin_cuts)
## guess <- which.max(abs(case_res[i[1], ]))
## res$case <- rownames(x)[ind]
## res$label_prob <- unname(as.matrix(case_res[i[1], ]))[match(res$label,
## colnames(case_res))]
## res$data <- list(as.list(case_perm[i[1], ]))
## res$prediction <- list(as.list(case_res[i[1], ]))
## res$model_type <- m_type
## res
## })
## res <- do.call(rbind, res)
## res <- res[, c("model_type", "case", "label", "label_prob",
## "model_r2", "model_intercept", "model_prediction", "feature",
## "feature_value", "feature_weight", "feature_desc", "data",
## "prediction")]
## if (m_type == "regression") {
## res$label <- NULL
## res$label_prob <- NULL
## res$prediction <- unlist(res$prediction)
## }
## as_tibble(res)
## }
## <bytecode: 0x7f861f92fa50>
## <environment: namespace:lime>
```
Implementation
```r
# Attach the packages used below
library(randomForest)
library(dplyr)
library(lime)
library(assertthat)
library(gower)

x <- hamby224_test %>%
  arrange(case) %>%
  select(rf_features) %>%
  na.omit()

explainer <- lime:::lime.data.frame(x = hamby173and252_train %>%
                                      select(rf_features),
                                    y = hamby173and252_train %>%
                                      select(samesource),
                                    model = as_classifier(bulletr::rtrees))
```
```r
labels = TRUE
n_labels = NULL
n_features = 3
n_permutations = 5000
feature_select = 'auto'
dist_fun = "gower"
kernel_width = NULL
gower_pow = 1
```
```r
assertthat::assert_that(lime:::is.data_frame_explainer(explainer))
## [1] TRUE

#m_type <- lime:::model_type(explainer)
o_type <- lime:::output_type(explainer)

assert_that(is.null(labels) + is.null(n_labels) == 1,
            msg = "You need to choose between labels and n_labels parameters.")
## [1] TRUE

assert_that(is.count(n_features))
## [1] TRUE

assert_that(is.count(n_permutations))
## [1] TRUE

if (is.null(kernel_width)) {
  kernel_width <- sqrt(ncol(x)) * 0.75
}
kernel <- lime:::exp_kernel(kernel_width)
```
```r
case_perm <- lime:::permute_cases.data.frame(x,
                                             n_permutations,
                                             explainer$feature_distribution,
                                             explainer$bin_continuous,
                                             explainer$bin_cuts,
                                             explainer$use_density)
head(case_perm)
##          ccf  rough_cor           D        sd_D    matches mismatches
## 1 0.27042215  0.2704221 0.001396367 0.002090586  0.5901742   7.616146
## 2 0.28724882 -0.1611515 1.013499988 3.019210854  2.9464410   9.556052
## 3 0.30790540  0.4078556 2.811572048 2.010342687  1.2441031   9.978211
## 4 0.04436213  0.9178336 2.110390058 6.498977479  1.6576612  25.502945
## 5 0.27896222 -0.1015974 3.398931097 1.996186637  3.0527158   1.367676
## 6 0.25113254 -0.1527809 4.492328696 4.553291926 16.8560805  32.680021
##          cms   non_cms sum_peaks
## 1 0.59017419  3.808073  1.463209
## 2 7.52621104  4.334388 20.818953
## 3 1.32867089 32.760953  3.568589
## 4 0.85718517  5.956968 23.968347
## 5 0.18317852 18.112314 12.565722
## 6 0.07182718  1.122142  4.000728
```
```r
case_res <- lime:::predict_model(explainer$model,
                                 explainer$preprocess(case_perm),
                                 type = o_type)
case_res <- lime:::set_labels(case_res, explainer$model)
head(case_res)
##       FALSE      TRUE
## 1 1.0000000 0.0000000
## 2 0.8066667 0.1933333
## 3 0.6633333 0.3366667
## 4 0.7766667 0.2233333
## 5 0.6833333 0.3166667
## 6 0.4833333 0.5166667
```
```r
case_ind <- split(seq_len(nrow(case_perm)), rep(seq_len(nrow(x)), each = n_permutations))
case_ind[[1]][1:10]
## [1] 1 2 3 4 5 6 7 8 9 10
case_ind[[2]][1:10]
## [1] 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010
case_ind[[3]][1:10]
## [1] 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010
str(seq_along(case_ind))
## int [1:364] 1 2 3 4 5 6 7 8 9 10 ...
i <- case_ind[[1]]
str(i)
## int [1:5000] 1 2 3 4 5 6 7 8 9 10 ...
```
```r
if (dist_fun == "gower") {
  sim <- 1 - (gower_dist(case_perm[i[1], , drop = FALSE],
                         case_perm[i, , drop = FALSE]))^gower_pow
}
str(sim)
## num [1:5000] 1 0.765 0.82 0.653 0.795 ...
```
```r
perms <- lime:::numerify(case_perm[i, ],
                         explainer$feature_type,
                         explainer$bin_continuous,
                         explainer$bin_cuts)
head(perms)
##   ccf rough_cor D sd_D matches mismatches cms non_cms sum_peaks
## 1   1         1 1    1       1          1   1       1         1
## 2   0         0 1    0       0          0   0       1         0
## 3   0         1 0    1       0          0   0       0         0
## 4   0         1 0    0       0          0   0       0         0
## 5   0         0 0    1       0          1   1       0         0
## 6   1         0 0    0       0          0   1       0         0
```
```r
# error?
# res <- lime:::model_permutations(as.matrix(perms),
#                                  case_res[i, , drop = FALSE],
#                                  sim,
#                                  labels,
#                                  n_labels,
#                                  n_features,
#                                  feature_select)
```
Comment on Faithfulness
> …our choice of \(G\) (sparse linear models) means that if the underlying model is highly non-linear even in the locality of the prediction, there may not be a faithful explanation. However, we can estimate the faithfulness of the explanation on \(\mathcal{Z}\), and present this information to the user. This estimate of faithfulness can also be used for selecting an appropriate family of explanations from a set of multiple interpretable model classes, thus adapting to the given dataset and the classifier. We leave such exploration for future work, as linear explanations work quite well for multiple black-box models in our experiments.