11/09/2017
Examples: random forests, neural networks, etc.
Benefit: offer better predictive ability than more interpretable models such as linear regression models, regression and classification trees, etc.
Disadvantage: the models are difficult to interpret ("black boxes"), which motivates explanation methods such as LIME
LIME (Local Interpretable Model-agnostic Explanations)
lime R package to implement the method in R
The lime R package supports models from the caret and mlr packages

```r
# Iris dataset
iris[1:3, ]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
```
```r
# Split up the data set into training and testing datasets
iris_test <- iris[1:5, 1:4]
iris_train <- iris[-(1:5), 1:4]

# Create a vector with the responses for the training dataset
iris_lab <- iris[[5]][-(1:5)]
```
```r
# Create random forest model on iris data
library(caret)
rf_model <- train(iris_train, iris_lab, method = 'rf')

# Can use the complex model to make predictions
Pred <- predict(rf_model, iris_test)
Actual <- iris[1:5, 5]
data.frame(iris_test, Pred, Actual)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width   Pred Actual
## 1          5.1         3.5          1.4         0.2 setosa setosa
## 2          4.9         3.0          1.4         0.2 setosa setosa
## 3          4.7         3.2          1.3         0.2 setosa setosa
## 4          4.6         3.1          1.5         0.2 setosa setosa
## 5          5.0         3.6          1.4         0.2 setosa setosa
```
```r
# Create an explainer object
library(lime)
explainer <- lime(iris_train, rf_model)

# Sepal length quantiles obtained from training data
explainer$bin_cuts$Sepal.Length
##   0%  25%  50%  75% 100% 
##  4.3  5.2  5.8  6.4  7.9

# Probability distribution for sepal length
explainer$feature_distribution$Sepal.Length
## 
##         1         2         3         4 
## 0.2758621 0.2413793 0.2413793 0.2413793
```
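To see how these pieces could be used to generate new cases, here is a hedged sketch (an illustration of the idea, not necessarily lime's exact internal sampling routine), assuming the explainer object created above: draw a bin for Sepal.Length according to feature_distribution, then draw a value uniformly within that bin's cut points.

```r
# Illustrative sketch, not lime's exact internals: sample new Sepal.Length
# values by (1) drawing a bin according to the training distribution and
# (2) drawing a value uniformly within that bin's cut points.
set.seed(2017)
bin_probs <- as.numeric(explainer$feature_distribution$Sepal.Length)  # P(bin 1), ..., P(bin 4)
bin_cuts  <- explainer$bin_cuts$Sepal.Length                          # 0%, 25%, 50%, 75%, 100%

bins <- sample(seq_along(bin_probs), size = 5, replace = TRUE, prob = bin_probs)
runif(5, min = bin_cuts[bins], max = bin_cuts[bins + 1])
```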
Histograms of predictor variables from training data
LIME samples new cases from the training data variable distributions and uses the complex model to predict them. The first four sampled cases associated with case 1 of the testing data:

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Case.Number setosa
## 1     5.314807     4.04260     5.856493   1.5136405           1  0.000
## 2     4.546596     2.73312     1.028409   0.4986859           1  1.000
## 3     5.467831     2.98690     6.565054   0.2349578           1  0.468
## 4     6.935536     2.38896     5.548015   0.5669754           1  0.468
##   versicolor virginica
## 1      0.234     0.766
## 2      0.000     0.000
## 3      0.116     0.416
## 4      0.082     0.450
```
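The setosa, versicolor, and virginica columns above are the complex model's predicted class probabilities for the sampled cases. A hedged sketch of that scoring step, where sampled_cases is a hypothetical data frame holding permuted predictor values (here, the first two sampled rows shown above):

```r
# Hypothetical example: score sampled (permuted) cases with the random forest
sampled_cases <- data.frame(Sepal.Length = c(5.314807, 4.546596),
                            Sepal.Width  = c(4.04260, 2.73312),
                            Petal.Length = c(5.856493, 1.028409),
                            Petal.Width  = c(1.5136405, 0.4986859))
predict(rf_model, newdata = sampled_cases, type = "prob")
```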
We need to determine how similar a sampled case is to the observed case in the testing data
Case 1 from testing data:
```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
```
First case sampled from the training data variable distributions, associated with case 1 of the testing data:
```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     5.314807      4.0426     5.856493    1.513641
```
LIME uses the exponential kernel function \[\pi_{x_{obs}}(x_{sampled}) = \exp\left\{\frac{-D(x_{obs}, \ x_{sampled})^2}{\sigma^2}\right\}\] where
\(x_{obs}\): observed data vector to predict
\(x_{sampled}\): sampled data vector from distribution of training variables
\(D(\cdot \ , \ \cdot)\): distance function such as Euclidean distance, cosine distance, etc.
\(\sigma\): width (default set to 0.75 in lime)
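To make the kernel concrete, here is a minimal numerical sketch (using Euclidean distance on the raw feature values; lime itself may compute distances on transformed or binned features) comparing case 1 of the testing data with the first sampled case shown above:

```r
# Sketch of the exponential kernel weight (not lime's internal code)
x_obs     <- c(5.1, 3.5, 1.4, 0.2)                    # case 1 of testing data
x_sampled <- c(5.314807, 4.0426, 5.856493, 1.513641)  # first sampled case
sigma     <- 0.75                                     # default kernel width

D <- sqrt(sum((x_obs - x_sampled)^2))  # Euclidean distance
exp(-D^2 / sigma^2)                    # similarity weight near 0: the cases are far apart
```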
A "simple" model is then fit to the sampled cases for each class, with the complex model's predicted probabilities as the responses and the kernel similarities as the weights: \[\mbox{P(setosa)} \sim \mbox{Sepal.Length} + \mbox{Sepal.Width} + \mbox{Petal.Length} + \mbox{Petal.Width}\] \[\mbox{P(versicolor)} \sim \mbox{Sepal.Length} + \mbox{Sepal.Width} + \mbox{Petal.Length} + \mbox{Petal.Width}\] \[\mbox{P(virginica)} \sim \mbox{Sepal.Length} + \mbox{Sepal.Width} + \mbox{Petal.Length} + \mbox{Petal.Width}\]
lime supports the following options when fitting the simple model:
Currently, lime is programmed to use ridge regression as the "simple" model (a sketch of this fitting step is given after the example below)
If the response is categorical, the user can select how many categories they want to explain
In this example, only setosa will be explained
If petal length and sepal length were selected as the most important features for the first case in the testing data, then the simple model is
\[\mbox{P(setosa)} \sim \mbox{Petal.Length} + \mbox{Sepal.Length}\]
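A hedged sketch of this weighted ridge fit (not lime's exact code), using hypothetical objects: sampled_x holds sampled values of the two selected features, p_setosa the complex model's predicted probabilities of setosa for those samples, and w the kernel similarity weights.

```r
# Hedged sketch: weighted ridge regression of P(setosa) on the selected features
library(glmnet)

set.seed(1)
n <- 200
sampled_x <- cbind(Petal.Length = runif(n, 1, 7),    # hypothetical sampled feature values
                   Sepal.Length = runif(n, 4.3, 7.9))
p_setosa  <- runif(n)                                # hypothetical complex-model predictions
w         <- runif(n)                                # hypothetical kernel weights

simple_model <- glmnet(sampled_x, p_setosa, alpha = 0, weights = w)  # alpha = 0 -> ridge
coef(simple_model, s = 0.01)  # local intercept and feature weights
```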
Use the explain function in lime to obtain the explanations:
```r
# Explain new observation
explanation <- explain(iris_test, explainer,
                       n_labels = 1,
                       n_features = 2,
                       n_permutations = 5000,
                       feature_select = 'auto')
```
```r
explanation[1:2, 1:6]
## # A tibble: 2 × 6
##   model_type     case  label  label_prob model_r2 model_intercept
##   <chr>          <chr> <chr>       <dbl>    <dbl>           <dbl>
## 1 classification 1     setosa          1    0.662           0.128
## 2 classification 1     setosa          1    0.662           0.128
```
```r
explanation[1:2, 7:10]
## # A tibble: 2 × 4
##   model_prediction feature      feature_value feature_weight
##              <dbl> <chr>                <dbl>          <dbl>
## 1            0.971 Petal.Width            0.2          0.421
## 2            0.971 Petal.Length           1.4          0.422
```
```r
explanation[1:2, 11:13]
## # A tibble: 2 × 3
##   feature_desc         data             prediction      
##   <chr>                <list>           <list>          
## 1 Petal.Width <= 0.4   <named list [4]> <named list [3]>
## 2 Petal.Length <= 1.6  <named list [4]> <named list [3]>
```
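As a consistency check on the output above (assuming the two selected features enter the local ridge model as bin indicators, so each contributes its full weight for case 1), the local model's prediction equals the intercept plus the two feature weights:

```r
# model_intercept + feature weights for case 1 reproduces model_prediction
0.128 + 0.421 + 0.422
## [1] 0.971
```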
```r
plot_features(explanation)
```
In the original paper, a classifier was deliberately trained on images in which wolves appeared against snow, so the "bad" model learned to rely on the snowy background. Human subjects were shown its predictions, with and without LIME explanations, and asked: based on these explanations, how is the neural network distinguishing between wolves and huskies?
Response | Without Explanations | With Explanations |
---|---|---|
Trusted the bad model | 10 out of 27 | 3 out of 27 |
Mentioned snow as a potential feature | 12 out of 27 | 25 out of 27 |
Original paper: Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should I trust you?”: Explaining the predictions of any classifier. In Knowledge Discovery and Data Mining (KDD), 2016. https://arxiv.org/abs/1602.04938
Informative Video: https://www.youtube.com/watch?v=hUnRCxnydCc
Python Code on Marco’s GitHub: https://github.com/marcotcr/lime
lime R Package on Thomas Pedersen's GitHub: https://github.com/thomasp85/lime
lime Vignette: https://github.com/thomasp85/lime/blob/master/vignettes/Understanding_lime.Rmd