class: center, middle, inverse, title-slide

.title[
# An Application of LIME to a Random Forest
]
.author[
### Katherine Goode
]
.date[
### ISU Graphics Group - March 1, 2019
]
---

# Overview

.pull-left[

### The plan

1. Explanation of LIME
2. Hamby bullet data
3. Applying LIME to the random forest
4. Issues and attempts at a solution
5. Conclusions and future work

]

.pull-right[
<br> <br>
<img src="./figures/lime_drawing.png" width = 400>
]

---
class: inverse, center, middle

# What is <span style="color:lime">LIME</span>?

---

# Motivation for LIME

<br>
![](./figures/blackbox.png)

### Black Box Prediction Models

- Offer great predictive ability
- Loss of interpretability
- Difficult to assess trustworthiness

### Enter LIME...

- **L**ocal **I**nterpretable **M**odel-Agnostic **E**xplanations
- Developed by computer scientists ([Ribeiro, Singh, and Guestrin](https://arxiv.org/pdf/1602.04938.pdf))
- Designed to assess whether a black box predictive model is trustworthy
- Produces "explanations" for individual predictions

---

# Meaning of LIME

.pull-left[

### <span style="color:lime">L</span>ocal

- Focuses on the behavior of a complex model at a local level

### <span style="color:lime">I</span>nterpretable

- Produces easily interpretable "explanations"

### <span style="color:lime">M</span>odel-Agnostic

- Works with any predictive model

### <span style="color:lime">E</span>xplanations

- Provides insight into individual predictions

]

.pull-right[
<br> <br>
.center[<img src="./figures/local.png" width=300>]
.center[<font size="4">Figure 3 in Ribeiro, Singh, and Guestrin (2016)</font>]
<br> <br>
.center[<img src="./figures/image_explanation.png" width=500>]
.center[<font size="4">Figure 4 in Ribeiro, Singh, and Guestrin (2016)</font>]
]

---

# An Example <font size="3">(from Ribeiro, Singh, and Guestrin (2016))</font>

.pull-left[

### 1. Black Box Model

- Model predicts whether a patient has the flu
- Apply the model to a new patient
- Predicts that the patient has the flu
- Can this prediction be trusted?

]

.pull-right[

### 2. LIME

- Apply LIME to *this* case
- LIME returns the most important variables in *this* prediction
- Colors indicate
    + <span style="color:green">green</span>: evidence supporting the flu
    + <span style="color:red">red</span>: evidence against the flu
- Can this prediction be trusted?

]

.center[<img src="./figures/example.png" width=700>]
.center[<font size="4">Figure 1 in Ribeiro, Singh, and Guestrin (2016)</font>]

---

# General LIME Procedure

**Start with**: (1) training data, (2) complex model, (3) one prediction from the testing data

1. Create perturbations from the training data
2. Input each perturbation into the complex model to get predictions
3. Compute similarity scores between the test observation and each perturbation
4. Perform feature selection to choose `\(k\)` features
5. Fit a simple, interpretable model to the perturbations, weighted by the similarity scores
    `$$\mbox{Example: } \mbox{prediction} \sim \mbox{feature 2} + \mbox{feature 3} \ \ \ \mbox{(with features standardized)}$$`
6. Interpret the simple model to "explain" the predictions

| | Feature 1 | <span style="color:blue">Feature 2</span> | <span style="color:blue">Feature 3</span> | Feature 4 | Similarity Score | <span style="color:blue">Prediction</span> |
| --: | :--: | :--: | :--: | :--: | :--: | :--: |
| test observation | 2 | 4 | 2 | 10 | exact | `\(\hat{y}_{\mbox{obs}}\)` |
| perturbation 1 | 2 | 3 | 2 | 12 | very close | `\(\hat{y}_1\)` |
| `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` |
| perturbation 5000 | 10 | 0 | 13 | 22 | not close | `\(\hat{y}_{5000}\)` |
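---

# General LIME Procedure in Code

The sketch below walks through the six steps by hand for a single test case, just to make the procedure concrete. It is a minimal sketch, not the lime package's implementation: `explain_case`, `n_perm`, and `k` are hypothetical names, perturbations are drawn by resampling the raw training values (the binning and density options come later), the weights use a simple exponential kernel, and a weighted `lm()` stands in for ridge regression with largest-absolute-coefficient feature selection.

```r
library(randomForest)

explain_case <- function(train, complex_model, case, n_perm = 5000, k = 3) {
  # 1. Perturb: sample each feature independently from its training values
  perturb <- as.data.frame(lapply(train, sample, size = n_perm, replace = TRUE))
  # 2. Predict: run each perturbation through the complex model
  #    (assumes a randomForest classifier with a "TRUE" class, like rtrees)
  pred <- predict(complex_model, newdata = perturb, type = "prob")[, "TRUE"]
  # 3. Similarity: exponential kernel on a Gower-style distance
  #    (mean of range-scaled absolute differences from the test case)
  ranges <- sapply(train, function(x) diff(range(x)))
  dists  <- sapply(names(train), function(f) abs(perturb[[f]] - case[[f]]) / ranges[f])
  weight <- exp(-rowMeans(dists))  # 1 = identical to the test case
  # 4. Select: fit on all standardized features, keep the k largest |coefficients|
  std <- as.data.frame(scale(perturb))
  fit_all <- lm(pred ~ ., data = std, weights = weight)
  top_k <- names(sort(abs(coef(fit_all)[-1]), decreasing = TRUE))[1:k]
  # 5. Simple model: weighted regression on the k selected features
  fit_k <- lm(pred ~ ., data = std[top_k], weights = weight)
  # 6. Explain: the fitted coefficients are the "explanation"
  coef(fit_k)
}
```

For the bullet data below, `train` would be the nine similarity features from `hamby173and252_train`, `complex_model` would be `rtrees`, and `case` a one-row data frame from `hamby224_test`.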
---

# Using LIME

### Implementations of LIME

- The developers created a Python package called [lime](https://github.com/marcotcr/lime)
- Thomas Lin Pedersen created an R package also called [lime](https://github.com/thomasp85/lime)

![](./figures/lime.png)

### Key Functions in the lime R package

```r
#install.packages("lime")
library(lime)
?lime
?explain
```

---
class: inverse, center, middle

# Hamby Bullet Matching Data

---

# Hamby Bullet Study [<font size="3">James E. Hamby et al. (2009)</font>](https://cdn2.hubspot.net/hub/71705/file-15668427-pdf/docs/aftespringvol41no2pages99-110.pdf)

- Sets of bullets from both “known” and “unknown” gun barrels were sent to firearm examiners around the world
- 240 total sets created
    - 10 barrels used to create a set
    - 35 bullets in a set
        - 20 knowns (2 from each of the 10 barrels)
        - 15 unknowns (at least 1 and no more than 3 from each barrel)
- Examiners were asked to use the known bullets to identify which barrels the unknown bullets were fired from

.center[<img src="./figures/hamby.png" width=350>]
.center[<font size="4">Results table from Hamby et al. (2009)</font>]

---

# Automated Bullet Matching Algorithm

- CSAFE has access to some of the bullet sets
- [Hare, Hofmann, and Carriquiry](https://projecteuclid.org/euclid.aoas/1514430288) (2017) developed an automated algorithm to determine whether two bullets are a match
    - High-definition scans of the bullets were used to obtain signatures associated with each land
    - Developed variables that measure how similar two signatures are
    - Fit a random forest model to predict whether two bullets were a match

.center[<img src="./figures/signatures.png" width=575>]
.center[<font size="4">Figure from Hare, Hofmann, and Carriquiry (2017)</font>]

---

# Training Data

#### Sets 173 and 252

```r
hamby173and252_train <- read.csv("./data/hamby173and252_train.csv")
# first 5 rows displayed below
```

#### Variables Identifying the Bullets

```r
hamby173and252_train %>% select(1:8) %>% slice(1:5)
```

```
##     study1 barrel1 bullet1 land1   study2 barrel2 bullet2 land2
## 1 Hamby173      10       1     1 Hamby173      10       1     2
## 2 Hamby173      10       1     1 Hamby173      10       1     3
## 3 Hamby173      10       1     1 Hamby173      10       1     4
## 4 Hamby173      10       1     1 Hamby173      10       1     5
## 5 Hamby173      10       1     1 Hamby173      10       1     6
```

---

# Training Data

#### Signature Similarity Features

```r
hamby173and252_train %>% select(9:17) %>% slice(1:5) %>% round(2)
```

```
##    ccf rough_cor    D sd_D matches mismatches  cms non_cms sum_peaks
## 1 0.19     -0.03 2.93 2.64    3.21       7.48 1.60    3.74      4.21
## 2 0.26      0.18 1.43 1.88    3.37       9.54 1.68    2.80      3.63
## 3 0.23      0.08 2.49 2.57    2.37      12.43 2.37    9.47      2.30
## 4 0.35      0.17 2.29 2.68    2.66       9.56 1.59    5.31      3.92
## 5 0.21      0.04 2.66 2.57    1.13      10.69 1.13    5.63      0.45
```

#### Response and RF Score

```r
hamby173and252_train %>% select(18:19) %>% slice(1:5)
```

```
##   samesource   rfscore
## 1      FALSE 0.1000000
## 2      FALSE 0.2300000
## 3      FALSE 0.3433333
## 4      FALSE 0.3100000
## 5      FALSE 0.3100000
```
---

# Random Forest Model

The random forest model fit to the training data in Hare, Hofmann, and Carriquiry (2017) is available in the bulletxtrctr R package as `rtrees`.

.pull-left[

```r
# Load the bulletxtrctr library
library(bulletxtrctr)
```

```r
# Importance values from
# the random forest
rtrees$importance
```

```
##            MeanDecreaseGini
## ccf               567.53791
## rough_cor         666.62447
## D                 164.76236
## sd_D              101.73478
## matches           305.13778
## mismatches        261.84430
## cms                94.96370
## non_cms            96.75017
## sum_peaks         120.49836
```
]

.pull-right[

```r
# Number of trees
rtrees$ntree
```

```
## [1] 300
```

```r
# Number of variables randomly
# sampled at each split
rtrees$mtry
```

```
## [1] 3
```

```r
# Confusion matrix
rtrees$confusion
```

```
##       FALSE TRUE class.error
## FALSE 81799   21 0.000256661
## TRUE    363  845 0.300496689
```
]

---

# Testing Data

```r
hamby224_test <- read.csv("./data/hamby224_test.csv")
glimpse(hamby224_test)
```

```
## Rows: 432
## Columns: 18
## $ case       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ study      <chr> "Hamby 224", "Hamby 224", "Hamby 224", "Hamby 224", "Hamby …
## $ set        <chr> "Set 11", "Set 11", "Set 11", "Set 11", "Set 1", "Set 1", "…
## $ bullet1    <chr> "Known 1", "Known 1", "Known 1", "Known 1", "Known 1", "Kno…
## $ land1      <chr> "Land 4", "Land 3", "Land 1", "Land 4", "Land 6", "Land 3",…
## $ bullet2    <chr> "Known 1", "Known 1", "Known 2", "Known 2", "Known 2", "Kno…
## $ land2      <chr> "Land 3", "Land 4", "Land 6", "Land 6", "Land 4", "Land 2",…
## $ ccf        <dbl> 0.2704221, 0.2704221, 0.2788290, 0.3146965, 0.2251567, 0.34…
## $ rough_cor  <dbl> 0.2704221, 0.2704221, 0.2788290, 0.3146965, 0.2251567, 0.34…
## $ D          <dbl> 0.0013963668, 0.0013963668, 0.0015058188, 0.0015339433, 0.0…
## $ sd_D       <dbl> 0.002090586, 0.002090586, 0.002059285, 0.002235877, 0.00209…
## $ matches    <dbl> 0.5901742, 0.5901742, 0.5320479, 0.5304097, 0.4600557, 0.51…
## $ mismatches <dbl> 7.616146, 7.616146, 7.895640, 8.213552, 7.123776, 5.958292,…
## $ cms        <dbl> 0.5901742, 0.5901742, 0.5320479, 0.5304097, 0.4600557, 0.51…
## $ non_cms    <dbl> 3.808073, 3.808073, 4.462753, 4.449008, 5.936480, 3.972195,…
## $ sum_peaks  <dbl> 1.4632093, 1.4632093, 1.2246315, 2.3406954, 0.6428395, 1.36…
## $ samesource <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ rfscore    <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.02000000,…
```

---

# Testing Data

#### Results from Applying `rtrees` to Hamby 224 (Sets 1 and 11)

<br>
| Truth | RF_Prediction | Count |
| :-- | :-- | --: |
| FALSE | FALSE | 275 |
| FALSE | TRUE | 45 |
| TRUE | FALSE | 3 |
| TRUE | TRUE | 41 |
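A table like this can be produced by cross-tabulating the truth against the model's predicted classes. A minimal dplyr sketch (the 0.5 cutoff on `rfscore` is an assumption; the slides do not state the rule used to turn scores into predictions):

```r
library(dplyr)

# Cross-tabulate truth vs. predicted class derived from the rf score
hamby224_test %>%
  count(Truth = samesource, RF_Prediction = rfscore >= 0.5, name = "Count")
```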
<br>

- Want to determine the driving variables for individual predictions
- Especially helpful when the model is wrong
- Variable importance scores provide global interpretations
- We want local interpretations

---
class: inverse, center, middle

# Applying <span style="color:lime">LIME</span> to the Random Forest

---

# Step 0: Starting Objects

**Training Data**: `hamby173and252_train`

**Complex Model**: `rtrees`

**Case of Interest**: case 1 from `hamby224_test`

<br>
| case | study | set | bullet1 | land1 | bullet2 | land2 |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| 1 | Hamby 224 | Set 11 | Known 1 | Land 4 | Known 1 | Land 3 |
<br>
| ccf | rough_cor | D | sd_D | matches | mismatches | cms | non_cms | sum_peaks | samesource | rfscore |
| --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: |
| 0.27 | 0.27 | 0 | 0 | 0.59 | 7.62 | 0.59 | 3.81 | 1.46 | 0 | 0 |
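---

# Step 0: Starting Objects

As code, the starting objects can be set up as below. This is a sketch where `train_features` and `case1` are hypothetical names; the column selections mirror the `lime` and `explain` calls on the R Code slides later.

```r
library(dplyr)

# Training data: the nine signature similarity features
train_features <- hamby173and252_train %>% select(ccf:sum_peaks)

# Complex model: rtrees, loaded earlier from bulletxtrctr

# Case of interest: the same nine features for case 1 of the test data
case1 <- hamby224_test %>% filter(case == 1) %>% select(ccf:sum_peaks)
```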
---

# Step 1: Create Perturbations

#### Estimate the feature distributions

.pull-left[
- quantile bins (default with 4 bins)
- equally spaced bins
]
.pull-right[
- normal approximation
- kernel density estimation
]

.center[ ![](slides_files/figure-html/unnamed-chunk-13-1.png)<!-- --> ]
.center[<font size="4">Histogram of CCF from training data with LIME quantile bins (blue line is ccf value of case 1)</font>]

---

# Step 1: Create Perturbations

#### Draw many samples from the estimated feature distributions

- If a binning estimation method is used, numeric features are converted to indicator variables:
    - 1 if in the same bin as case 1
    - 0 otherwise

| | ccf same bin? | rough_cor same bin? | `\(\cdots\)` | sum_peaks same bin? |
| --: | :--: | :--: | :--: | :--: |
| case 1 | 1 | 1 | `\(\cdots\)` | 1 |
| perturbation 1 | 0 | 1 | `\(\cdots\)` | 1 |
| `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` |
| perturbation 5000 | 0 | 0 | `\(\cdots\)` | 1 |

- Otherwise, features are left as is.

---

# Step 2: Obtain Predictions

#### Use the complex model to obtain a prediction for each perturbation

- Use the `rtrees` model to do this

<br>

| | ccf same bin? | rough_cor same bin? | `\(\cdots\)` | sum_peaks same bin? | prediction |
| --: | :--: | :--: | :--: | :--: | :--: |
| case 1 | 1 | 1 | `\(\cdots\)` | 1 | not a match |
| perturbation 1 | 0 | 1 | `\(\cdots\)` | 1 | match |
| `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` |
| perturbation 5000 | 0 | 0 | `\(\cdots\)` | 1 | not a match |

---

# Step 3: Similarity Score

#### Compute a similarity score between the test observation and each perturbation

- Default method in the R package is Gower distance
    - between 0 and 1
    - 0 indicates exactly the same
- Other distance metrics can be specified

| | ccf same bin? | rough_cor same bin? | `\(\cdots\)` | sum_peaks same bin? | distance | prediction |
| --: | :--: | :--: | :--: | :--: | :--: | :--: |
| case 1 | 1 | 1 | `\(\cdots\)` | 1 | 0 | not a match |
| perturbation 1 | 0 | 1 | `\(\cdots\)` | 1 | 0.1 | match |
| `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` |
| perturbation 5000 | 0 | 0 | `\(\cdots\)` | 1 | 0.8 | not a match |

---

# Step 4: Feature Selection

#### Perform feature selection to choose `\(k\)` features

- Can specify the number of features to select
    - We have been choosing 3 features
- Options in lime
    - auto: forward selection if `\(k\le6\)`, highest weight otherwise (default)
    - forward selection with ridge regression
    - highest weight with ridge regression
    - LASSO
    - tree models
- We have been using the default method

<br>

`$$\hat{\mbox{rf score}}=\hat{\beta}_0 + \hat{\beta}_1 I[\mbox{ccf in same bin}] + \hat{\beta}_2 I[\mbox{rough_cor in same bin}] +\cdots + \hat{\beta}_9 I[\mbox{sum_peaks in same bin}]$$`

---

# Step 5: Simple Model

#### Fit a linear regression (weighted by the similarity scores) to the perturbations with the 3 chosen features

- Features are standardized
- Currently, lime uses ridge regression as the "simple" model
- If the response is categorical, the user can select how many categories they want to explain
    - We have only been explaining "TRUE"
- Example model:

<br>

`$$\hat{\mbox{rf score}}=\hat{\beta}_0 + \hat{\beta}_1 I[\mbox{ccf in same bin}] + \hat{\beta}_2 I[\mbox{cms in same bin}] + \hat{\beta}_3 I[\mbox{matches in same bin}]$$`

---

# Step 6: Interpret

#### Use the coefficients from the regression to "explain" the predictions

- Since the features are standardized, we can compare the coefficients
- The largest `\(\hat{\beta}_i\)` is considered the most important
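---

# Step 1 in Code: Quantile Bins

To make Step 1's default binning concrete, the quantile cuts can be reproduced directly from the training data. A sketch in base R (the cuts should match the `bin_cuts` output on the next slide, though lime's internal handling of bin boundaries may differ slightly):

```r
# Step 1 by hand: 4 quantile bins for ccf from the training data
bin_cuts <- quantile(hamby173and252_train$ccf, probs = seq(0, 1, by = 0.25))

# Which bin does case 1 fall in, and do sampled perturbations land in the same bin?
case1_bin <- cut(hamby224_test$ccf[1], breaks = bin_cuts, include.lowest = TRUE)
perturb_bin <- cut(sample(hamby173and252_train$ccf, 5000, replace = TRUE),
                   breaks = bin_cuts, include.lowest = TRUE)

# The indicator from the Step 1 table: 1 if in the same bin as case 1, 0 otherwise
same_bin <- as.integer(perturb_bin == case1_bin)
```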
---

# R Code

## lime Function

The `lime` function estimates the feature distributions (the first part of step 1).

```r
set.seed(20190229)
out_lime <- lime::lime(x = hamby173and252_train %>% select(ccf:sum_peaks),
                       model = lime::as_classifier(rtrees))
out_lime$bin_cuts$ccf
```

```
##         0%        25%        50%        75%       100% 
## 0.01406705 0.21271265 0.27499511 0.34783793 0.98110386
```

```r
out_lime$feature_distribution$ccf
```

```
## 
##    1    2    3    4 
## 0.25 0.25 0.25 0.25
```

---

# R Code

## explain Function

The `explain` function performs the rest of the steps.

```r
out_explain <- lime::explain(x = hamby224_test %>% filter(case == 1) %>% select(ccf:sum_peaks),
                             explainer = out_lime,
                             labels = TRUE,
                             n_features = 3)
out_explain[,1:6]
```

```
## # A tibble: 3 × 6
##   model_type     case  label label_prob model_r2 model_intercept
##   <chr>          <chr> <lgl>      <dbl>    <dbl>           <dbl>
## 1 classification 1     TRUE           0   0.0460           0.700
## 2 classification 1     TRUE           0   0.0460           0.700
## 3 classification 1     TRUE           0   0.0460           0.700
```

```r
out_explain[,7:10]
```

```
## # A tibble: 3 × 4
##   model_prediction feature   feature_value feature_weight
##              <dbl> <chr>             <dbl>          <dbl>
## 1            0.661 rough_cor         0.270        -0.0442
## 2            0.661 ccf               0.270         0.0393
## 3            0.661 sum_peaks         1.46         -0.0341
```

---

# R Code

## R Package Plot of Explanations

The function `plot_features` lets you visualize the explanations.

```r
lime::plot_features(out_explain)
```

![](slides_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

---

# Shiny App

<br>
.center[<img src="./figures/app.png" width=1100>]

---
class: inverse, center, middle

# Issues with <span style="color:lime">LIME</span> and Attempts at a Solution

---

# Initial Concerns

We first applied lime using all default settings.

- Explanations did not make sense
- `\(R^2\)` values were very low
- Predictions from the simple model were poor

.center[<img src="./figures/explain1.png" width=450>]
.center[<font size="4">Example where the random forest prediction is wrong. The explanations are from 4 quantile bins.</font>]
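---

# Initial Concerns

These concerns can be read directly off the `explain` output shown earlier. A sketch of the comparison, using the columns from that output (`label_prob` is the random forest's score for the explained label):

```r
# Simple model fit quality and prediction vs. the random forest score for case 1
out_explain$model_r2[1]          # 0.046 -> the ridge fit explains very little
out_explain$model_prediction[1]  # 0.661 -> the simple model's predicted rf score
out_explain$label_prob[1]        # 0     -> the actual rf score for label TRUE
```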
---

# More Issues

Then we tried more of the feature distribution estimation methods.

- Explanations still not great
- `\(R^2\)` values still very low
- Predictions from the simple model still poor
- Explanations dependent on the input

.center[<img src="./figures/explain2.png" width=450>]
.center[<font size="4">This is the same case with explanations from 3 equally spaced bins.</font>]

---

# New Binning Method

Next, we created our own way to select bins using trees.

.pull-left[

```r
# Example using treebink
treebink(
  y = hamby173and252_train$samesource,
  x = hamby173and252_train$ccf,
  k = 5
)
```

```
## [1] 0.493562 0.582999 0.688211 0.778397
```
]

.pull-right[
.center[<img src="./figures/tree.png" height = 500>]
]

---

# Comparing Methods

#### Feature distribution estimation methods:

- 2 to 5 quantile bins
- 2 to 5 equally spaced bins
- 2 to 5 tree-based bins with `samesource` as the response
- 2 to 5 tree-based bins with `rfscore` as the response
- normal approximation
- kernel density estimation

#### Comparison metrics:

- MSEs (rf prediction vs simple model predictions)
- `\(R^2\)`
- Consistency across the number of bins
- Consistency across 10 reps

---

# Comparing MSEs

.center[<img src="./figures/mse.png" width = 900>]

---

# Comparing `\(R^2\)` Values

.center[<img src="./figures/r2.png" width = 900>]

---

# Comparing Consistency Across Bins

.center[<img src="./figures/firstfeature.png" width = 1100>]

---

# Comparing Consistency Across Bins

.center[<img src="./figures/secondfeature.png" width = 1100>]

---

# Comparing Consistency Across Bins

.center[<img src="./figures/thirdfeature.png" width = 1100>]

---

# Comparing Consistency Across Reps

.center[<img src="./figures/reps1.png" width = 1100>]

---

# Comparing Consistency Across Reps

.center[<img src="./figures/reps2.png" width = 1100>]

---

# Comparing Consistency Across Reps

.center[<img src="./figures/reps3.png" width = 1100>]

---
class: middle, center, inverse

# Conclusions and Future Work

---

# Conclusions

- Equally spaced bins are dependent on the number of bins
- Tree-based bins usually have the lowest MSEs
- `rfscore` tree-based bins usually have the highest `\(R^2\)` among the binning options
- All options result in low `\(R^2\)` values
    - We think that the linear regression is not similar enough to a random forest to produce good explanations
- Explanations are relatively consistent across reps for the first and second features
- Non-binning methods produce very different results compared to the binning methods

---

# Future Work

- Apply LIME to a logistic regression model
    - We know how to interpret a logistic regression
    - Can compare to LIME explanations
- Run some sort of simulation to assess LIME explanations
- Adjust and compare the weighting methods and the feature selection methods
- Allow the number of bins to vary across features
- Use a tree as the simple model
- Find other ways to interpret random forest models

---

# Image Credits

- slide 1: https://www.flickr.com/photos/lincolnian/300262799
- slide 2: https://ayoqq.org/explore/lime-drawing/
- slide 3: https://en.wikipedia.org/wiki/Black_box
- slide 8: https://github.com/thomasp85/lime