.mediumlarge[Feature Importance with Deep Echo State Models]

.title[
# .mediumlarge[Feature Importance with Deep Echo State Models]
]
.author[
### Katherine Goode, Daniel Ries, Kellie McClernon, and Lyndsay Shand
]
.author[
### .blue-blue[SIAM-GS]
]
.author[
### .blue-blue[June 22, 2023]
]
.author[
### .smaller[.dark-blue[SAND2023-05130C]]
]

---

<style>
.pull-left-small{
 width: 35%;
 float: left;
}
.pull-right-large{
 width: 65%;
 float: right;
}
.pull-right-small{
 width: 35%;
 float: right;
}
.pull-left-large{
 width: 65%;
 float: left;
}
.pull-left-smallish{
 width: 46%;
 float: left;
}
.pull-right-largeish{
 width: 53%;
 float: right;
}
.pull-right-smaller{
 width: 20%;
 float: right;
}
.pull-left-larger{
 width: 76%;
 float: left;
}
.pull-left-smaller{
 width: 22%;
 float: left;
}
.pull-right-larger{
 width: 76%;
 float: right;
}
</style>

# Outline

- .blue[Motivation]: Climate Interventions and Pathways

- .blue[Approach]: Echo State Networks and Feature Importance

- .blue[Climate Application]: Mount Pinatubo

- .blue[Conclusions and Future Work]

---

# Motivation

### .bright-teal[Climate Interventions and Pathways]

---

## Climate Interventions

.center[.smallmedium[Image source: [https://eos.org/science-updates/improving-models-for-solar-climate-intervention-research](https://eos.org/science-updates/improving-models-for-solar-climate-intervention-research)]]


- Proposed possible interventions

  - Stratospheric aerosol injections

  - Marine cloud brightening

  - Cirrus cloud thinning

  - etc.

--
.pull-right-small[
.center[**What are the downstream effects of such mitigation strategies?**]
]

---

## Our Objective

Develop algorithms to .blue[characterize (i.e., quantify) relationships between climate variables] related to a climate event (with observed data)

**Climate Pathway** (associated with a climate event)

- Source variable

- Intermediate variables

- Impact variable


**Example**

- Mount Pinatubo eruption in 1991

- Released 18-19 Tg of sulfur dioxide

- Proxy for anthropogenic stratospheric aerosol injection


---

## Mount Pinatubo Example Pathway

- .medium[Injection of sulfur dioxide (18-19 Tg) into atmosphere [1]]

- .medium[Vertically integrated measure of aerosols in air from surface to stratosphere] .medium[[2]]

- .medium[Temperatures at pressure levels of 30-50 mb rose 2.5-3.5 degrees centigrade compared to 20-year mean [3]]

.smaller[Figure generated using Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) data [4]]

---

## Our Approach

Use machine learning...

**Step 1**: Model pathway variables with echo state network

- Allow a flexible machine learning model to capture complex relationships among pathway variables

<img src="figs/esn.png" width="100%" style="display: block; margin: auto;" />

**Step 2**: Understand pathways via explainability

- Apply explainability techniques (feature importance) to understand pathways captured by model

<img src="figs/hypothesis.png" width="100%" style="display: block; margin: auto;" />

---

# Approach

### .bright-teal[Echo State Networks and Feature Importance]

---

## Echo State Networks

- Machine learning model for temporal data

- Sibling to recurrent neural network (RNN)

- Computationally efficient compared to RNNs and spatio-temporal statistical models
  
  - ESN reservoir parameters are randomly sampled instead of estimated

- Previous work demonstrated use of ESN for long-term spatio-temporal forecasting .medium[(McDermott and Wikle [5])]
  

`$$\textbf{y}_{t} = \mathbf{V} \mathbf{h}_t + \boldsymbol{\epsilon}_{t}$$`

`$$\boldsymbol{\epsilon}_{t} \sim N(\textbf{0}, \sigma^2_\epsilon \textbf{I})$$`

`$$\mathbf{h}_t = g_h \left(\frac{\nu}{|\lambda_w|} \mathbf{W} \mathbf{h}_{t-1} + \mathbf{U} \mathbf{\tilde{x}}_{t-\tau}\right)$$`

`$$\tilde{\mathbf{x}}_{t-\tau}=\left[\textbf{x}'_{t-\tau},\textbf{x}'_{t-\tau-\tau^*},...,\mathbf{x}'_{t-\tau-m\tau^*}\right]'$$`

Note: The only estimated parameters are the entries of `$\textbf{V}$`; `$\mathbf{W}$` and `$\mathbf{U}$` are randomly sampled and held fixed.
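
---

## Echo State Networks: Code Sketch

The equations on the previous slide can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in the talk. The function names, the uniform sampling of `W` and `U`, the choice of `tanh` for `g_h`, and the reading of `|lambda_w|` as the spectral radius of `W` are all assumptions made for the example.

```python
import numpy as np

def embed_inputs(x, tau, tau_star, m):
    """Stack lagged inputs: row i holds [x_{t-tau}, x_{t-tau-tau*}, ..., x_{t-tau-m*tau*}]
    for the forecast time t = times[i]. `x` has shape (T, p)."""
    lags = [tau + j * tau_star for j in range(m + 1)]
    times = np.arange(max(lags), x.shape[0])  # first t with every lag available
    x_tilde = np.hstack([x[times - lag] for lag in lags])
    return x_tilde, times

def sample_reservoir(n_in, n_h, nu=0.9, width=0.1, seed=0):
    """Randomly sample W and U (never estimated) and rescale W by nu / |lambda_w|."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-width, width, size=(n_h, n_h))
    U = rng.uniform(-width, width, size=(n_h, n_in))
    lambda_w = np.max(np.abs(np.linalg.eigvals(W)))  # assumed: spectral radius of W
    return (nu / lambda_w) * W, U

def hidden_states(x_tilde, W_scaled, U, g_h=np.tanh):
    """Run h_t = g_h(W_scaled h_{t-1} + U x_tilde_{t-tau}) forward from h = 0."""
    h = np.zeros(W_scaled.shape[0])
    H = np.empty((x_tilde.shape[0], h.size))
    for i, x_row in enumerate(x_tilde):
        h = g_h(W_scaled @ h + U @ x_row)
        H[i] = h
    return H

def fit_output_weights(H, Y, penalty=1e-2):
    """Ridge regression for V in y_t = V h_t + eps_t; returns V with shape (n_y, n_h)."""
    gram = H.T @ H + penalty * np.eye(H.shape[1])
    return np.linalg.solve(gram, H.T @ Y).T

# Toy usage on simulated data (not the climate data from the talk).
rng = np.random.default_rng(1)
x = rng.normal(size=(120, 3))
x_tilde, times = embed_inputs(x, tau=1, tau_star=1, m=2)
W_scaled, U = sample_reservoir(n_in=x_tilde.shape[1], n_h=50)
H = hidden_states(x_tilde, W_scaled, U)
y = rng.normal(size=(120, 2))[times]
V = fit_output_weights(H, y)
y_hat = H @ V.T  # fitted values, one row per forecast time
```

Only `fit_output_weights` involves estimation, which is where the computational savings relative to a fully trained RNN come from.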

---

## Echo State Networks: Spatio-Temporal Context

Spatio-temporal processes at spatial locations `$\{\textbf{s}_i\in\mathcal{D}\subset\mathbb{R}^2;i=1,...,N\}$` over times `$t=1,...,T$`...

.pull-left[
.blue[Impact variable] (e.g., stratospheric temperature): 
  
`$${\bf Z}_{Y,t} = \left(Z_{Y,t}({\bf s}_1),Z_{Y,t}({\bf s}_2),...,Z_{Y,t}({\bf s}_N)\right)'$$`

]

.pull-right[
.blue[Source/intermediate variables] (e.g., aerosol optical depth):
  
`$${\bf Z}_{k,t} = \left(Z_{k,t}({\bf s}_1),Z_{k,t}({\bf s}_2),...,Z_{k,t}({\bf s}_N)\right)', \quad k=1,...,K$$`
]

| Stage | Formula | Description |
| ----- | ------- | ----------- |
| Data stage (outputs) | `${\bf Z}_{Y,t}\approx\boldsymbol{\Phi}_Y\textbf{y}_{t}$` | Basis function decomposition (e.g., PCA) |
| Output stage | `$\textbf{y}_{t} = \mathbf{V} \mathbf{h}_t + \boldsymbol{\epsilon}_{t}$` | Ridge regression |
| Hidden stage | `$\mathbf{h}_t = g_h \left(\frac{\nu}{\lvert\lambda_w\rvert} \mathbf{W} \mathbf{h}_{t-1} + \mathbf{U} \mathbf{\tilde{x}}_{t-\tau}\right)$` | Nonlinear stochastic transformation |
| Data stage (inputs) | `${\bf Z}_{k,t}\approx\boldsymbol{\Phi}_k\textbf{x}_{k,t} \ \ \ \ \ \mbox{ where } \textbf{x}_t=[\textbf{x}'_{1,t},...,\textbf{x}'_{K,t}]'$` | Basis function decomposition (e.g., PCA) |
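
---

## Basis Function Decomposition: Code Sketch

The two data stages in the table can be illustrated with a PCA computed through the singular value decomposition. This is a sketch under stated assumptions: the function names are hypothetical, the fields are stored as a `(T, N)` array with one row per time, and the column means are removed before the decomposition (the table writes the approximation without a mean term).

```python
import numpy as np

def pca_basis(Z, n_components):
    """Return (Phi, coeffs, mean) such that Z[t] is approximately Phi @ coeffs[t] + mean."""
    mean = Z.mean(axis=0)
    centered = Z - mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    Phi = Vt[:n_components].T   # (N, q) basis with orthonormal columns
    coeffs = centered @ Phi     # (T, q); row t plays the role of y_t or x_{k,t}
    return Phi, coeffs, mean

def reconstruct(Phi, coeffs, mean):
    """Map basis coefficients back to the spatial field."""
    return coeffs @ Phi.T + mean

# Toy usage: a rank-2 field is recovered exactly from two components.
rng = np.random.default_rng(0)
Z = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 30)) + 5.0
Phi, y, mean = pca_basis(Z, n_components=2)
Z_hat = reconstruct(Phi, y, mean)
```

The ESN is then fit to the low-dimensional coefficients, and forecasts are mapped back to the spatial grid with `reconstruct`.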

---

## Feature Importance

**Goal**

- Feature importance aims to quantify effect of input variable on a model's predictions

**Background**

- Permutation feature importance [6]
- Pixel absence effect with ESNs [7]
- Temporal permutation feature importance [8]

**Our Work**

- Adapt for ESNs in context of spatio-temporal data

**In particular...**

Compute feature importance on trained ESN model for:

- .blue[input variable] over .blue[block of times]

- on forecasts of .teal[response variable] at a time

<img src="figs/hypothesis.png" width="100%" style="display: block; margin: auto;" />

---

## Feature Importance for ESNs

**Concept**

- "Adjust" inputs at time(s) of interest

- Quantify effect on model performance

- Large decrease in performance indicates important time(s)

**Two Approaches**: "Adjust" inputs by either

- .blue[Permute values]: spatio-temporal permutation feature importance (stPFI)

- .blue[Set values to zero]: spatio-temporal zeroed feature importance (stZFI)


<img src="figs/fi.png" width="95%" style="display: block; margin: auto;" />
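
---

## Feature Importance for ESNs: Code Sketch

The concept on the previous slide can be sketched for any trained forecasting function. This is an illustration only: `st_importance` and `toy_predict` are hypothetical names, performance is measured with plain RMSE, and the permutation shuffles the selected input values within each time in the block, which is an assumption about how stPFI permutes rather than a confirmed detail.

```python
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def st_importance(predict, x, y, cols, block, t, method="zero", n_reps=10, seed=0):
    """Increase in RMSE of the forecast at time `t` after adjusting input
    columns `cols` at the time indices in `block`.

    method="zero"    -> set the values to zero (stZFI-style)
    method="permute" -> shuffle the values within each time (stPFI-style),
                        averaged over `n_reps` random permutations
    """
    rng = np.random.default_rng(seed)
    baseline = rmse(predict(x)[t], y[t])
    increases = []
    for _ in range(1 if method == "zero" else n_reps):
        x_adj = x.copy()
        for s in block:
            if method == "zero":
                x_adj[s, cols] = 0.0
            elif method == "permute":
                x_adj[s, cols] = rng.permutation(x_adj[s, cols])
            else:
                raise ValueError(f"unknown method: {method}")
        increases.append(rmse(predict(x_adj)[t], y[t]) - baseline)
    return float(np.mean(increases))

def toy_predict(x):
    """Stand-in for a trained model: forecast at t uses columns 0-1 at t - 1."""
    B = np.array([[1.0, -0.5], [0.5, 2.0]])
    y_hat = np.zeros((x.shape[0], 2))
    y_hat[1:] = x[:-1, :2] @ B
    return y_hat

# Toy usage: only the columns and times the model actually uses are important.
rng = np.random.default_rng(2)
x = rng.normal(size=(50, 4))
y = toy_predict(x)  # toy "truth", so the baseline error is zero
zfi_used = st_importance(toy_predict, x, y, cols=[0, 1], block=[29], t=30)
zfi_unused = st_importance(toy_predict, x, y, cols=[2, 3], block=[29], t=30)
pfi_used = st_importance(toy_predict, x, y, cols=[0, 1], block=[29], t=30, method="permute")
```

With an ESN in place of `toy_predict`, adjusted inputs also propagate through the hidden state, so a block of earlier times can affect the forecast at time `t`.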

---

# Climate Application

### .bright-teal[Mount Pinatubo]

---

## Mount Pinatubo Example: Data

**Data**

- Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2)

- Training Years: 1980 to 1995 
  
  - Includes eruptions of Mount Pinatubo (1991) and El Chichón (1982)
  
- Time Interval: Monthly

- Latitudes: -86 to 86 degrees


---

## Mount Pinatubo Example: Model

**ESN Output**

- Stratospheric Temperature (50 mb)

**ESN Inputs**
  
- Lagged Stratospheric Temperature (50 mb; one month lag)
- Lagged aerosol optical depth (AOD; one month lag)

**Preprocessing (all variables)**

- Climatologies
- Principal components (first 5)

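
---

## Mount Pinatubo Example: Preprocessing Sketch

The climatology step and the one-month lag can be sketched as below. This is a hedged illustration on simulated data: it assumes "climatologies" means subtracting the long-term mean of each calendar month at each location, and the function names are hypothetical.

```python
import numpy as np

def monthly_anomalies(Z, months):
    """Subtract the per-calendar-month mean at each location.
    Z has shape (T, N); months has shape (T,) with values 1..12."""
    Z = np.asarray(Z, dtype=float)
    anomalies = np.empty_like(Z)
    for month in np.unique(months):
        rows = months == month
        anomalies[rows] = Z[rows] - Z[rows].mean(axis=0)
    return anomalies

def lag_pairs(inputs, outputs, lag=1):
    """Pair inputs at time t - lag with outputs at time t (lag must be >= 1)."""
    return inputs[:-lag], outputs[lag:]

# Toy usage: 16 years of monthly data with a strong seasonal cycle.
months = np.tile(np.arange(1, 13), 16)
seasonal_cycle = 10.0 * np.sin(2 * np.pi * months / 12)
rng = np.random.default_rng(3)
Z = seasonal_cycle[:, None] + rng.normal(size=(192, 4))
anom = monthly_anomalies(Z, months)
x_lagged, y_now = lag_pairs(anom, anom, lag=1)
```

The anomalies would then be reduced to their first few principal components before being passed to the ESN.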

---

## Mount Pinatubo Example: Feature Importance

.pull-left-smaller[
A peak in importance for AOD, with no corresponding peak for lagged stratospheric temperature, provides evidence that the eruption's impact on temperature can be traced through AOD

**FI Metric**

Weighted RMSE (weighted by cosine of the latitude)

]

.pull-right-larger[
<img src="figs/merra2_fi.png" width="98%" style="display: block; margin: auto;" />
]
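
---

## Mount Pinatubo Example: Weighted RMSE Sketch

The FI metric weights each location's squared error by the cosine of its latitude, which compensates for grid cells covering less area near the poles. A minimal sketch follows. The function name and the normalization by the sum of the weights are assumptions for the example.

```python
import numpy as np

def weighted_rmse(obs, pred, lat_deg):
    """RMSE over locations with weights cos(latitude); one array entry per location."""
    w = np.cos(np.deg2rad(lat_deg))
    sq_err = (np.asarray(obs) - np.asarray(pred)) ** 2
    return float(np.sqrt(np.sum(w * sq_err) / np.sum(w)))

# Toy usage: the error at 60 degrees counts half as much as the error at the equator.
lat = np.array([0.0, 60.0])
value = weighted_rmse(obs=np.array([0.0, 0.0]), pred=np.array([1.0, 3.0]), lat_deg=lat)
```

The feature importance is then the change in this metric between the original and the adjusted inputs.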

---

# Conclusions and Future Work

---

## Summary and Conclusions

**Summary**

- Interested in quantifying relationships between climate variables associated with pathway of climate event

- Motivated by increasing possibility of climate interventions

- Our machine learning approach:

  - Use ESN to model variable relationships

  - Understand variable relationships using proposed spatio-temporal feature importance

**Conclusion**

- Approach provided evidence of AOD being an intermediate variable in Mount Pinatubo climate pathway affecting stratospheric temperature

---

## Future (Current) Work

**ESN extensions**

- Addition of multiple layers
- ESN ensembles
- Bayesian ESNs

**Spatio-temporal feature importance**

- Implement proposed retraining technique [9] to lessen detection of spurious relationships
- Adapt to visualize on spatial scale
- Compare with other recently proposed explainability techniques for ESNs, such as layer-wise relevance propagation [10]

**Mount Pinatubo application**

- Inclusion of additional pathway variables (e.g., SO2, radiative flux, surface temperature)
- Importance of grouped variables

---

## References

.smallmedium[
[1] S. Guo, G. J. Bluth, W. I. Rose, et al. "Re-evaluation of SO$_2$
release of the 15 June 1991 Pinatubo eruption using ultraviolet and
infrared satellite sensors". In: _Geochemistry, Geophysics, Geosystems_
5.4 (2004), pp. 1-31. DOI:
[10.1029/2003GC000654](https://doi.org/10.1029%2F2003GC000654).

[2] M. Sato, J. E. Hansen, M. P. McCormick, et al. "Stratospheric
aerosol optical depths, 1850-1990". In: _Journal of Geophysical
Research: Atmospheres_ 98.D12 (1993), pp. 22987-22994. DOI:
[10.1029/93JD02553](https://doi.org/10.1029%2F93JD02553).

[3] K. Labitzke and M. McCormick. "Stratospheric temperature increases
due to Pinatubo aerosols". In: _Geophysical Research Letters_ 19.2
(1992), pp. 207-210. DOI:
[10.1029/91GL02940](https://doi.org/10.1029%2F91GL02940).

[4] R. Gelaro, W. McCarty, M. J. Suarez, et al. "The Modern-Era
Retrospective Analysis for Research and Applications, Version 2
(MERRA-2)". In: _Journal of Climate_ 30.14 (2017), pp. 5419-5454. DOI:
[10.1175/JCLI-D-16-0758.1](https://doi.org/10.1175%2FJCLI-D-16-0758.1).

[5] P. L. McDermott and C. K. Wikle. "Deep echo state networks with
uncertainty quantification for spatio‐temporal forecasting". In:
_Environmetrics_ 30.3 (2019). ISSN: 1180-4009. DOI:
[10.1002/env.2553](https://doi.org/10.1002%2Fenv.2553).

[6] A. Fisher, C. Rudin, and F. Dominici. "All Models are Wrong, but
Many are Useful: Learning a Variable's Importance by Studying an Entire
Class of Prediction Models Simultaneously". In: _Journal of Machine
Learning Research_ 20.177 (2019), pp. 1-81. eprint: 1801.01489. URL:
[http://jmlr.org/papers/v20/18-760.html](http://jmlr.org/papers/v20/18-760.html).

[7] A. B. Arrieta, S. Gil-Lopez, I. Laña, et al. "On the post-hoc
explainability of deep echo state networks for time series forecasting,
image and video classification". In: _Neural Computing and
Applications_ 34.13 (2022), pp. 10257-10277. ISSN: 0941-0643. DOI:
[10.1007/s00521-021-06359-y](https://doi.org/10.1007%2Fs00521-021-06359-y).

[8] A. Sood and M. Craven. "Feature Importance Explanations for
Temporal Black-Box Models". In: _arXiv_ (2021). DOI:
[10.48550/arxiv.2102.11934](https://doi.org/10.48550%2Farxiv.2102.11934).
eprint: 2102.11934.

[9] G. Hooker, L. Mentch, and S. Zhou. "Unrestricted permutation forces
extrapolation: variable importance requires at least one more model, or
there is no free variable importance". In: _Statistics and Computing_
31 (2021), pp. 1-16.

[10] M. Landt-Hayen, P. Kröger, M. Claus, et al. "Layer-Wise Relevance
Propagation for Echo State Networks Applied to Earth System
Variability". In: _Signal, Image Processing and Embedded Systems
Trends_. Ed. by D. C. Wyld. Computer Science & Information Technology
(CS & IT): Conference Proceedings 20. 2022, pp.
115-130. ISBN: 978-1-925953-80-0. DOI:
[10.5121/csit.2022.122008](https://doi.org/10.5121%2Fcsit.2022.122008).
]

---

# Thank you