class: center, middle, inverse, title-slide .title[ # .mediumlarge[Characterizing climate pathways using feature importance on echo state networks] ] .author[ ### Katherine Goode ] .author[ ### Daniel Ries, Kellie McClernon ] .author[ ### WNAR June 11, 2024 ] .author[ ### .smaller[SAND2023-10060C] ] --- <style type="text/css"> .smaller{font-size: 45%} .small{font-size: 50%} .smallmedium{font-size: 55%} .medium{font-size: 75%} .mediumlarge{font-size: 80%} .pull-left-small{ width: 35%; float: left; } .pull-right-large{ width: 60%; float: right; } .pull-right-small{ width: 35%; float: right; } .pull-left-large{ width: 65%; float: left; } .pull-left-smallish{ width: 46%; float: left; } .pull-right-largeish{ width: 53%; float: right; } .pull-right-smaller{ width: 20%; float: right; } .pull-left-larger{ width: 76%; float: left; } .pull-left-smaller{ width: 22%; float: left; } .pull-right-larger{ width: 76%; float: right; } </style> ## Climate Interventions .pull-left-large[ <img src="figs/solar-climate-interventions.png" width="90%" style="display: block; margin: auto;" /> .center[.smallmedium[Image source: [https://eos.org/science-updates/improving-models-for-solar-climate-intervention-research](https://eos.org/science-updates/improving-models-for-solar-climate-intervention-research)]] ] .pull-right-small[ Threat of climate change has led to .blue[proposed interventions]... - Stratospheric aerosol injections - Marine cloud brightening - Cirrus cloud thinning - etc. ] -- <br> .center[**What are the downstream effects of such mitigation strategies?**] --- ## CLDERA Grand Challenge <img src="figs/cldera.png" width="100%" style="display: block; margin: auto;" /> .center[.smallmedium[Image credit: CLDERA leadership]] --- ## Climate Event Exemplar - Mount Pinatubo eruption in 1991 - Released 18-19 Tg of sulfur dioxide - Proxy for anthropogenic stratospheric aerosol injection .pull-left[ <img src="figs/pinatubo.jpg" width="95%" style="display: block; margin: auto;" /> <br> <br> .center[.smallmedium[Image source: [https://volcano.si.edu/volcano.cfm?vn=273083](https://volcano.si.edu/volcano.cfm?vn=273083)]] ] .pull-right[ <img src="figs/pinatubo-ash-cloud.jpg" width="95%" style="display: block; margin: auto;" /> .center[.smallmedium[Image source: [https://pubs.usgs.gov/fs/1997/fs113-97/](https://pubs.usgs.gov/fs/1997/fs113-97/)]] ] --- ## Observational Thrust **Objective**: Develop algorithms to .blue[characterize (i.e., quantify) relationships between climate variables] related to a climate event using observational data .pull-left-smaller[ <img src="figs/obs-data-collection.png" width="60%" style="display: block; margin: auto;" /> ] .pull-right-larger[ <img src="figs/obs-pathway.png" width="100%" style="display: block; margin: auto;" /> <br> .center[.smallmedium[Image credit: CLDERA leadership]] ] --- ## Mount Pintabuo Pathway .center[ .blue[Sulfur dioxide] .medium[Injection of sulfur dioxide (18-19 Tg) into atmosphere [1]] <img src="figs/arrow.png" width="5%" style="display: block; margin: auto;" /> .blue[Aerosol optical depth (AOD)] .medium[Vertically integrated measure of aerosols in air from surface to stratosphere [2] AOD increased as a result of injection of sulfur dioxide [1; 2]] <img src="figs/arrow.png" width="5%" style="display: block; margin: auto;" /> .blue[Stratospheric temperature] .medium[Temperatures at pressure levels of 30-50 mb rose 2.5-3.5 degrees centigrade compared to 20-year mean [3]] ] --- ## Mount Pintabuo Pathway <img src="figs/merra2_heatmaps_1991.png" width="77%" style="display: block; margin: auto;" /> .center[.smaller[Figure generated using Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA- 2) data [4].]] --- ## Our Approach -- Use machine learning... -- .pull-left[ **Step 1: Model climate event variables with echo state network** Allow complex machine learning model to capture complex variable relationships <br> <img src="figs/esn-step1.png" width="95%" style="display: block; margin: auto;" /> ] -- .pull-right[ **Step 2: Quantify relationships via explainability** Apply feature importance to understand relationships captured by model <br> <img src="figs/esn-step2.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Remainder of Talk... <br> - .blue[Approach]: Echo State Networks and Feature Importance <br> - .blue[Climate Application]: Mount Pinatubo <br> - .blue[Conclusions and Future/Current Work] --- class: inverse, center, middle # Approach ### .bright-teal[Step 1: Echo State Networks] --- ## Echo-State Networks .pull-left[ **Overview** - Nonlinear machine learning model for temporal data - Sibling to recurrent neural network (RNN) - Known for forecasting with chaotic systems - Computationally efficient model - Compared to RNNs and spatio-temporal statistical models - ESN reservoir parameters randomly sampled instead of estimated ] -- .pull-right[ **Recent work** ESN for spatio-temporal forecasting: - Sea surface temperature: .medium[McDermott and Wikle [5]] - Soil moisture: .medium[McDermott and Wikle [6]] - Wind power: .medium[Huang, Castruccio, and Genton [7]] - Air pollution: .medium[Bonas and Castruccio [8]] ] --- ## Echo-State Networks **Single-Layer Echo State Network** .pull-left[ .blue[Output stage]: ridge regression `$$\textbf{y}_{t} = \mathbf{V} \mathbf{h}_t + \boldsymbol{\epsilon}_{t} \ \ \ \ \ \ {\bf \epsilon_t } \sim N(\textbf{0}, \sigma^2_\epsilon \textbf{I})$$` <br> .blue[Hidden stage]: nonlinear stochastic transformation `$$\mathbf{h}_t = g_h \left(\frac{\nu}{|\lambda_w|} \mathbf{W} \mathbf{h}_{t-1} + \mathbf{U} \mathbf{\tilde{x}}_{t-\tau}\right)$$` `$$\tilde{\mathbf{x}}_{t-\tau}=\left[\textbf{x}'_{t-\tau},\textbf{x}'_{t-\tau-\tau^*},...,\mathbf{x}'_{t-\tau-m\tau^*}\right]'$$` ] -- .pull-right[ Only parameters estimated are in `\(\textbf{V}\)`. Elements of `\(\textbf{W}\)` and `\(\textbf{U}\)` randomly sampled... `\begin{align} \textbf{W}[h,c_w] &=\gamma_{h,c_w}^w\mbox{Unif}(-a_w,a_w)+(1-\gamma_{h,c_w}^w)\delta_0,\\ \textbf{U}[h,c_u] &=\gamma_{h,c_u}^u\mbox{Unif}(-a_u,a_u)+(1-\gamma_{h,c_u}^u)\delta_0, \end{align}` where - `\(\gamma_{h,c_w}^w \sim Bern(\pi_w)\)` - `\(\gamma_{h,c_u}^u \sim Bern(\pi_u)\)` - `\(\delta_0\)` is a Dirac function and values of `\(a_w\)`, `\(a_u\)`, `\(\pi_w\)`, and `\(\pi_u\)` are pre-specified and set to small values. ] --- ## Echo-State Networks <br> <br> <img src="figs/esn-step1.png" width="60%" style="display: block; margin: auto;" /> --- ## Echo-State Networks: Spatio-Temporal Context Recall that we are working with spatio-temporal data... <img src="figs/esn-spatio-temp.png" width="95%" style="display: block; margin: auto;" /> --- ## Echo-State Networks: Spatio-Temporal Context -- Spatio-temporal processes at spatial locations `\(\{\textbf{s}_i\in\mathcal{D}\subset\mathbb{R}^2;i=1,...,N\}\)` over times `\(t=1,...,T\)`... -- .pull-left[ .blue[Output variable] (e.g., stratospheric temperature): `$${\bf Z}_{Y,t} = \left(Z_{Y,t}({\bf s}_1),Z_{Y,t}({\bf s}_2),...,Z_{Y,t}({\bf s}_N)\right)'$$` ] -- .pull-right[ .blue[Input variables] (e.g., lagged aerosol optical depth and stratospheric temperature): For `\(k=1,...,K\)` `$${\bf Z}_{k,t} = \left(Z_{k,t}({\bf s}_1),Z_{k,t}({\bf s}_2),...,Z_{k,t}({\bf s}_N)\right)'$$` ] -- | Stage | Formula | Description | | ----- | ------- | ----------- | | Output data stage | `\({\bf Z}_{Y,t}\approx\boldsymbol{\Phi}_Y\textbf{y}_{t}\)` | Basis function decomposition (e.g., PCA) | | Output stage | `\(\textbf{y}_{t} = \mathbf{V} \mathbf{h}_t + \boldsymbol{\epsilon}_{t}\)` | Ridge regression | | Hidden stage | `\(\mathbf{h}_t = g_h \left(\frac{\nu}{\lvert\lambda_w\rvert} \mathbf{W} \mathbf{h}_{t-1} + \mathbf{U} \mathbf{\tilde{x}}_{t-\tau}\right)\)` `\(\tilde{\mathbf{x}}_{t-\tau}=\left[\textbf{x}'_{t-\tau},\textbf{x}'_{t-\tau-\tau^*},...,\mathbf{x}'_{t-\tau-m\tau^*}\right]'\)` | Nonlinear stochastic transformation | | Input data stage | `\({\bf Z}_{k,t}\approx\boldsymbol{\Phi}_k\textbf{x}_{k,t}\)` `\(\ \ \ \ \ \ \textbf{x}_t=[\textbf{x}'_{1,t},...,\textbf{x}'_{K,t}]'\)` | Basis function decomposition (e.g., PCA) | --- ## Echo-State Networks: Spatio-Temporal Context <br> <img src="figs/esn-with-pcs.png" width="100%" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Approach ### .bright-teal[Step 2: Feature Importance] --- ## Feature Importance for ESNs -- **Feature importance**: Aims to quantify effect of input variable on a model's predictions - Permutation feature importance [9], pixel absence affect with ESNs [10], temporal permutation feature importance [11] -- **Our Goal**: Quantify the effect of .blue[input variable] `\(k\)` over .blue[block of times] `\((t-b,...,t-1)\)` on .blue[forecasts] at time `\(t\)` <img src="figs/esn-with-pcs.png" width="85%" style="display: block; margin: auto;" /> --- ## Feature Importance for ESNs -- .pull-left-large[ <img src="figs/fi.png" width="90%" style="display: block; margin: auto;" /> ] .pull-right-small[ **Concept**: Set inputs at times(s) of interest to zero and quantify effect on model performance (.blue[ spatio-temporal zeroed feature importance (stZFI)]) ] -- **Feature Importance**: Difference in RMSEs from "zeroed" and observed spatial predictions: `$$RMSE_{zeroed,t}-RMSE_{obs,t}$$` -- **Interpretation**: Large feature importance indicates "zeroed" inputs lead to a decrease in model performance indicating the model uses those inputs for prediction (i.e., inputs 'important' to model) --- ## Feature Importance for ESNs Computing feature importance of variable 1 at time `\(t=5\)` using a block size of 3: <br> `$$RMSE_{adj,5}-RMSE_{obs,5}$$` <br> <img src="figs/fi-demo.png" width="90%" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Climate Application ### .bright-teal[Mount Pinatubo] --- ## Mount Pinatubo Example: Data **Data** Monthly values from 1980 to 1995 and -86 to 86 degrees latitude **Source** Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA- 2) .pull-left-smaller[ **Normalized anomalies** `$$\frac{Z_{k,mth,yr}({\bf s}_i)-\bar{Z}_{k,mth,\cdot}({\bf s}_i)}{sd(Z_{k,mth,\cdot}({\bf s}_i))}$$` <br> where `\(\bar{Z}_{k,mth,\cdot}({\bf s}_i)\)` and `\(sd(Z_{k,mth,\cdot}({\bf s}_i))\)` are average and standard deviation at location `\({\bf s}_i\)` during month `\(mth\)` for variable `\(k\)` ] .pull-right-larger[ <img src="figs/merra2_weighted_global_averages.png" width="90%" style="display: block; margin: auto;" /> ] --- ## Mount Pinatubo Example: Model .pull-left-small[ **Goal** Forecast stratospheric temperature (50mb) one month ahead given lagged values of stratospheric temperature (50mb) and AOD **Details** - All variables: First 5 principal components computed on normalized anomalies - Hyperparameters selected via grid search when years of 1994 and 1995 were held out - Fit 25 ESNs to account for random variability ] .pull-right-large[ <img src="figs/merra2_rmse.png" width="90%" style="display: block; margin: auto;" /> ] --- ## Mount Pinatubo Example: Feature Importance **Block size** 6 months (e.g., How important are February, March, April, May, June, and July in 1991 for forecasting August in 1991?) **Metric** Weighted RMSE to compute feature importance (weighted by cos latitude): `$$\mbox{weighted RMSE}_t =\sqrt{\frac{\sum_{loc}w_{loc}(y_{t,loc}-\hat{y}_{t,loc,adj})^2}{\sum_{loc}w_{loc}}} - \sqrt{\frac{\sum_{loc}w_{loc}(y_{t,loc}-\hat{y}_{t,loc})^2}{\sum_{loc}w_{loc}}}$$` <img src="figs/merra2_fi_rmses.png" width="70%" style="display: block; margin: auto;" /> --- ## Mount Pinatubo Example: Feature Importance Block size of 6 months (including variability across 25 ESNs) <img src="figs/merra2_fi.png" width="100%" style="display: block; margin: auto;" /> --- ## Mount Pinatubo Example: Feature Importance Block sizes of 1-6 months (averaged over 25 ESNs) <img src="figs/merra2_fi_blocks.png" width="100%" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Conclusions and Future Work --- ## Summary and Conclusions -- **Key Take Aways** - Interested in quantifying relationships between climate variables associated with pathway of climate event - Motivated by increasing possibility of climate interventions - Our machine learning approach: - Use ESN to model variable relationships - Understand variable relationships using proposed spatio-temporal feature importance - Approach provided evidence of AOD being an intermediate variable in Mount Pinatubo climate pathway affecting stratospheric temperature --- ## Future (Current) Work -- **ESN extensions** - ESN ensembles - Addition of multiple layers -- **Feature importance** - Implement proposed retraining technique [12] to lessen detection of spurious relationships due to correlation - Adapt to visualize on spatial scale - Comparison to other newly proposed explainability techniques for ESNs (layer-wise relevance propagation) [13] -- **Mount Pinatubo application** - Inclusion of additional pathway variables (e.g., SO2, radiative flux, surface temperature) - Importance of grouped variables - Application to climate simulation ensembles - Use of climate simulation ensembles for method assessment --- ## References .smallmedium[ [1] S. Guo, G. J. Bluth, W. I. Rose, et al. "Re-evaluation of SO$_2$ release of the 15 June 1991 Pinatubo eruption using ultraviolet and infrared satellite sensors". In: _Geochemistry, Geophysics, Geosystems_ 5 (4 2004), pp. 1-31. DOI: [10.1029/2003GC000654](https://doi.org/10.1029%2F2003GC000654). [2] M. Sato, J. E. Hansen, M. P. McCormick, et al. "Stratospheric aerosol optical depths, 1850-1990". In: _Journal of Geophysical Research: Atmospheres_ 98.D12 (1993), pp. 22987-22994. DOI: [https://doi.org/10.1029/93JD02553](https://doi.org/https%3A%2F%2Fdoi.org%2F10.1029%2F93JD02553). eprint: https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/93JD02553. URL: [https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/93JD02553](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/93JD02553). [3] K. Labitzke and M. McCormick. "Stratospheric temperature increases due to Pinatubo aerosols". In: _Geophysical Research Letters_ 19 (2 1992), pp. 207-210. DOI: [10.1029/91GL02940](https://doi.org/10.1029%2F91GL02940). [4] R. Gelaro, W. McCarty, M. J. Suarez, et al. "The ModernEra Retrospective Analysis for Research and Applications, Version 2 (MERRA-2)". In: _Journal of Climate_ 30 (14 2017), pp. 5419-5454. DOI: [10.1175/JCLI-D-16-0758.1](https://doi.org/10.1175%2FJCLI-D-16-0758.1). [5] P. L. McDermott and C. K. Wikle. "An ensemble quadratic echo state network for non-linear spatio-temporal forecasting". In: _Stat_ 6.1 (2017), pp. 315-330. [6] P. L. McDermott and C. K. Wikle. "Deep echo state networks with uncertainty quantification for spatio‐temporal forecasting". In: _Environmetrics_ 30.3 (2019). ISSN: 1180-4009. DOI: [10.1002/env.2553](https://doi.org/10.1002%2Fenv.2553). [7] H. Huang, S. Castruccio, and M. G. Genton. "Forecasting high-frequency spatio-temporal wind power with dimensionally reduced echo state networks". In: _Journal of the Royal Statistical Society Series C: Applied Statistics_ 71.2 (2022), pp. 449-466. [8] M. Bonas and S. Castruccio. "Calibration of SpatioTemporal forecasts from citizen science urban air pollution data with sparse recurrent neural networks". In: _The Annals of Applied Statistics_ 17.3 (2023), pp. 1820-1840. [9] A. Fisher, C. Rudin, and F. Dominici. "All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously". In: _Journal of Machine Learning Research_. 177 20 (2019), pp. 1-81. eprint: 1801.01489. URL: [http://jmlr.org/papers/v20/18-760.html](http://jmlr.org/papers/v20/18-760.html). [10] A. B. Arrieta, S. Gil-Lopez, I. Laña, et al. "On the post-hoc explainability of deep echo state networks for time series forecasting, image and video classification". In: _Neural Computing and Applications_ 34.13 (2022), pp. 10257-10277. ISSN: 0941-0643. DOI: [10.1007/s00521-021-06359-y](https://doi.org/10.1007%2Fs00521-021-06359-y). [11] A. Sood and M. Craven. "Feature Importance Explanations for Temporal Black-Box Models". In: _arXiv_ (2021). DOI: [10.48550/arxiv.2102.11934](https://doi.org/10.48550%2Farxiv.2102.11934). eprint: 2102.11934. [12] G. Hooker, L. Mentch, and S. Zhou. "Unrestricted permutation forces extrapolation: variable importance requires at least one more model, or there is no free variable importance". In: _Statistics and Computing_ 31 (2021), pp. 1-16. [13] M. Landt-Hayen, P. Kröger, M. Claus, et al. "Layer-Wise Relevance Propagation for Echo State Networks Applied to Earth System Variability". In: _Signal, Image Processing and Embedded Systems Trends_. Ed. by D. C. Wyld. Computer Science & Information Technology (CS & IT): Conference Proceedings 20. ARRAY(0x55588c8d8680), 2022, pp. 115-130. ISBN: 978-1-925953-80-0. DOI: [doi:10.5121/csit.2022.122008](https://doi.org/doi%3A10.5121%2Fcsit.2022.122008). URL: [https://doi.org/10.5121/csit.2022.122008](https://doi.org/10.5121/csit.2022.122008). ] --- class: inverse, center, middle <br> # Thank you .white[kjgoode@sandia.gov] .sky-blue[goodekat.github.io] .white[Slides: [goodekat.github.io/presentations/2023-acasa/slides.html](https://goodekat.github.io/presentations/2023-acasa/slides.html)] --- class: inverse, center, middle # Back-Up Slides --- ## MERRA2 Data: Climatologies with variability <br> <img src="figs/merra_cltgs.png" width="40%" style="display: block; margin: auto;" /> --- ## Feature Importance: Spatio-Temporal Context -- **Compute FI on the trained ESN model** for... - spatio-temporal input variable `\(k\)` - over the block of times `\(\{t, t-1,..., t-b+1\}\)` - on the forecasts of the spatio-temporal response variable at time `\(t+\tau\)`. <br> -- <img src="figs/fi-demo.png" width="90%" style="display: block; margin: auto;" /> --- ## Feature Importance: Spatio-Temporal Context Contribution of spatial locations to ZFI: `\(\sqrt{\frac{w_{loc}(y_{t,loc}-\hat{y}_{t,loc,zeroed})^2}{\sum_{loc}w_{loc}}} - \sqrt{\frac{w_{loc}(y_{t,loc}-\hat{y}_{t,loc})^2}{\sum_{loc}w_{loc}}}\)` <img src="figs/fi-cont-aod.png" width="75%" style="display: block; margin: auto;" /> --- ## Feature Importance: Spatio-Temporal Context Contribution of spatial locations to ZFI: `\(\sqrt{\frac{w_{loc}(y_{t,loc}-\hat{y}_{t,loc,zeroed})^2}{\sum_{loc}w_{loc}}} - \sqrt{\frac{w_{loc}(y_{t,loc}-\hat{y}_{t,loc})^2}{\sum_{loc}w_{loc}}}\)` <img src="figs/fi-cont-temp.png" width="75%" style="display: block; margin: auto;" /> --- ## Simulated Data Demonstration .pull-left[ **Simulated response** `$$Z_{Y,t}({\bf s}_i)=Z_{2,t}({\bf s}_i) \beta + \delta_t({\bf s}_i) + \epsilon_t({\bf s}_i)$$` where - `\(Z_{2,t}\)` spatio-temporal covariate - `\(\delta_t({\bf s}_i)\)` spatio-temporal random effect - `\(\epsilon_t({\bf s}_i) \overset{iid}{\sim} N(0,\sigma_{\epsilon}^2)\)` <img src="figs/syn_data.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="figs/syn_data_heat.png" width="90%" style="display: block; margin: auto;" /> ] --- ## Simulated Data: Effect of Variability on FI <img src="figs/zfi_nblock.png" width="90%" style="display: block; margin: auto;" /> --- ## Effect of Correlation on FI <img src="figs/effect-corr.png" width="100%" style="display: block; margin: auto;" />