Improved Subseasonal Forecasting of Extreme Weather Using Statistical Machine Learning

Katherine Goode¹, Maike Holthuijzen¹, Jacob Johnson¹,², Meredith G.L. Brown¹, and Thomas Ehrmann¹ (PI)

¹Sandia National Laboratories
²University of Wisconsin-Madison
March 24, 2026

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2026-18988C.

Motivation: Subseasonal Forecasting of Extreme Weather

As of 2024:

Extreme cold events caused >$120 billion in damages in the past 40 years [1]
Extreme heat events cause >$100 billion in damages annually [2]

Weather is too chaotic for traditional physics-based models to predict extreme events more than 15 days in advance [3]
This leaves an opportunity to improve forecasts of targeted extreme events over the subseasonal (2-8 week) period [4]

Our Approach

Apply machine learning to determine whether we can improve on physics-based models for subseasonal forecasts of extremes

Data

Data: Source and Variables

Weekly averaged data from MERRA-2 [5] ranging from 1980 through 2024 [6]

Response: averaged within 5 regions of the continental US (CONUS)
- 2m air temperature
Predictors: separated into 9 global regions
- Surface temperature
- Sea-level pressure
- Geopotential height at 850 hPa, 500 hPa, and 200 hPa
- Air temperature at 850 hPa, 500 hPa, and 200 hPa

Data: Preprocessing

Response (2m air temperature): Compute average weekly time series within a CONUS region, remove the linear trend and harmonics, and obtain residuals
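
The response preprocessing can be sketched as a single least-squares fit. This is a minimal illustration, not the authors' code: the function name `remove_trend_and_harmonics`, the annual period of 52.18 weeks, and the choice of two harmonics are all assumptions, and the input series is synthetic.

```python
import numpy as np

def remove_trend_and_harmonics(y, period=52.18, n_harmonics=2):
    """Return residuals after regressing y on a linear trend and annual harmonics.

    period and n_harmonics are illustrative assumptions, not values from the slides.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)
    # Design matrix: intercept, linear trend, then one sine/cosine pair per harmonic.
    columns = [np.ones_like(t), t]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Synthetic weekly series: trend + annual cycle + noise (illustrative only).
rng = np.random.default_rng(0)
weeks = np.arange(52 * 10)
series = 0.01 * weeks + 5.0 * np.sin(2 * np.pi * weeks / 52.18) + rng.normal(0, 1, weeks.size)
residuals = remove_trend_and_harmonics(series)
```

The residuals play the role of the detrended, deseasonalized response that the models are trained to predict.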

Predictor Variables: For each variable, compute the weekly mean time series and first 20 principal components
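
One way to obtain the leading principal components of a predictor field is a singular value decomposition of the centered data matrix. The sketch below assumes each variable is arranged as a (weeks x grid points) array within one region; the function name is hypothetical, and details such as area weighting or fitting on training years only are not stated on the slide and are left out.

```python
import numpy as np

def first_principal_components(field, n_components=20):
    """field: (n_weeks, n_gridpoints) array.

    Returns the first n_components PC score series and their
    explained-variance ratios. No area weighting is applied (assumption).
    """
    field = np.asarray(field, dtype=float)
    centered = field - field.mean(axis=0)
    # Rows of Vt are spatial patterns; U * s gives the PC time series.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Toy field standing in for one predictor variable in one region.
rng = np.random.default_rng(1)
toy_field = rng.normal(size=(200, 50))
pc_scores, pc_explained = first_principal_components(toy_field)
```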

Modeling Approach

Forecast horizons: 1-4 weeks

Considered two statistical machine-learning approaches:

Tree-based model: random forest (RF)
Deep-learning model: ensemble echo-state network (EESN) [7,8,9]

Baselines:

Persistence model
Linear model
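
For reference, a persistence baseline in its usual form forecasts the value at lead time h as the most recent observation. The slides do not spell out the exact variant used, so the sketch below (with the hypothetical helper `persistence_forecast`) is an assumption about the standard definition.

```python
import numpy as np

def persistence_forecast(y, horizon):
    """Pair each forecast with its target: the forecast for t + horizon is y[t]."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    y = np.asarray(y, dtype=float)
    forecasts = y[:-horizon]
    targets = y[horizon:]
    return forecasts, targets

# Tiny worked example with a 2-week horizon.
y_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
forecasts, targets = persistence_forecast(y_demo, horizon=2)
persistence_rmse = float(np.sqrt(np.mean((targets - forecasts) ** 2)))
```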

Process:

Parameter Tuning: data from 1980-2016 (training)
Feature Selection: add data from 2017-2020 (validation)
Performance Evaluation: test on 2021-2024 (testing)

Modeling Approach: Parameter Tuning

Select hyper-parameters that perform best with 5-fold rolling-origin forward-validation [10] on training data (1980-2016)
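
The fold structure of forward validation can be sketched as follows. The key property is that every validation block lies strictly after its training block in time. Whether the training window expands or keeps a fixed length is not stated on the slide, so the `expanding` flag, the equal-width blocks, and the function name are assumptions.

```python
import numpy as np

def rolling_origin_splits(n_samples, n_folds=5, expanding=True):
    """Yield (train_idx, val_idx) pairs; validation always follows training in time.

    expanding=True grows the training window from the start of the series;
    expanding=False uses only the block immediately before the validation block.
    """
    # Cut the series into n_folds + 1 contiguous blocks; the first is train-only.
    edges = np.linspace(0, n_samples, n_folds + 2).astype(int)
    for k in range(1, n_folds + 1):
        start = 0 if expanding else edges[k - 1]
        train_idx = np.arange(start, edges[k])
        val_idx = np.arange(edges[k], edges[k + 1])
        yield train_idx, val_idx

# Roughly 37 years of weekly data (1980-2016), treated as 52 weeks per year.
splits = list(rolling_origin_splits(n_samples=37 * 52))
```

Each candidate hyper-parameter setting would be fit on `train_idx` and scored on `val_idx`, with the scores averaged over the five folds.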

Modeling Approach: Feature Selection

(1) Grouped variables: Computed pairwise Pearson correlations and applied hierarchical clustering

(2) Ordered groups: Based on grouped permutation feature importance
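
Steps (1) and (2) can be sketched with SciPy's hierarchical clustering and a permutation loop that shuffles all features in a group together. The distance `1 - |correlation|`, average linkage, the cut threshold, and both function names are assumptions chosen for illustration; the slides do not specify them.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def group_correlated_features(X, threshold=0.5):
    """Cluster the columns of X; returns one integer group label per column."""
    corr = np.corrcoef(X, rowvar=False)          # pairwise Pearson correlations
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=threshold, criterion="distance")

def grouped_permutation_importance(predict, X, y, groups, n_repeats=5, seed=0):
    """Mean increase in RMSE when all columns of a group are permuted jointly."""
    rng = np.random.default_rng(seed)
    base = np.sqrt(np.mean((y - predict(X)) ** 2))
    importance = {}
    for g in np.unique(groups):
        cols = np.flatnonzero(groups == g)
        increases = []
        for _ in range(n_repeats):
            Xp = X.copy()
            perm = rng.permutation(X.shape[0])
            Xp[:, cols] = X[perm][:, cols]       # same row shuffle for the whole group
            increases.append(np.sqrt(np.mean((y - predict(Xp)) ** 2)) - base)
        importance[int(g)] = float(np.mean(increases))
    return importance

# Toy data: two blocks of three highly correlated features; only the first matters.
rng = np.random.default_rng(2)
base_a, base_b = rng.normal(size=500), rng.normal(size=500)
X_toy = np.column_stack(
    [base_a + 0.1 * rng.normal(size=500) for _ in range(3)]
    + [base_b + 0.1 * rng.normal(size=500) for _ in range(3)]
)
y_toy = X_toy[:, :3].sum(axis=1)
groups = group_correlated_features(X_toy)
importance = grouped_permutation_importance(lambda X: X[:, :3].sum(axis=1), X_toy, y_toy, groups)
```

Permuting a group jointly avoids the understated importance that can occur when correlated features are shuffled one at a time.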

(3) Selected groups for final model: Iteratively retrained models with an increasing number of groups and selected the model with the smallest number of groups whose RMSE was within 1% of the minimum observed RMSE (on data from 2017-2020)
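
The selection rule in step (3) reduces to a one-line threshold. The sketch below reads "within 1%" as a relative tolerance on the minimum validation RMSE, which is an assumption; the RMSE values are made up for illustration.

```python
import numpy as np

def select_number_of_groups(rmse_by_count, tolerance=0.01):
    """rmse_by_count[k - 1] is the validation RMSE when using the top k groups.

    Returns the smallest k whose RMSE is within `tolerance` (relative)
    of the minimum RMSE.
    """
    rmse = np.asarray(rmse_by_count, dtype=float)
    cutoff = rmse.min() * (1.0 + tolerance)
    return int(np.flatnonzero(rmse <= cutoff)[0]) + 1

# Hypothetical validation RMSEs for models with 1, 2, ..., 6 groups.
toy_rmse = [2.10, 1.80, 1.605, 1.60, 1.62, 1.61]
n_groups = select_number_of_groups(toy_rmse)  # 3: 1.605 <= 1.60 * 1.01 = 1.616
```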

Modeling Approach: Performance Evaluation

Final RMSEs computed on held-out test set: 2021-2024

Computed two ways: (1) over all test data and (2) only on “extremes”
Defined “extreme” temperatures as values with Z-scores above 1 or below -1
Z-scores computed using climatologies and standard deviations computed on the weekly averages
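
The two RMSE variants can be sketched as below. The Z-score standardizes each weekly average by the mean and standard deviation for the same week of the year. The slides do not say which years define the climatology, so computing it from the same synthetic series is an assumption, as are the function names.

```python
import numpy as np

def weekly_z_scores(values, week_of_year):
    """Standardize each value by the mean and std of its week of the year."""
    values = np.asarray(values, dtype=float)
    week_of_year = np.asarray(week_of_year)
    z = np.empty_like(values)
    for w in np.unique(week_of_year):
        mask = week_of_year == w
        z[mask] = (values[mask] - values[mask].mean()) / values[mask].std(ddof=1)
    return z

def rmse(obs, pred):
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))

# Synthetic observations and predictions (illustrative only).
rng = np.random.default_rng(3)
week = np.tile(np.arange(52), 40)                  # 40 years of weekly data
obs = 10.0 * np.sin(2 * np.pi * week / 52) + rng.normal(0, 2, week.size)
pred = obs + rng.normal(0, 1, week.size)

z = weekly_z_scores(obs, week)
extreme = np.abs(z) > 1.0                          # Z above 1 or below -1
rmse_all = rmse(obs, pred)                         # (1) over all data
rmse_extreme = rmse(obs[extreme], pred[extreme])   # (2) extremes only
```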

Final Feature Importance Results: EESN

Final Feature Importance Results: Random Forest

Recent Work + Onward

Moving Forward

Comparison to physics models:
- Working to compare predictions with European Centre for Medium-Range Weather Forecasts (ECMWF) forecasts

Considering approaches for computing uncertainty on predictions:
- Conformal prediction
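
As a rough idea of what conformal prediction would involve, the sketch below shows split conformal intervals built from absolute residuals on a calibration set. It is a generic textbook variant, not a method the authors have adopted; the function name is hypothetical, and the standard coverage guarantee assumes exchangeable data, which weekly time series generally violate.

```python
import numpy as np

def split_conformal_interval(cal_obs, cal_pred, new_pred, alpha=0.1):
    """Symmetric prediction interval from calibration-set absolute residuals."""
    scores = np.sort(np.abs(np.asarray(cal_obs, dtype=float) - np.asarray(cal_pred, dtype=float)))
    n = scores.size
    # Finite-sample rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    rank = int(np.ceil((n + 1) * (1.0 - alpha)))
    if rank > n:
        raise ValueError("calibration set too small for the requested alpha")
    q = scores[rank - 1]
    new_pred = np.asarray(new_pred, dtype=float)
    return new_pred - q, new_pred + q

# Toy calibration set with absolute residuals 1, 2, ..., 19.
cal_pred = np.zeros(19)
cal_obs = np.arange(1.0, 20.0)
lower, upper = split_conformal_interval(cal_obs, cal_pred, new_pred=[0.0], alpha=0.1)
```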

Considering alternative inputs and data representations:
- Compare different types of dimension reduction techniques (e.g., autoencoders)
- Predictions on spatial grid

Adjusting models to focus on extremes…

Extreme ESN?

Single-Layer Echo State Network

Output stage: ridge regression

\[\mathbf{y}_{t} = \mathbf{V} \mathbf{h}_t + \boldsymbol{\epsilon}_{t}, \qquad \boldsymbol{\epsilon}_{t} \sim N(\mathbf{0}, \sigma^2_\epsilon \mathbf{I})\]

Hidden stage: nonlinear stochastic transformation

\[\mathbf{h}_t = g_h \left(\frac{\nu}{|\lambda_w|} \mathbf{W} \mathbf{h}_{t-1} + \mathbf{U} \tilde{\mathbf{x}}_{t-\tau}\right)\]

\[\tilde{\mathbf{x}}_{t-\tau}=\left[\textbf{x}'_{t-\tau},\textbf{x}'_{t-\tau-\tau^*},...,\mathbf{x}'_{t-\tau-m\tau^*}\right]'\]

Only parameters estimated are in $\textbf{V}$.

Elements of $\textbf{W}$ and $\textbf{U}$ randomly sampled…

\[\begin{aligned} \textbf{W}[h,c_w] &=\gamma_{h,c_w}^w\mbox{Unif}(-a_w,a_w)+(1-\gamma_{h,c_w}^w)\delta_0,\\ \textbf{U}[h,c_u] &=\gamma_{h,c_u}^u\mbox{Unif}(-a_u,a_u)+(1-\gamma_{h,c_u}^u)\delta_0, \end{aligned}\]

where

$\gamma_{h,c_w}^w \sim \mbox{Bern}(\pi_w)$
$\gamma_{h,c_u}^u \sim \mbox{Bern}(\pi_u)$
$\delta_0$ is a Dirac delta function (a point mass at zero)

and values of $a_w$, $a_u$, $\pi_w$, and $\pi_u$ are pre-specified and set to small values.
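
The single-layer ESN above can be sketched in a few lines of NumPy. Several choices here are assumptions not given on the slides: $g_h$ is taken to be `tanh`, $|\lambda_w|$ is taken to be the spectral radius of $\mathbf{W}$, the initial hidden state is zero, and each row of `X` is assumed to already hold the embedded input $\tilde{\mathbf{x}}_{t-\tau}$. The function names and default values are hypothetical.

```python
import numpy as np

def sample_sparse_uniform(shape, a, pi, rng):
    """Each entry is Unif(-a, a) with probability pi and exactly 0 otherwise."""
    gamma = rng.random(shape) < pi               # Bernoulli(pi) indicators
    return gamma * rng.uniform(-a, a, size=shape)

def fit_esn(X, Y, n_hidden=50, nu=0.9, a_w=0.1, a_u=0.1, pi_w=0.1, pi_u=0.1,
            ridge=1e-2, seed=0):
    """Fit a single-layer ESN; only the output weights V are estimated."""
    rng = np.random.default_rng(seed)
    W = sample_sparse_uniform((n_hidden, n_hidden), a_w, pi_w, rng)
    U = sample_sparse_uniform((n_hidden, X.shape[1]), a_u, pi_u, rng)
    lam = np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius (assumed |lambda_w|)
    if lam == 0:
        raise ValueError("sampled W has zero spectral radius; resample")
    W_scaled = (nu / lam) * W
    H = np.zeros((X.shape[0], n_hidden))
    h = np.zeros(n_hidden)                       # assumed initial state
    for t in range(X.shape[0]):
        h = np.tanh(W_scaled @ h + U @ X[t])     # hidden stage with g_h = tanh
        H[t] = h
    # Output stage: ridge regression of Y on the hidden states.
    V = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ Y).T
    return V, H

# Toy data: 4 inputs, 1 output (illustrative only).
rng_demo = np.random.default_rng(4)
X_toy = rng_demo.normal(size=(300, 4))
Y_toy = 0.5 * X_toy[:, :1] + 0.1 * rng_demo.normal(size=(300, 1))
V, H = fit_esn(X_toy, Y_toy)
train_mse = float(np.mean((Y_toy - H @ V.T) ** 2))
```

An ensemble version would repeat the fit with different seeds, so that each member has its own random $\mathbf{W}$ and $\mathbf{U}$, and then combine the members' predictions.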

Thank you

Katherine Goode

kjgoode@sandia.gov

goodekat.github.io

References

[1] Extreme Cold, (referenced in 2024), https://www.cisa.gov/topics/critical-infrastructure-security-and-resilience/extreme-weather-and-climate-change/extreme-cold
[2] Extreme Heat, (referenced in 2024), https://www.cisa.gov/topics/critical-infrastructure-security-and-resilience/extreme-weather-and-climate-change/extreme-heat
[3] Matsueda, M. and Nakazawa, T. (2015), Early warning products for severe weather events derived from operational medium-range ensemble forecasts. Met. Apps, 22: 213-222. https://doi.org/10.1002/met.1444.
[4] Cohen J, Coumou D, Hwang J, et al. S2S reboot: An argument for greater inclusion of machine learning in subseasonal to seasonal forecasts. WIREs Clim Change. 2019; 10:e00567. https://doi.org/10.1002/wcc.567.
[5] Saggioro, E., & Shepherd, T. G. (2019). Quantifying the Timescale and Strength of Southern Hemisphere Intraseasonal Stratosphere-troposphere Coupling. Geophysical Research Letters, 46, 13479–13487. https://doi.org/10.1029/2019GL084763.
[6] https://education.nationalgeographic.org/resource/united-states-regions/
[7] McDermott PL, Wikle CK. Deep echo state networks with uncertainty quantification for spatio-temporal forecasting. Environmetrics. 2019;30:e2553. https://doi.org/10.1002/env.2553.
[8] K. Goode, D. Ries, and K. McClernon, Characterizing climate pathways using feature importance on echo state networks, Stat. Anal. Data Min.: ASA Data Sci. J. 17 (2024), e11706. https://doi.org/10.1002/sam.11706.
[9] Ries, D., Goode, K., McClernon, K., & Hillman, B. (2025). Using feature importance as an exploratory data analysis tool on Earth system models. Geoscientific Model Development, 18(4), 1041-1065. https://gmd.copernicus.org/articles/18/1041/2025/.
[10] Schnaubelt, M. (2019). A comparison of machine learning model validation schemes for non-stationary time series data (FAU Discussion Papers in Economics No. 11/2019). Nürnberg. Retrieved from https://hdl.handle.net/10419/209136614.

Backup

EESN Prediction Uncertainty