Improved Subseasonal Forecasting of Extreme Weather Using Statistical Machine Learning


Katherine Goode¹, Maike Holthuijzen¹, Jacob Johnson¹,², Meredith G.L. Brown¹, and Thomas Ehrmann¹ (PI)

¹Sandia National Laboratories
²University of Wisconsin-Madison
March 24, 2026




Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2026-18988C.

Motivation: Subseasonal Forecasting of Extreme Weather

As of 2024:

  • Extreme cold events caused >$120 billion in damages in the past 40 years [1]
  • Extreme heat events cause >$100 billion in damages annually [2]

Motivation: Subseasonal Forecasting of Extreme Weather

  • Traditional physics-based weather models are too chaotic to predict extreme events more than 15 days in advance [3]

  • This leaves an opportunity to improve forecasts of targeted extreme events over the subseasonal (2-8 week) period [4]

Our Approach

Apply machine learning to determine whether it can improve on physics-based models when forecasting extremes at subseasonal lead times

Data

Data: Source and Variables

Weekly averaged data from MERRA-2 [5] ranging from 1980 through 2024 [6]

  • Response: averaged within 5 regions of the continental US (CONUS)
    • 2m air temperature
  • Predictors: separated into 9 global regions
    • Surface temperature
    • Sea-level pressure
    • Geopotential height at 850 hPa, 500 hPa, and 200 hPa
    • Air temperature at 850 hPa, 500 hPa, and 200 hPa

Data: Preprocessing

Response (2m air temperature): Compute average weekly time series within a CONUS region, remove the linear trend and harmonics, and obtain residuals
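
One way to sketch the detrending step above is a least-squares fit of a linear trend plus annual harmonics, keeping the residuals. The number of harmonics, the 52.18-week period, and the function name are assumptions made for illustration, not details taken from the study.

```python
import numpy as np

def remove_trend_and_harmonics(y, n_harmonics=2, period=52.18):
    """Regress a weekly series on a linear trend plus annual harmonics; return residuals."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)
    # Design matrix: intercept, linear trend, then sine/cosine pairs
    cols = [np.ones_like(t), t]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Synthetic weekly series (not MERRA-2 data): trend + seasonal cycle + noise
rng = np.random.default_rng(0)
weeks = np.arange(520)
series = 10 + 0.01 * weeks + 8 * np.sin(2 * np.pi * weeks / 52.18) + rng.normal(0, 1, weeks.size)
residuals = remove_trend_and_harmonics(series)
```

The residuals then play the role of the modeled response: what remains after the predictable trend and seasonal cycle are removed.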

Data: Preprocessing

Predictor Variables: For each variable, compute the weekly mean time series and first 20 principal components
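
A minimal sketch of extracting the first 20 principal components from one predictor field, assuming the field is arranged as a (weeks × grid points) matrix. The function name and the use of a plain SVD (no area weighting, fit on all rows) are illustrative assumptions.

```python
import numpy as np

def first_principal_components(X, n_components=20):
    """X has shape (n_weeks, n_gridpoints). Returns PC scores, loadings, explained variance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0, keepdims=True)          # center each grid point
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # weekly PC time series
    explained = (s ** 2) / np.sum(s ** 2)            # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic field (not MERRA-2 data): 200 weeks, 500 grid points
rng = np.random.default_rng(1)
field = rng.normal(size=(200, 500))
scores, loadings, explained = first_principal_components(field, n_components=20)
```

In a forecasting setting the loadings would typically be estimated on the training years only and then applied to later years; whether that was done here is not stated on the slide.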

Modeling Approach

Forecast horizons: 1-4 weeks

Considered two statistical machine-learning approaches:

  • Tree-based model: random forest (RF)
  • Deep-learning model: ensemble echo-state network (EESN) [7,8,9]

Baselines:

  • Persistence model
  • Linear model

Process:

  1. Parameter Tuning: data from 1980-2016 (training)
  2. Feature Selection: add data from 2017-2020 (validation)
  3. Performance Evaluation: test on 2021-2024 (testing)
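
The three-way split above and a persistence baseline can be sketched as follows. The persistence definition used here (carry the value observed `horizon` weeks earlier forward) is a common convention and an assumption about the baseline in this work; function names are hypothetical.

```python
import numpy as np

def split_by_year(years):
    """Boolean masks for the training, validation, and testing periods."""
    years = np.asarray(years)
    train = (years >= 1980) & (years <= 2016)
    val = (years >= 2017) & (years <= 2020)
    test = (years >= 2021) & (years <= 2024)
    return train, val, test

def persistence_forecast(y, horizon):
    """Forecast y[t] with y[t - horizon] (horizon >= 1); the first entries are NaN."""
    y = np.asarray(y, dtype=float)
    pred = np.full_like(y, np.nan)
    pred[horizon:] = y[:-horizon]
    return pred

# Synthetic index: 52 weekly samples per year, 1980-2024
years = np.repeat(np.arange(1980, 2025), 52)
train, val, test = split_by_year(years)

y = np.arange(10.0)
pred = persistence_forecast(y, horizon=2)
```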

Modeling Approach: Parameter Tuning

Select hyper-parameters that perform best with 5-fold rolling-origin forward-validation [10] on training data (1980-2016)
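
A hedged sketch of forward-validation folds in which the training window always precedes the validation block. The expanding-window design and the `min_train_frac` parameter are assumptions; the exact variant from [10] used in this work may differ.

```python
import numpy as np

def rolling_origin_splits(n_samples, n_folds=5, min_train_frac=0.5):
    """Yield (train_idx, valid_idx) pairs; each validation block follows its training data."""
    first_origin = int(n_samples * min_train_frac)
    edges = np.linspace(first_origin, n_samples, n_folds + 1).astype(int)
    for start, stop in zip(edges[:-1], edges[1:]):
        # Train on everything before the origin, validate on the next block
        yield np.arange(0, start), np.arange(start, stop)

# 37 training years x 52 weeks (illustrative sample count)
splits = list(rolling_origin_splits(n_samples=1924, n_folds=5))
```

Each hyper-parameter setting would be scored on all five validation blocks, and the setting with the best average score kept.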

Modeling Approach: Feature Selection

(1) Grouped variables: Computed pairwise Pearson correlations and applied hierarchical clustering

(2) Ordered groups: Based on grouped permutation feature importance
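
Steps (1) and (2) can be sketched as below, assuming absolute Pearson correlation as the similarity, average linkage, a distance cutoff of 0.5, and RMSE increase as the importance score. All of these choices and both function names are illustrative assumptions rather than the settings used in this work.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def group_by_correlation(X, threshold=0.5):
    """Cluster columns of X hierarchically using 1 - |Pearson correlation| as distance."""
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0                      # enforce exact symmetry
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=threshold, criterion="distance")

def grouped_permutation_importance(predict, X, y, labels, n_repeats=10, seed=0):
    """Mean RMSE increase when all columns in a group are permuted together."""
    rng = np.random.default_rng(seed)
    base = np.sqrt(np.mean((predict(X) - y) ** 2))
    importance = {}
    for g in np.unique(labels):
        cols = np.flatnonzero(labels == g)
        increases = []
        for _ in range(n_repeats):
            Xp = X.copy()
            perm = rng.permutation(X.shape[0])
            Xp[:, cols] = X[perm][:, cols]            # same row shuffle for the whole group
            increases.append(np.sqrt(np.mean((predict(Xp) - y) ** 2)) - base)
        importance[int(g)] = float(np.mean(increases))
    return importance

# Synthetic example: two pairs of near-duplicate features, only the first pair matters
rng = np.random.default_rng(2)
z1, z2 = rng.normal(size=(2, 300))
X = np.column_stack([z1, z1 + 0.05 * rng.normal(size=300),
                     z2, z2 + 0.05 * rng.normal(size=300)])
y = 3.0 * z1
labels = group_by_correlation(X)
importance = grouped_permutation_importance(lambda A: 3.0 * A[:, 0], X, y, labels)
ranked = sorted(importance, key=importance.get, reverse=True)
```

Permuting a whole group at once avoids the understated importance that results when a correlated feature is shuffled alone while its near-duplicates stay intact.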

Modeling Approach: Feature Selection

(3) Selected groups for final model: Iteratively retrained models with an increasing number of groups and selected the model with the fewest groups whose RMSE was within 1% of the minimum observed RMSE (on data from 2017-2020)
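
The selection rule in step (3) reduces to a short calculation. The RMSE values below are made up for illustration and are not results from this work; the function name is hypothetical.

```python
import numpy as np

def select_num_groups(rmse_by_num_groups, tolerance=0.01):
    """rmse_by_num_groups[i] is the validation RMSE using the top (i + 1) groups.

    Returns the smallest number of groups whose RMSE is within `tolerance`
    (as a fraction) of the minimum observed RMSE.
    """
    rmse = np.asarray(rmse_by_num_groups, dtype=float)
    cutoff = rmse.min() * (1.0 + tolerance)
    return int(np.flatnonzero(rmse <= cutoff)[0]) + 1

# Hypothetical validation RMSEs for models with 1, 2, ..., 6 groups
validation_rmse = [1.40, 1.22, 1.105, 1.10, 1.12, 1.101]
n_groups = select_num_groups(validation_rmse)  # → 3
```

Here the minimum (1.10) occurs with four groups, but three groups already fall within 1% of it, so the smaller model is preferred.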

Modeling Approach: Performance Evaluation

Final RMSEs computed on the held-out test set: 2021-2024

  • Computed two ways: (1) over all test data and (2) only on “extremes”

  • Defined “extreme” temperatures as values with Z-scores above 1 or below -1

  • Z-scores computed from the climatological means and standard deviations of the weekly averages
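
A sketch of the two RMSE computations, assuming the climatological mean and standard deviation are estimated separately for each week of the year from a reference period. The choice of reference period and the function names are assumptions; the data are synthetic.

```python
import numpy as np

def z_scores(values, week_of_year, ref_values, ref_week_of_year):
    """Standardize each value by the reference mean and std of its week of the year."""
    values = np.asarray(values, dtype=float)
    z = np.empty_like(values)
    for w in np.unique(week_of_year):
        ref = ref_values[ref_week_of_year == w]
        sel = week_of_year == w
        z[sel] = (values[sel] - ref.mean()) / ref.std(ddof=1)
    return z

def rmse_all_and_extreme(y_true, y_pred, z, threshold=1.0):
    """RMSE over all samples and over samples with |Z| above the threshold."""
    err2 = (np.asarray(y_pred) - np.asarray(y_true)) ** 2
    extreme = np.abs(z) > threshold
    return np.sqrt(err2.mean()), np.sqrt(err2[extreme].mean())

# Synthetic data: 40 reference years and 4 test years of weekly temperatures
rng = np.random.default_rng(3)
ref_weeks = np.tile(np.arange(1, 53), 40)
ref_temp = 15 + 10 * np.sin(2 * np.pi * ref_weeks / 52) + rng.normal(0, 2, ref_weeks.size)
test_weeks = np.tile(np.arange(1, 53), 4)
seasonal = 15 + 10 * np.sin(2 * np.pi * test_weeks / 52)
test_temp = seasonal + rng.normal(0, 2, test_weeks.size)
forecast = seasonal + 0.5 * (test_temp - seasonal)   # a forecast that damps anomalies

z = z_scores(test_temp, test_weeks, ref_temp, ref_weeks)
rmse_all, rmse_extreme = rmse_all_and_extreme(test_temp, forecast, z)
```

A forecast that shrinks toward climatology, as in this toy example, scores worse on the extremes than overall, which is why the two numbers are reported separately.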

Final Feature Importance Results: EESN

Final Feature Importance Results: Random Forest

Recent Work + Onward

Moving Forward

  • Comparison to physics models:
    • Working to compare predictions with European Centre for Medium-Range Weather Forecasts (ECMWF) forecasts
  • Considering approaches for computing uncertainty on predictions:
    • Conformal prediction
  • Considering alternative inputs and data representations:
    • Compare different types of dimension reduction techniques (e.g., autoencoders)
    • Predictions on spatial grid
  • Adjusting models to focus on extremes…

Extreme ESN?

Single-Layer Echo State Network

Output stage: ridge regression

\[\mathbf{y}_{t} = \mathbf{V} \mathbf{h}_t + \boldsymbol{\epsilon}_{t}, \qquad \boldsymbol{\epsilon}_{t} \sim N(\mathbf{0}, \sigma^2_\epsilon \mathbf{I})\]


Hidden stage: nonlinear stochastic transformation

\[\mathbf{h}_t = g_h \left(\frac{\nu}{|\lambda_w|} \mathbf{W} \mathbf{h}_{t-1} + \mathbf{U} \mathbf{\tilde{x}}_{t-\tau}\right)\]

\[\tilde{\mathbf{x}}_{t-\tau}=\left[\textbf{x}'_{t-\tau},\textbf{x}'_{t-\tau-\tau^*},...,\mathbf{x}'_{t-\tau-m\tau^*}\right]'\]
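
The embedded input vector defined above can be built as follows. This is a small illustration with a toy input matrix; the function name and zero-based time indexing are assumptions.

```python
import numpy as np

def embed_inputs(X, t, tau, tau_star, m):
    """Stack x_{t-tau}, x_{t-tau-tau*}, ..., x_{t-tau-m*tau*} into one vector.

    X has shape (n_times, n_inputs); tau is the forecast lead, tau_star the
    embedding lag, and m the number of additional lagged copies.
    """
    lags = [t - tau - j * tau_star for j in range(m + 1)]
    if min(lags) < 0:
        raise ValueError("not enough history for the requested embedding")
    return np.concatenate([X[lag] for lag in lags])

X = np.arange(20, dtype=float).reshape(10, 2)   # 10 time steps, 2 inputs per step
x_tilde = embed_inputs(X, t=9, tau=2, tau_star=3, m=2)
# Uses rows 7, 4, and 1 of X → [14., 15., 8., 9., 2., 3.]
```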

Only parameters estimated are in \(\textbf{V}\).

Elements of \(\textbf{W}\) and \(\textbf{U}\) randomly sampled…

\[\begin{align} \textbf{W}[h,c_w] &=\gamma_{h,c_w}^w\mbox{Unif}(-a_w,a_w)+(1-\gamma_{h,c_w}^w)\delta_0,\\ \textbf{U}[h,c_u] &=\gamma_{h,c_u}^u\mbox{Unif}(-a_u,a_u)+(1-\gamma_{h,c_u}^u)\delta_0, \end{align}\]

where

  • \(\gamma_{h,c_w}^w \sim Bern(\pi_w)\)
  • \(\gamma_{h,c_u}^u \sim Bern(\pi_u)\)
  • \(\delta_0\) is a Dirac delta function (a point mass at zero)

and values of \(a_w\), \(a_u\), \(\pi_w\), and \(\pi_u\) are pre-specified and set to small values.
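
Putting the pieces together, a minimal single-reservoir sketch. It assumes \(g_h = \tanh\), interprets \(\lambda_w\) as the spectral radius of \(\mathbf{W}\), and uses placeholder values for \(\nu\), \(a_w\), \(a_u\), \(\pi_w\), \(\pi_u\), the reservoir size, and the ridge penalty. None of these settings or function names come from this work.

```python
import numpy as np

def sample_sparse_uniform(shape, a, pi, rng):
    """Each entry is Unif(-a, a) with probability pi and exactly 0 otherwise."""
    mask = rng.random(shape) < pi
    return mask * rng.uniform(-a, a, size=shape)

def fit_esn(X_tilde, Y, n_hidden=50, nu=0.9, a_w=0.1, a_u=0.1,
            pi_w=0.1, pi_u=0.1, ridge=1e-2, seed=0):
    """Fit the output weights V by ridge regression on randomly generated hidden states."""
    rng = np.random.default_rng(seed)
    W = sample_sparse_uniform((n_hidden, n_hidden), a_w, pi_w, rng)
    U = sample_sparse_uniform((n_hidden, X_tilde.shape[1]), a_u, pi_u, rng)
    lam = np.max(np.abs(np.linalg.eigvals(W)))        # spectral radius of W
    if lam == 0:
        raise ValueError("W has zero spectral radius; resample or increase pi_w")
    H = np.zeros((X_tilde.shape[0], n_hidden))
    h = np.zeros(n_hidden)
    for t in range(X_tilde.shape[0]):
        # Hidden stage: W and U stay fixed after sampling
        h = np.tanh((nu / lam) * W @ h + U @ X_tilde[t])
        H[t] = h
    # Output stage: ridge regression is the only estimation step
    V = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ Y).T
    return V, H

# Synthetic inputs and response (not the study data)
rng = np.random.default_rng(4)
X_tilde = rng.normal(size=(300, 6))
Y = np.sin(X_tilde[:, :1]) + 0.1 * rng.normal(size=(300, 1))
V, H = fit_esn(X_tilde, Y)
fitted = H @ V.T
```

An ensemble could be formed by repeating the fit with different seeds and averaging the predictions; whether that matches the EESN construction used here is an assumption.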

Thank you

Katherine Goode

kjgoode@sandia.gov

goodekat.github.io

References

  1. Extreme Cold, (referenced in 2024), https://www.cisa.gov/topics/critical-infrastructure-security-and-resilience/extreme-weather-and-climate-change/extreme-cold

  2. Extreme Heat, (referenced in 2024), https://www.cisa.gov/topics/critical-infrastructure-security-and-resilience/extreme-weather-and-climate-change/extreme-heat

  3. Matsueda, M. and Nakazawa, T. (2015), Early warning products for severe weather events derived from operational medium-range ensemble forecasts. Met. Apps, 22: 213-222. https://doi.org/10.1002/met.1444.

  4. Cohen J, Coumou D, Hwang J, et al. S2S reboot: An argument for greater inclusion of machine learning in subseasonal to seasonal forecasts. WIREs Clim Change. 2019; 10:e00567. https://doi.org/10.1002/wcc.567.

  5. Saggioro, E., & Shepherd, T. G. (2019). Quantifying the Timescale and Strength of Southern Hemisphere Intraseasonal Stratosphere-troposphere Coupling. Geophysical Research Letters, 46, 13479–13487. https://doi.org/10.1029/2019GL084763.

  6. https://education.nationalgeographic.org/resource/united-states-regions/

  7. McDermott PL, Wikle CK. Deep echo state networks with uncertainty quantification for spatio-temporal forecasting. Environmetrics. 2019;30:e2553. https://doi.org/10.1002/env.2553.

  8. K. Goode, D. Ries, and K. McClernon, Characterizing climate pathways using feature importance on echo state networks, Stat. Anal. Data Min.: ASA Data Sci. J. 17 (2024), e11706. https://doi.org/10.1002/sam.11706.

  9. Ries, D., Goode, K., McClernon, K., & Hillman, B. (2025). Using feature importance as an exploratory data analysis tool on Earth system models. Geoscientific Model Development, 18(4), 1041-1065. https://gmd.copernicus.org/articles/18/1041/2025/.

  10. Schnaubelt, M. (2019). A comparison of machine learning model validation schemes for non-stationary time series data (FAU Discussion Papers in Economics No. 11/2019). Nürnberg. Retrieved from https://hdl.handle.net/10419/209136614.

Backup

EESN Prediction Uncertainty