Improved Subseasonal Forecasting of Extreme Weather Using Statistical Machine Learning


Katherine Goode, Maike Holthuijzen, Jacob Johnson, Meredith G.L. Brown, and Thomas Ehrmann (PI)

Sandia National Laboratories
March 24, 2026




Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2026-18988C.

Motivation: Subseasonal Forecasting of Extreme Weather

As of 2024:

  • Extreme cold events have caused >$120 billion in damages over the past 40 years [1]
  • Extreme heat events cause >$100 billion in damages annually [2]

Motivation: Subseasonal Forecasting of Extreme Weather

  • The atmosphere is too chaotic for traditional physics-based weather models to predict extreme events more than 15 days in advance [3]

  • This leaves an opportunity to improve forecasts of targeted extreme events over the subseasonal (2-8 week) period [4]

Our Approach

Apply statistical machine learning to determine whether we can improve on physics-based models for subseasonal forecasts of extreme events

Data

Data: Source and Variables

Weekly averaged data from MERRA-2 [5] ranging from 1980 through 2024

  • Response, averaged within 5 regions of the continental US (CONUS)
    • 2m air temperature
  • Predictors: separated into 9 global regions
    • Surface temperature
    • Sea-level pressure
    • Geopotential height at 850 hPa, 500 hPa, and 200 hPa
    • Air temperature at 850 hPa, 500 hPa, and 200 hPa

Data: Preprocessing

Response (2m air temperature): Compute the average weekly time series within each CONUS region, remove the linear trend and seasonal harmonics, and retain the residuals
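As a minimal sketch of this preprocessing step (a synthetic temperature series and two annual harmonics are illustrative assumptions, not the authors' exact setup), the trend and seasonal cycle can be removed with one least-squares fit:

```python
import numpy as np

# Hypothetical sketch: remove a linear trend and annual harmonics from a
# weekly 2m-temperature series, keeping the residuals as the response.
rng = np.random.default_rng(0)
n_weeks = 52 * 10
t = np.arange(n_weeks)
temp = 15 + 0.001 * t + 10 * np.sin(2 * np.pi * t / 52.18) + rng.normal(0, 2, n_weeks)

# Design matrix: intercept, linear trend, and two annual harmonics
omega = 2 * np.pi / 52.18  # ~52.18 weeks per year
X = np.column_stack([
    np.ones(n_weeks), t,
    np.sin(omega * t), np.cos(omega * t),
    np.sin(2 * omega * t), np.cos(2 * omega * t),
])
beta, *_ = np.linalg.lstsq(X, temp, rcond=None)
residuals = temp - X @ beta  # de-trended, de-seasonalized response
```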

Data: Preprocessing

Predictor Variables: For each variable, compute the weekly mean time series and first 20 principal components
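A hedged sketch of the dimension reduction (the array shape and variable are stand-ins; an SVD of the centered weekly means is equivalent to PCA):

```python
import numpy as np

# Hypothetical sketch: reduce one gridded predictor (e.g., sea-level
# pressure within a region) to its first 20 principal components.
rng = np.random.default_rng(1)
n_weeks, n_gridpoints = 2000, 500
field = rng.normal(size=(n_weeks, n_gridpoints))  # weekly-mean values

anom = field - field.mean(axis=0)          # center each grid point
U, s, Vt = np.linalg.svd(anom, full_matrices=False)
pcs = U[:, :20] * s[:20]                   # first 20 principal components
```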

Modeling Approach

Forecast horizons: 1-4 weeks

Considered two statistical machine-learning approaches:

  • Tree-based model: random forest (RF)
  • Deep-learning model: ensemble echo-state network (EESN)

Baselines:

  • Persistence model
  • Linear model

Process:

  1. Parameter Tuning: data from 1980-2016 (training)
  2. Feature Selection: add data from 2017-2020 (validation)
  3. Performance Evaluation: test on 2021-2024 (testing)

Modeling Approach: Parameter Tuning

Select the hyperparameters that perform best under 5-fold rolling-origin forward validation [6] on the training data (1980-2016)
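Rolling-origin forward validation keeps the training window anchored at 1980 and expands it fold by fold, validating on the block of weeks just past the current origin. A minimal sketch (fold sizes are illustrative, not the authors' configuration):

```python
import numpy as np

# Hypothetical sketch of 5-fold rolling-origin forward validation.
n_weeks = 37 * 52                 # weekly data, roughly 1980-2016
fold_size = n_weeks // 6          # reserve the first block for initial training
folds = []
for k in range(1, 6):
    train_idx = np.arange(0, k * fold_size)                  # expanding window
    val_idx = np.arange(k * fold_size, (k + 1) * fold_size)  # next block
    folds.append((train_idx, val_idx))
```

Each fold trains only on data that precedes its validation block, so no future information leaks into tuning.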

Modeling Approach: Feature Selection

(1) Grouped variables: Computed pairwise Pearson correlations and applied hierarchical clustering

(2) Ordered groups: Based on grouped permutation feature importance
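Step (1) can be sketched as hierarchical clustering on a correlation-based distance (the synthetic data, the 1 - |r| distance, and the 0.3 cutoff are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical sketch: group predictors whose pairwise Pearson
# correlations are high, using 1 - |r| as the distance.
rng = np.random.default_rng(2)
base = rng.normal(size=(500, 3))
X = np.column_stack([base, base + 0.05 * rng.normal(size=(500, 3))])  # 6 predictors

corr = np.corrcoef(X, rowvar=False)
dist = 1 - np.abs(corr)
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
groups = fcluster(Z, t=0.3, criterion="distance")  # group label per predictor
```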

Modeling Approach: Feature Selection

(3) Selected groups for final model: Iteratively retrained models with an increasing number of groups and selected the model using the fewest groups whose RMSE was within 1% of the minimum observed RMSE (on data from 2017-2020)
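The one-percent rule in step (3) reduces to a few lines; the RMSE values below are made up for illustration:

```python
# Hypothetical sketch: given validation RMSEs from models refit with an
# increasing number of feature groups, pick the smallest model whose
# RMSE is within 1% of the best.
rmses = [2.10, 1.85, 1.71, 1.70, 1.72]      # RMSE with 1, 2, ... groups
best = min(rmses)
n_groups = next(k + 1 for k, r in enumerate(rmses) if r <= 1.01 * best)
```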

Modeling Approach: Performance Evaluation

Final RMSEs computed on held out test-set: 2021-2024

  • Computed two ways: (1) over all test data and (2) only on “extremes”

  • Defined “extreme” temperatures as values with Z-scores above 1 or below -1

  • Z-scores computed using weekly climatological means and standard deviations
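The two evaluation modes above can be sketched as follows (the synthetic observations and stand-in forecasts are assumptions; the week-of-year climatology mirrors the Z-score definition on this slide):

```python
import numpy as np

# Hypothetical sketch: flag "extreme" weeks as |Z| > 1, where Z-scores
# use the week-of-year climatological mean and standard deviation, then
# compute RMSE over all weeks and over extremes only.
rng = np.random.default_rng(3)
n_years, wpy = 40, 52
week_of_year = np.tile(np.arange(wpy), n_years)
obs = 10 * np.sin(2 * np.pi * week_of_year / wpy) + rng.normal(0, 2, n_years * wpy)
pred = obs + rng.normal(0, 1, obs.size)     # stand-in forecasts

clim_mean = np.array([obs[week_of_year == w].mean() for w in range(wpy)])
clim_sd = np.array([obs[week_of_year == w].std() for w in range(wpy)])
z = (obs - clim_mean[week_of_year]) / clim_sd[week_of_year]
extreme = np.abs(z) > 1

rmse_all = np.sqrt(np.mean((pred - obs) ** 2))
rmse_ext = np.sqrt(np.mean((pred[extreme] - obs[extreme]) ** 2))
```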

Modeling Approach: Performance Evaluation

Final RMSEs computed on held out test-set: 2021-2024

Modeling Approach: Performance Evaluation

Final RMSEs computed on held out test-set: 2021-2024

Final Feature Importance Results: EESN

Final Feature Importance Results: Random Forest

Recent Work + Onward

Moving Forward

  • Comparison to physics models:
    • Working to compare predictions with forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF)
  • Considering approaches for computing uncertainty on predictions:
    • Conformal prediction
  • Considering alternative inputs and data representations:
    • Compare different types of dimension reduction techniques (e.g., autoencoders)
    • Predictions on spatial grid
  • Adjusting models to focus on extremes…

Extreme ESN?

Single-Layer Echo State Network

Output stage: ridge regression

\[\mathbf{y}_{t} = \mathbf{V} \mathbf{h}_t + \boldsymbol{\epsilon}_{t}, \qquad \boldsymbol{\epsilon}_{t} \sim N(\mathbf{0}, \sigma^2_\epsilon \mathbf{I})\]


Hidden stage: nonlinear stochastic transformation

\[\mathbf{h}_t = g_h \left(\frac{\nu}{|\lambda_w|} \mathbf{W} \mathbf{h}_{t-1} + \mathbf{U} \mathbf{\tilde{x}}_{t-\tau}\right)\]

\[\tilde{\mathbf{x}}_{t-\tau}=\left[\textbf{x}'_{t-\tau},\textbf{x}'_{t-\tau-\tau^*},...,\mathbf{x}'_{t-\tau-m\tau^*}\right]'\]

Only parameters estimated are in \(\textbf{V}\).

Elements of \(\textbf{W}\) and \(\textbf{U}\) randomly sampled…

\[\begin{align} \textbf{W}[h,c_w] &=\gamma_{h,c_w}^w\mbox{Unif}(-a_w,a_w)+(1-\gamma_{h,c_w}^w)\delta_0,\\ \textbf{U}[h,c_u] &=\gamma_{h,c_u}^u\mbox{Unif}(-a_u,a_u)+(1-\gamma_{h,c_u}^u)\delta_0, \end{align}\]

where

  • \(\gamma_{h,c_w}^w \sim Bern(\pi_w)\)
  • \(\gamma_{h,c_u}^u \sim Bern(\pi_u)\)
  • \(\delta_0\) is a Dirac delta function (a point mass at zero)

and values of \(a_w\), \(a_u\), \(\pi_w\), and \(\pi_u\) are pre-specified and set to small values.
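A minimal numerical sketch of the single-layer ESN above: \(\mathbf{W}\) and \(\mathbf{U}\) are drawn from the uniform/point-mass mixture, the hidden state follows the tanh recursion scaled by \(\nu / |\lambda_w|\), and only the ridge readout \(\mathbf{V}\) is estimated. All dimensions and hyperparameter values below are illustrative assumptions:

```python
import numpy as np

# Hypothetical sketch of a single-layer echo state network.
rng = np.random.default_rng(4)
n_h, n_x, n_t = 100, 20, 500
a_w, a_u, pi_w, pi_u = 0.1, 0.1, 0.1, 0.1
nu, ridge = 0.35, 1e-2

def sparse_uniform(rows, cols, a, pi):
    mask = rng.random((rows, cols)) < pi            # gamma ~ Bern(pi)
    return mask * rng.uniform(-a, a, (rows, cols))  # else point mass at 0

W = sparse_uniform(n_h, n_h, a_w, pi_w)
U = sparse_uniform(n_h, n_x, a_u, pi_u)
lam = np.max(np.abs(np.linalg.eigvals(W)))          # spectral radius |lambda_w|

x = rng.normal(size=(n_t, n_x))                     # lagged inputs x~_{t-tau}
H = np.zeros((n_t, n_h))
h = np.zeros(n_h)
for t in range(n_t):
    h = np.tanh((nu / lam) * W @ h + U @ x[t])      # hidden-stage recursion
    H[t] = h

# Ridge-regression readout: V = (H'H + ridge * I)^{-1} H'y
y = rng.normal(size=n_t)                            # toy response
V = np.linalg.solve(H.T @ H + ridge * np.eye(n_h), H.T @ y)
```

The scaling by the reservoir's spectral radius keeps the recursion stable, which is why only \(\mathbf{V}\) needs to be fit.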

Thank you

Katherine Goode

kjgoode@sandia.gov

goodekat.github.io

References

  1. Extreme Cold, Cybersecurity and Infrastructure Security Agency (accessed 2024), https://www.cisa.gov/topics/critical-infrastructure-security-and-resilience/extreme-weather-and-climate-change/extreme-cold

  2. Extreme Heat, Cybersecurity and Infrastructure Security Agency (accessed 2024), https://www.cisa.gov/topics/critical-infrastructure-security-and-resilience/extreme-weather-and-climate-change/extreme-heat

  3. Matsueda, M. and Nakazawa, T. (2015), Early warning products for severe weather events derived from operational medium-range ensemble forecasts. Met. Apps, 22: 213-222. https://doi.org/10.1002/met.1444

  4. Cohen J, Coumou D, Hwang J, et al. S2S reboot: An argument for greater inclusion of machine learning in subseasonal to seasonal forecasts. WIREs Clim Change. 2019; 10:e00567. https://doi.org/10.1002/wcc.567

  5. Saggioro, E., & Shepherd, T. G. (2019). Quantifying the Timescale and Strength of Southern Hemisphere Intraseasonal Stratosphere-troposphere Coupling. Geophysical Research Letters, 46, 13479–13487. https://doi.org/10.1029/2019GL084763.

  6. Schnaubelt, M. (2019). A comparison of machine learning model validation schemes for non-stationary time series data (FAU Discussion Papers in Economics No. 11/2019). Nurnberg. Retrieved from https://hdl.handle.net/10419/209136614.

Backup

EESN Prediction Uncertainty

EESN Prediction Uncertainty