Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2026-18988C.
As of 2024:
Traditional physics-based weather models are too chaotic to predict extreme events more than 15 days in advance [3]
This leaves an opportunity to improve forecasts of targeted extreme events over the subseasonal (2-8 week) period [4]
Apply machine learning to determine if we can make improvements over physics-based models on extremes for subseasonal forecasts
Weekly averaged data from MERRA-2 [5] ranging from 1980 through 2024
Response (2m air temperature): Compute average weekly time series within a CONUS region, remove the linear trend and harmonics, and obtain residuals
Predictor Variables: For each variable, compute the weekly mean time series and first 20 principal components
Forecast horizons: 1-4 weeks
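The response and predictor preprocessing can be sketched as follows. This is a minimal illustration rather than the pipeline used for the poster: the function names, the number of harmonics (2), the 52.18-week annual period, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def remove_trend_and_harmonics(y, n_harmonics=2, period=52.18):
    """Residuals after a least-squares fit of a linear trend plus annual harmonics.

    n_harmonics and period (weeks per year) are illustrative assumptions.
    """
    t = np.arange(len(y), dtype=float)
    cols = [np.ones_like(t), t]  # intercept and linear trend
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def leading_pcs(field, n_components=20):
    """Principal component time series of a (time, grid cell) array via SVD."""
    anomalies = field - field.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(anomalies, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Synthetic stand-ins for a regional weekly series and one gridded predictor.
rng = np.random.default_rng(0)
t = np.arange(520)
series = 10.0 + 0.01 * t + 8.0 * np.sin(2.0 * np.pi * t / 52.18)
series = series + rng.normal(scale=0.5, size=t.size)
residuals = remove_trend_and_harmonics(series)

field = rng.normal(size=(520, 60))
pcs = leading_pcs(field, n_components=20)
```

Because the fit includes an intercept, the residuals average to zero, and the principal component columns are mutually uncorrelated by construction.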
Considered two statistical machine-learning approaches:
Baselines:
Process:
Select hyper-parameters that perform best with 5-fold rolling-origin forward-validation [6] on training data (1980-2016)
(1) Grouped variables: Computed pairwise Pearson correlations and applied hierarchical clustering
(2) Ordered groups: Based on grouped permutation feature importance
(3) Selected groups for final model: Iteratively retrained models with an increasing number of groups and selected the models with the smallest number of groups whose RMSE was within 1% of the minimum observed RMSE (on data from 2017-2020)
Final RMSEs computed on held-out test set: 2021-2024
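The selection rule in step (3) can be illustrated with a short sketch. The function name and the validation RMSE values below are hypothetical; only the 1% tolerance comes from the text above.

```python
import numpy as np

def select_num_groups(rmse_by_num_groups, tolerance=0.01):
    """Smallest number of groups whose RMSE is within `tolerance` of the minimum.

    rmse_by_num_groups[i] is the validation RMSE of the model retrained
    with the i + 1 most important groups.
    """
    rmse = np.asarray(rmse_by_num_groups, dtype=float)
    threshold = rmse.min() * (1.0 + tolerance)
    # argmax on a boolean array returns the first True position.
    return int(np.argmax(rmse <= threshold)) + 1

# Hypothetical validation RMSEs for models using 1, 2, ..., 5 groups.
validation_rmse = [1.50, 1.20, 1.005, 1.00, 1.02]
chosen = select_num_groups(validation_rmse)
print(chosen)  # 3: three groups already come within 1% of the best RMSE
```

The rule trades a small loss in validation accuracy for a model that uses fewer groups of variables.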
Computed two ways: (1) over all test data and (2) only on “extremes”
Defined “extreme” temperatures as values with Z-scores above 1 or below -1
Z-scores computed using climatologies and standard deviations computed on the weekly averages
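A sketch of the two RMSE calculations, assuming the climatological mean and standard deviation are supplied per observation (for example, looked up by week of year). The function names and the toy numbers are illustrative and are not taken from the study.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean squared error between two equal-length arrays."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def rmse_all_and_extremes(obs, pred, clim_mean, clim_std, threshold=1.0):
    """RMSE over all data and over 'extremes' only (|Z-score| > threshold)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    z = (obs - clim_mean) / clim_std
    extreme = np.abs(z) > threshold
    return rmse(obs, pred), rmse(obs[extreme], pred[extreme])

# Toy weekly temperatures with an assumed climatology (mean 0, std 1.5).
obs = np.array([0.0, 2.0, -3.0, 0.5])
pred = np.array([0.0, 1.0, -1.0, 0.5])
rmse_all, rmse_extreme = rmse_all_and_extremes(obs, pred, clim_mean=0.0, clim_std=1.5)
```

In this toy case only the second and third weeks count as extremes, so the extremes-only RMSE is larger than the RMSE over all four weeks.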
Single-Layer Echo State Network
Output stage: ridge regression
\[\mathbf{y}_{t} = \mathbf{V} \mathbf{h}_t + \boldsymbol{\epsilon}_{t}, \qquad \boldsymbol{\epsilon}_t \sim N(\mathbf{0}, \sigma^2_\epsilon \mathbf{I})\]
Hidden stage: nonlinear stochastic transformation
\[\mathbf{h}_t = g_h \left(\frac{\nu}{|\lambda_w|} \mathbf{W} \mathbf{h}_{t-1} + \mathbf{U} \tilde{\mathbf{x}}_{t-\tau}\right)\]
\[\tilde{\mathbf{x}}_{t-\tau}=\left[\mathbf{x}'_{t-\tau},\mathbf{x}'_{t-\tau-\tau^*},\ldots,\mathbf{x}'_{t-\tau-m\tau^*}\right]'\]
Only parameters estimated are in \(\textbf{V}\).
Elements of \(\textbf{W}\) and \(\textbf{U}\) randomly sampled…
\[\begin{aligned} \mathbf{W}[h,c_w] &=\gamma_{h,c_w}^w\,\text{Unif}(-a_w,a_w)+(1-\gamma_{h,c_w}^w)\,\delta_0,\\ \mathbf{U}[h,c_u] &=\gamma_{h,c_u}^u\,\text{Unif}(-a_u,a_u)+(1-\gamma_{h,c_u}^u)\,\delta_0, \end{aligned}\]
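where \(\gamma_{h,c_w}^w \sim \text{Bern}(\pi_w)\), \(\gamma_{h,c_u}^u \sim \text{Bern}(\pi_u)\), \(\delta_0\) is a point mass at zero,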
and values of \(a_w\), \(a_u\), \(\pi_w\), and \(\pi_u\) are pre-specified and set to small values.
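A minimal NumPy sketch of a single-layer echo state network along these lines. Several choices are assumptions made for illustration and are not stated in the source: \(g_h\) is taken to be tanh, \(\lambda_w\) is taken to be the largest-modulus eigenvalue of \(\mathbf{W}\), each weight is nonzero with probability \(\pi_w\) or \(\pi_u\), the input rows are assumed to be already lagged and embedded, and all hyper-parameter values and the name `fit_esn` are hypothetical.

```python
import numpy as np

def sample_sparse_uniform(rng, shape, a, pi):
    """Entries are Unif(-a, a) with probability pi and exactly 0 otherwise."""
    keep = rng.random(shape) < pi
    return np.where(keep, rng.uniform(-a, a, size=shape), 0.0)

def fit_esn(x, y, n_hidden=50, nu=0.9, a_w=0.1, a_u=0.1,
            pi_w=0.1, pi_u=0.1, ridge=1.0, seed=0):
    """Fit the ridge readout of an echo state network.

    x: (T, p) array whose row t holds the (already lagged) input for time t.
    y: (T,) response. Returns the readout weights V and hidden states H.
    """
    rng = np.random.default_rng(seed)
    W = sample_sparse_uniform(rng, (n_hidden, n_hidden), a_w, pi_w)
    U = sample_sparse_uniform(rng, (n_hidden, x.shape[1]), a_u, pi_u)

    # Rescale W by nu / |lambda_w|, assuming lambda_w is the spectral radius.
    lam = np.max(np.abs(np.linalg.eigvals(W)))
    if lam == 0.0:
        raise ValueError("W has zero spectral radius; resample with larger pi_w")
    W_scaled = (nu / lam) * W

    # Hidden stage: W and U stay fixed; only the states are propagated.
    h = np.zeros(n_hidden)
    H = np.empty((x.shape[0], n_hidden))
    for t in range(x.shape[0]):
        h = np.tanh(W_scaled @ h + U @ x[t])
        H[t] = h

    # Output stage: ridge regression of y on the hidden states.
    V = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return V, H

# Synthetic example: 200 time steps, 5 input features.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 5))
y = 0.8 * x[:, 0] + rng.normal(scale=0.1, size=200)
V, H = fit_esn(x, y)
fitted = H @ V
```

Here `V` is stored as a vector so that fitted values are `H @ V`. Because \(\mathbf{W}\) and \(\mathbf{U}\) are never trained, fitting reduces to a single linear solve.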
Katherine Goode
kjgoode@sandia.gov
[1] Extreme Cold, (referenced in 2024), https://www.cisa.gov/topics/critical-infrastructure-security-and-resilience/extreme-weather-and-climate-change/extreme-cold
[2] Extreme Heat, (referenced in 2024), https://www.cisa.gov/topics/critical-infrastructure-security-and-resilience/extreme-weather-and-climate-change/extreme-heat
[3] Matsueda, M. and Nakazawa, T. (2015), Early warning products for severe weather events derived from operational medium-range ensemble forecasts. Met. Apps, 22: 213-222. https://doi.org/10.1002/met.1444
[4] Cohen J, Coumou D, Hwang J, et al. S2S reboot: An argument for greater inclusion of machine learning in subseasonal to seasonal forecasts. WIREs Clim Change. 2019; 10:e00567. https://doi.org/10.1002/wcc.567
[5] Saggioro, E., & Shepherd, T. G. (2019). Quantifying the Timescale and Strength of Southern Hemisphere Intraseasonal Stratosphere-troposphere Coupling. Geophysical Research Letters, 46, 13479–13487. https://doi.org/10.1029/2019GL084763
[6] Schnaubelt, M. (2019). A comparison of machine learning model validation schemes for non-stationary time series data (FAU Discussion Papers in Economics No. 11/2019). Nürnberg. Retrieved from https://hdl.handle.net/10419/209136614.