Abstract

Deep learning (DL)–based general circulation models (GCMs) are emerging as fast simulators, yet their ability to replicate extreme events outside their training range remains unknown. Here, we evaluate two such models, the hybrid Neural General Circulation Model (NGCM) and the purely data‐driven Deep Learning Earth System Model (DLESyM), against a conventional high‐resolution land–atmosphere model (HiRAM) in simulating land heatwaves and coldwaves. All models are forced with observed sea surface temperatures and sea ice over 1900–2020, and the evaluation focuses on the out‐of‐sample period (1900–1960). Both DL models generalize successfully to unseen climate conditions, broadly reproducing the frequency and spatial patterns of heatwave and coldwave events during 1900–1960 with skill comparable to HiRAM. An exception is portions of North Asia and North America, where all models perform poorly during 1940–1960. Because of excessive temperature autocorrelation, DLESyM tends to overestimate heatwave and coldwave frequencies, whereas the physics–DL hybrid NGCM exhibits persistence more similar to that of HiRAM.
