Abstract

Machine learning (ML) applications in hydrological forecasting are increasingly prevalent and show great potential. However, many previous studies have evaluated performance only through reanalysis or retrospective simulations against simplified baselines. This study provides the first assessment of ML performance against an actual operational forecasting system, operated by the California Nevada River Forecast Center (CNRFC), which combines the Community Hydrologic Prediction System (CHPS) with forecasters‐in‐the‐loop. Results demonstrate that the forecaster‐in‐the‐loop system consistently outperforms ML models in both general forecasting and flood alerting at lead times up to 96 hr, even when the ML models use observed forcings while the CNRFC operational process relies on biased weather forecasts. Our analysis reveals that forecaster expertise maintains forecast reliability despite inaccurate precipitation inputs, with the human‐guided system degrading more gracefully at extended lead times. These findings highlight the irreplaceable value of human expertise in operational forecasting and caution against overstating current ML capabilities in real‐world applications.