Abstract

This study systematically evaluates historical simulations of monthly surface air temperature and precipitation from CMIP5 and CMIP6 models. Using an error decomposition framework that separates the mean squared error (MSE) into mean bias, variance, and correlation components, we quantify the sources of error in each variable. The results show that CMIP6 models exhibit an apparent improvement in temperature simulations relative to CMIP5, driven primarily by a reduction in mean bias, whereas precipitation shows only marginal improvement. A long-term error analysis reveals that although absolute errors have decreased in recent decades, normalized errors have increased because variability in the observations has decreased. This suggests that models may struggle to capture the declining magnitude of natural variability. Overall, although the representation of the mean state of surface temperature has improved, structural discrepancies in precipitation patterns remain a critical challenge for model development.
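The abstract does not state the exact form of the decomposition, so the following is only a minimal sketch of one commonly used identity, MSE = (mean difference)^2 + (difference of standard deviations)^2 + 2 * sd_model * sd_obs * (1 - r), which may differ from the formulation used in the study. The function name `mse_decomposition`, the normalization by observed variance, and the four-point series are all illustrative assumptions, not the study's method or data.

```python
import math


def mse_decomposition(model, obs):
    """Split MSE into mean-bias, variance, and correlation terms.

    Hypothetical helper: uses population (1/n) statistics and one common
    form of the decomposition, not necessarily the one used in the study.
    """
    n = len(obs)
    if n == 0 or len(model) != n:
        raise ValueError("model and obs must be non-empty and of equal length")

    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    var_m = sum((m - mean_m) ** 2 for m in model) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    sd_m, sd_o = math.sqrt(var_m), math.sqrt(var_o)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs)) / n
    # Correlation is undefined for a constant series; its term is zero anyway.
    r = cov / (sd_m * sd_o) if sd_m > 0 and sd_o > 0 else 0.0

    mse = sum((m - o) ** 2 for m, o in zip(model, obs)) / n
    return {
        "mse": mse,
        "bias": (mean_m - mean_o) ** 2,                # mean-state error
        "variance": (sd_m - sd_o) ** 2,                # amplitude error
        "correlation": 2.0 * sd_m * sd_o * (1.0 - r),  # phase/pattern error
        # Assumed normalization: MSE divided by the observed variance.
        "nmse": mse / var_o if var_o > 0 else float("nan"),
    }


# Synthetic series: the later period has a smaller absolute error but also
# much less observed variability than the earlier period.
early_obs = [1.0, 2.0, 3.0, 4.0]
early = mse_decomposition([o + 0.5 for o in early_obs], early_obs)
late_obs = [1.5, 2.0, 2.5, 3.0]
late = mse_decomposition([o + 0.4 for o in late_obs], late_obs)

print("early:", early["mse"], early["nmse"])
print("late: ", late["mse"], late["nmse"])
```

In this synthetic case the MSE is smaller in the later period while the normalized error is larger, because the denominator (observed variance) shrinks faster than the numerator. This is the mechanism the abstract describes for recent decades.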