Abstract
Machine learning (ML) weather models such as GraphCast and NeuralGCM show promise for forecasting but face fundamental limitations for integration into data assimilation (DA). This study reveals critical problems in their error covariance representation and adjoint sensitivity patterns that challenge their operational viability. We evaluate the tangent linear and adjoint models of GraphCast and NeuralGCM by comparing their perturbation responses with those of MPAS-A, a well-established numerical weather prediction model. The ML models exhibit unphysical adjoint sensitivities, including persistent localized responses and excessive noise at various atmospheric levels, contrasting sharply with the physically consistent MPAS-A patterns and indicating fundamental issues in error covariance representation. The implications extend across DA methodologies: unrealistic sensitivity patterns would generate distorted error covariances in ensemble systems and unphysical analysis increments in variational approaches such as 4DVar. Assimilating even a single observation could create spurious corrections far from the observation location, degrading forecast skill. Despite their forecasting capabilities, current ML weather models require significant improvements in linearization properties before they can be reliably integrated into operational DA.
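As an illustrative sketch only (not the authors' evaluation code), tangent-linear and adjoint responses of a differentiable forecast model can be probed with JAX automatic differentiation, as both GraphCast and NeuralGCM are JAX-based. The `forecast` function below is a hypothetical toy stand-in for an ML forecast operator; the single-component cotangent mimics the single-observation sensitivity test described above.

```python
import jax
import jax.numpy as jnp

def forecast(x0):
    # Toy stand-in for a differentiable forecast model: a smooth
    # nonlinear map on the state vector (NOT GraphCast/NeuralGCM).
    return jnp.tanh(x0) + 0.1 * jnp.roll(x0, 1)

x0 = jnp.linspace(-1.0, 1.0, 8)          # background (initial) state
dx = jnp.zeros_like(x0).at[3].set(1.0)   # unit initial perturbation

# Tangent linear model: forward sensitivity of the forecast to dx.
xf, tlm_response = jax.jvp(forecast, (x0,), (dx,))

# Adjoint model: sensitivity of a scalar forecast aspect (one grid
# point, mimicking a single observation) with respect to the
# initial state, via a vector-Jacobian product.
_, vjp_fun = jax.vjp(forecast, x0)
dy = jnp.zeros_like(xf).at[3].set(1.0)   # cotangent: single "observation"
(adjoint_sensitivity,) = vjp_fun(dy)

print("TLM response:", tlm_response)
print("Adjoint sensitivity:", adjoint_sensitivity)
```

In a physically consistent model, `adjoint_sensitivity` should be localized and dynamically plausible around the observed point; the unphysical patterns reported here correspond to this quantity being noisy or spuriously nonlocal.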