Abstract

Equation discovery methods, such as symbolic regression, show great promise for generating parameterizations of biogeochemical processes in an objective, data‐driven manner, yet they remain untested in ocean biogeochemistry. Here, we apply symbolic regression to a state‐of‐the‐art ocean biogeochemical model, using it as a surrogate data set to rediscover an empirical equation that the model uses to calculate colloidal iron. We introduce a robustness metric for discovered equations that combines R² (reproduction of global patterns) and EMD‐SHAP (similarity of functional behaviors). Symbolic regression did not rediscover the original equation, owing to that equation's empirical complexity. However, it generated simpler equations with similar performance and functional behaviors, indicating its potential as an emulator that bridges models. Subsampling experiments show that robust equations require full‐depth and multi‐basin sampling, underscoring sampling priorities for colloidal iron. This framework could be broadly applied to other poorly constrained biogeochemical processes.
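The abstract names the two ingredients of the robustness metric but not how they are computed or combined, so the following is only a minimal sketch under stated assumptions. It assumes that SHAP values have already been computed (for example with the `shap` package) for both the reference model and a discovered equation over the same inputs. It also assumes that EMD‐SHAP denotes a per‐feature Earth mover's (Wasserstein‐1) distance between those SHAP value distributions, averaged over features. The function names and the acceptance rule `is_robust`, with its hypothetical cutoffs `r2_min` and `emd_max`, are illustrative and are not the authors' definition.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def emd_1d(u, v):
    """1-D Earth mover's (Wasserstein-1) distance between two empirical samples,
    computed as the integral of |CDF_u - CDF_v| over the merged support."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    support = np.sort(np.concatenate([u, v]))
    widths = np.diff(support)
    # Empirical CDFs evaluated at the left edge of each interval of the support.
    cdf_u = np.searchsorted(u, support[:-1], side="right") / u.size
    cdf_v = np.searchsorted(v, support[:-1], side="right") / v.size
    return float(np.sum(np.abs(cdf_u - cdf_v) * widths))


def mean_shap_emd(shap_ref, shap_eq):
    """Average per-feature EMD between two SHAP matrices of shape
    (n_samples, n_features). Assumption: features are compared one by one."""
    shap_ref = np.asarray(shap_ref, dtype=float)
    shap_eq = np.asarray(shap_eq, dtype=float)
    dists = [emd_1d(shap_ref[:, j], shap_eq[:, j]) for j in range(shap_ref.shape[1])]
    return float(np.mean(dists))


def is_robust(r2, emd, r2_min=0.9, emd_max=0.1):
    """Hypothetical combination rule: accept an equation only if it reproduces
    the global pattern (high R^2) AND behaves like the reference (low EMD)."""
    return r2 >= r2_min and emd <= emd_max


if __name__ == "__main__":
    # Toy values only; not taken from the study.
    y_ref = np.array([1.0, 2.0, 3.0, 4.0])  # reference model output
    y_eq = np.array([1.0, 2.0, 3.0, 5.0])   # discovered-equation output
    shap_ref = np.array([[0.0, 0.1], [1.0, 0.2]])
    shap_eq = np.array([[0.0, 0.1], [1.0, 0.2]])
    r2 = r_squared(y_ref, y_eq)               # approximately 0.8
    emd = mean_shap_emd(shap_ref, shap_eq)    # 0.0 for identical SHAP matrices
    print(r2, emd, is_robust(r2, emd))        # fails the hypothetical r2_min
```

Requiring both conditions reflects the abstract's point that a good global fit alone is not enough: a candidate equation should also respond to its inputs the way the reference model does. For the 1‑D distance, `scipy.stats.wasserstein_distance` computes the same quantity and could replace `emd_1d`.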
