Simulation-assisted machine learning for operational digital twins

Pylianidis, Christos; Snow, Val; Overweg, Hiske; Osinga, Sjoukje; Kean, John; Athanasiadis, Ioannis N.


In the environmental sciences, there are ongoing efforts to combine multiple models to assist the analysis of complex systems. Combining process-based models, which have encoded domain knowledge, with machine learning models, which can flexibly adapt to input data, can improve modeling capabilities. However, both types of models have input data limitations. We propose a methodology to overcome these issues by using a process-based model to generate data, aggregating them to a lower resolution to mimic real situations, and developing machine learning models using a fraction of the process-based model inputs. We showcase this method with a case study of pasture nitrogen response rate prediction. We train models of different scales and test them in sampled and unsampled location experiments to assess their practicality in terms of accuracy and generalization. The resulting models provide accurate predictions and generalize well, showing the usefulness of the proposed method for tactical decision support.
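
The three steps of the proposed method (generate data with a process-based model, aggregate it to a lower resolution, and train a machine learning model on a fraction of the inputs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: `simulate_response` is a hypothetical toy stand-in for a process-based model, the weather distributions and the `soil_factor` input are invented, weekly averaging stands in for the aggregation step, and a k-nearest-neighbour regressor stands in for whichever machine learning models the study actually used.

```python
import math
import random
from statistics import fmean

DAYS, WINDOW = 28, 7  # assumed: 28 daily steps aggregated into 4 weekly means


def simulate_response(temps, rains, soil_factor):
    """Hypothetical toy stand-in for a process-based model.

    Maps daily weather and a soil parameter to a single response value.
    """
    growth = sum(max(0.0, t - 5.0) * min(1.0, r / 5.0) for t, r in zip(temps, rains))
    return soil_factor * growth / DAYS


def aggregate(series, window=WINDOW):
    """Average a daily series into coarser, non-overlapping windows."""
    return [fmean(series[i:i + window]) for i in range(0, len(series), window)]


def make_samples(rng, locations, runs_per_location):
    """Run the simulator and keep only aggregated weather as ML features."""
    samples = []
    for base_temp, soil_factor in locations:
        for _ in range(runs_per_location):
            temps = [rng.gauss(base_temp, 3.0) for _ in range(DAYS)]
            rains = [max(0.0, rng.gauss(4.0, 3.0)) for _ in range(DAYS)]
            target = simulate_response(temps, rains, soil_factor)
            # soil_factor is deliberately left out: the ML model sees only
            # a fraction of the simulator inputs, at weekly resolution.
            samples.append((aggregate(temps) + aggregate(rains), target))
    return samples


def knn_predict(train, features, k=5):
    """Predict the mean target of the k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], features))[:k]
    return fmean(t for _, t in nearest)


def mean_abs_error(train, test, k=5):
    """Mean absolute error of the k-NN predictions on a test set."""
    return fmean(abs(knn_predict(train, f, k) - t) for f, t in test)


rng = random.Random(0)
# Each location is (mean temperature, soil factor); values are invented.
sampled = [(rng.uniform(8, 18), rng.uniform(0.8, 1.2)) for _ in range(10)]
unsampled = [(rng.uniform(8, 18), rng.uniform(0.8, 1.2)) for _ in range(5)]

train = make_samples(rng, sampled, 30)
mae_sampled = mean_abs_error(train, make_samples(rng, sampled, 5))
mae_unsampled = mean_abs_error(train, make_samples(rng, unsampled, 5))
print(f"MAE sampled: {mae_sampled:.3f}, unsampled: {mae_unsampled:.3f}")
```

The two error values mirror the sampled and unsampled location experiments: the first tests new simulated runs at locations seen during training, the second tests locations the model has never seen. The numbers this toy setup prints say nothing about the accuracy reported in the study.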