Measurement error-filtered machine learning in digital soil mapping

Westhuizen, Stephan van der; Heuvelink, Gerard B.M.; Hofmeyr, David P.; Poggio, Laura


This paper presents a two-stage maximum likelihood framework for handling measurement errors in digital soil mapping (DSM) when a machine learning (ML) model is used. The framework is implemented with random forest and projection pursuit regression to illustrate two different areas of machine learning, namely ensemble learning with trees and feature learning. In the proposed framework, the measurement error variance (MEV) is incorporated as a weight in the log-likelihood function, so that measurements with a larger MEV receive less weight when an ML model is calibrated. We compare the performance of the error-filtered ML models with that of an error-filtered regression kriging model, both in a comprehensive simulation study and in a real-world case study using data from Namibia. The results show that the proposed framework can increase prediction accuracy, especially when the MEVs are large and heterogeneous.
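
The weighting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian likelihood in which each observation has total variance sigma2 + MEV_i, treats the residual variance sigma2 as already known (in a two-stage scheme it would presumably be estimated first), and substitutes a weighted linear fit for the ML model. All function names are hypothetical.

```python
import numpy as np


def error_filtered_weights(mev, sigma2):
    """Weights proportional to 1 / (sigma2 + MEV_i), rescaled to mean 1.

    Assumption: total variance of observation i is sigma2 + MEV_i.
    """
    mev = np.asarray(mev, dtype=float)
    w = 1.0 / (sigma2 + mev)
    return w / w.mean()


def gaussian_nll(y, pred, mev, sigma2):
    """Negative Gaussian log-likelihood with per-observation variance."""
    total_var = sigma2 + np.asarray(mev, dtype=float)
    resid = np.asarray(y, dtype=float) - np.asarray(pred, dtype=float)
    return 0.5 * np.sum(np.log(2.0 * np.pi * total_var) + resid**2 / total_var)


def weighted_linear_fit(x, y, w):
    """Weighted least squares with an intercept (stand-in for an ML model)."""
    design = np.column_stack([np.ones(len(y)), x])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted LS into ordinary LS.
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta


# Synthetic illustration: every second measurement is much noisier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
mev = np.where(np.arange(50) % 2 == 0, 0.01, 1.0)
sigma2 = 0.05  # assumed known here
y_obs = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(sigma2 + mev))

w = error_filtered_weights(mev, sigma2)
beta = weighted_linear_fit(x, y_obs, w)  # [intercept, slope]
```

Measurements with a large MEV contribute little to the fit, which is the behaviour the abstract describes. Many ML libraries accept per-sample weights at fit time (for example the `sample_weight` argument of scikit-learn's `RandomForestRegressor.fit`), so the same weights could in principle be passed to such a model; whether this matches the paper's procedure is an assumption.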