Student information

MSc thesis subject: Modelling uncertainty in Machine Learning predictions

Machine learning techniques are increasingly used for spatial interpolation, mainly because the number of covariates that help explain the spatial variation in target variables has grown dramatically. Geostatistical methods such as kriging provide a prediction error variance alongside each prediction. With machine learning algorithms, quantifying the interpolation error is more difficult. The aim of this MSc research is to investigate this problem and to apply the solutions found to a digital soil mapping case study.

The thesis research will begin with a literature review of machine learning and uncertainty quantification. Quantile random forests appear to be a very suitable approach, but the literature search will also cover other methods. Next, the student will study the various techniques in depth and learn how they are implemented in software tools (R packages). After trial runs with example datasets, the most promising technique will be selected and applied to a digital soil mapping case study.

The case study involves selecting a study area and preparing covariates and soil point observations, all of which are available from the WorldGrids and WoSIS databases managed by ISRIC - World Soil Information in Wageningen. It must also include a cross-validation of the results. In this cross-validation, the interpolation uncertainty assessment is validated with exceedance probability plots and related measures. The results of this MSc research are likely to be very useful to the ISRIC SoilGrids project (www.soilgrids.org).
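The core idea of quantile regression forests (Meinshausen, 2006) can be sketched in a few lines. An ordinary random forest averages the training values in the leaves that a location falls into. A quantile regression forest instead keeps those values and weights them, which gives a full conditional distribution and therefore prediction intervals.

The sketch below is a minimal Python illustration, not the implementation to be used in the project. It makes the following assumptions:

- The leaf membership of every training point and of the prediction location in each tree has already been obtained from a fitted random forest.
- All function names are hypothetical.
- `accuracy_curve` shows one common coverage-based way of checking uncertainty at held-out cross-validation points. Its relation to the exceedance probability plots mentioned above is an assumption. For each probability p, it reports the fraction of observations inside the central p-interval. This fraction should be close to p if the uncertainty is well calibrated.

```python
def qrf_weights(train_leaves, new_leaves):
    """Quantile regression forest weights (Meinshausen, 2006).

    Assumption: train_leaves[t][i] is the leaf id of training point i in tree t,
    and new_leaves[t] is the leaf id of the prediction location in tree t,
    both taken from an already fitted random forest.
    """
    n_points = len(train_leaves[0])
    weights = [0.0] * n_points
    for leaves, leaf_new in zip(train_leaves, new_leaves):
        # Each tree spreads a weight of 1 evenly over the training points
        # that share the prediction location's leaf.
        members = [i for i, leaf in enumerate(leaves) if leaf == leaf_new]
        for i in members:
            weights[i] += 1.0 / len(members)
    # Average over trees, so the weights sum to 1.
    return [w / len(new_leaves) for w in weights]


def weighted_quantile(y, weights, alpha):
    """Smallest y at which the weighted empirical CDF reaches alpha (0 <= alpha <= 1)."""
    cumulative = 0.0
    for value, weight in sorted(zip(y, weights)):
        cumulative += weight
        if cumulative >= alpha - 1e-12:  # small tolerance for rounding
            return value
    return max(y)


def accuracy_curve(predictive, observations, probs):
    """Fraction of held-out observations inside the central p-interval
    [q_(1-p)/2, q_(1+p)/2] of their predictive distribution, for each p.

    Hypothetical input format: predictive[k] = (y_train, weights) for location k.
    """
    curve = []
    for p in probs:
        inside = 0
        for (y, w), obs in zip(predictive, observations):
            lower = weighted_quantile(y, w, (1 - p) / 2)
            upper = weighted_quantile(y, w, (1 + p) / 2)
            inside += lower <= obs <= upper
        curve.append(inside / len(observations))
    return curve


# Toy example: two trees, four training points.
w = qrf_weights([[0, 0, 1, 1], [0, 1, 1, 1]], [1, 1])
y = [1.0, 2.0, 3.0, 4.0]
print(weighted_quantile(y, w, 0.5))                        # conditional median
print(accuracy_curve([(y, w), (y, w)], [3.5, 2.5], [0.5, 0.9]))
```

In R, implementations of quantile regression forests are available in the quantregForest package and in ranger (grown with `quantreg = TRUE`). The sketch is only meant to show what such packages compute.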

Objectives

  • Learn about and master machine-learning techniques such as random forests, gradient boosting and artificial neural networks
  • Learn about and master methods to quantify prediction uncertainty using machine learning techniques
  • Learn to use software implementations of machine learning techniques and of methods that quantify their prediction uncertainty
  • Test the validity of a selected machine learning uncertainty quantification technique by application to a real-world digital soil mapping case study

Literature

  • Machine learning
  • Machine learning uncertainty quantification, e.g. quantile random forests
  • Digital soil mapping, e.g. SoilGrids publications

Requirements

  • Solid background in statistical modelling, such as that obtained through the Spatial Modelling and Statistics course
  • Experience with programming in R

Theme(s): Modelling & visualisation