
Student information
MSc thesis topic: Random Forest spatial interpolation
Machine learning techniques are increasingly used for spatial interpolation, because the number of covariates that help explain the spatial variation in target variables has grown dramatically. However, unlike geostatistics, machine learning techniques do not explicitly take spatial autocorrelation into account. Recently, we developed a method called Random Forest Spatial Interpolation (RFSI), which extends conventional random forest by adding the values of the nearest observations, and the distances to those observations, as extra covariates. The aim of this MSc research is to test this approach and fine-tune its hyperparameters for a digital soil mapping case study.
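To make the idea of the extra covariates concrete, the sketch below builds them for a single location. It is an illustration only, written in Python rather than in the R scripts used in the thesis; the function name `rfsi_covariates`, its arguments and the toy data are hypothetical and are not taken from the RFSI code.

```python
import math


def rfsi_covariates(train_xy, train_z, query_xy, n_nearest, exclude_index=None):
    """Return [z_1, d_1, ..., z_n, d_n] for the n nearest training observations.

    z_i is the observed value at the i-th nearest observation and d_i its
    Euclidean distance to the query location. exclude_index (hypothetical
    argument) skips one training point, so that a training location does not
    use its own observation as a covariate.
    """
    neighbours = []
    for i, ((x, y), z) in enumerate(zip(train_xy, train_z)):
        if i == exclude_index:
            continue
        d = math.hypot(x - query_xy[0], y - query_xy[1])
        neighbours.append((d, z))
    # Sort by distance only, nearest first
    neighbours.sort(key=lambda pair: pair[0])
    features = []
    for d, z in neighbours[:n_nearest]:
        features.extend([z, d])
    return features


# Toy data (made up): four observation locations and their values
train_xy = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 0.0)]
train_z = [10.0, 20.0, 30.0, 40.0]

# Covariates for the first training point, excluding its own observation
row = rfsi_covariates(train_xy, train_z, train_xy[0], n_nearest=2, exclude_index=0)
print(row)
```

In a full workflow, these values would be appended to the environmental covariates of each location before fitting the random forest model; `n_nearest` is the hyperparameter that controls how many observations are included.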
The thesis research will begin with a literature review of machine learning for soil mapping. Next, you will study the RFSI algorithm and apply it to a test dataset. This is all done in R, using existing scripts that need only slight modification. Once this is completed, you will test the method on a real-world digital soil mapping case study. This involves selecting a study area and preparing covariates and soil point observations, which are provided by ISRIC - World Soil Information. The case study must also include cross-validation of the results, a comparison with conventional random forest, and optimisation of hyperparameters, such as the number of nearest observations to include as covariates. The results of this MSc research will likely be very useful to the ISRIC SoilGrids project (www.soilgrids.org).
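The cross-validation and hyperparameter optimisation step can be sketched as follows. This is a minimal illustration under strong assumptions: it is written in Python rather than R, and a simple mean of the nearest observed values stands in for the random forest model so that the example stays self-contained. All function names and the toy transect data are hypothetical.

```python
import math


def k_fold_indices(n_points, k):
    """Split indices 0..n_points-1 into k interleaved, deterministic folds."""
    return [list(range(fold, n_points, k)) for fold in range(k)]


def neighbour_mean(train, query_xy, n_nearest):
    """Stand-in predictor (NOT a random forest): mean of the n nearest values."""
    ranked = sorted(
        train,
        key=lambda p: math.hypot(p[0] - query_xy[0], p[1] - query_xy[1]),
    )
    nearest = ranked[:n_nearest]
    return sum(p[2] for p in nearest) / len(nearest)


def cv_rmse(points, n_nearest, k=5):
    """Cross-validated RMSE for one candidate number of nearest observations."""
    squared_errors = []
    for fold in k_fold_indices(len(points), k):
        held_out = set(fold)
        train = [p for i, p in enumerate(points) if i not in held_out]
        for i in fold:
            x, y, z = points[i]
            prediction = neighbour_mean(train, (x, y), n_nearest)
            squared_errors.append((prediction - z) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def tune_n_nearest(points, candidates, k=5):
    """Return the candidate with the lowest cross-validated RMSE, plus all scores."""
    scores = {n: cv_rmse(points, n, k) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data (made up): 20 points (x, y, z) on a transect with a linear trend z = x
points = [(float(i), 0.0, float(i)) for i in range(20)]
best, scores = tune_n_nearest(points, candidates=[1, 2, 10], k=5)
print(best, scores)
```

The same loop structure would apply when the stand-in predictor is replaced by RFSI or conventional random forest: each candidate setting is scored on held-out points only, and the two methods are compared on identical folds.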
Objectives
- Learn about and master machine-learning techniques, in particular random forests
- Learn about and use R software to apply RFSI to a test case
- Refine the RFSI methodology and apply it to a real-world digital soil mapping case study, in which RFSI is compared with conventional random forest and cross-validation is used to evaluate whether prediction accuracy improves
Literature
- Machine learning
- Random Forest Spatial Interpolation manuscript (currently under review)
- Digital soil mapping, e.g. SoilGrids publications
Requirements
- Solid background in statistical modelling, such as obtained through the Spatial Modelling and Statistics course
- Experience with programming in R
Theme(s): Modelling & visualisation