Thesis subject

MSc thesis topic: Garbage in, garbage out? Understanding the impact of training datasets on remote sensing based monitoring

Remote sensing based land monitoring frequently uses training and calibration datasets to predict different aspects of the earth's cover, e.g., forests, land use, land cover, and soil types.
Such predictions are essential for policymaking and policy implementation, including the implementation of the Sustainable Development Goals.

Many scientific publications recommend a close look at the quality of the training datasets used for remote sensing based monitoring. Despite these recommendations, the impact of training datasets on the resulting predictions remains understudied.

This issue is particularly important because maps can now be made with relative ease thanks to advances in large-area processing, such as cloud computing with Google Earth Engine. For example, the world's first global land cover map at 10 m resolution was produced using Google Earth Engine (Gong et al. 2019), and many more are upcoming, including the WorldCover map by ESA. Although progress has been made in improving remote sensing based predictions, some products disagree on a large scale.

Even when the same remote sensing imagery is used and the same variables (e.g., land cover) are mapped, the predictions can be very different. Such differences can be associated with the quality of the training datasets, as the predictions are simply the best guess of the variable of interest given the input data. How much of the difference can be attributed to training data? How do the landscape and data availability influence the differences in remote sensing predictions?
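As an illustration of how disagreement between two maps of the same variable could be quantified, the sketch below computes the overall per-pixel agreement and Cohen's kappa for two label maps. It is a minimal example, not a prescribed method for this thesis: the function name `map_agreement`, the class names, and the tiny flattened maps are hypothetical and chosen only for illustration.

```python
from collections import Counter


def map_agreement(map_a, map_b):
    """Return (overall agreement, Cohen's kappa) for two equally sized label lists."""
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and have the same number of pixels")
    n = len(map_a)
    # Observed agreement: share of pixels given the same class by both maps.
    observed = sum(a == b for a, b in zip(map_a, map_b)) / n
    # Chance agreement, derived from the class proportions of each map.
    freq_a, freq_b = Counter(map_a), Counter(map_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both maps contain one and the same class everywhere.
        return observed, 1.0
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical flattened maps (one class label per pixel), e.g. two land cover
# predictions made from the same imagery with different training datasets.
map_from_training_set_1 = ["forest", "forest", "crop", "crop", "water", "crop"]
map_from_training_set_2 = ["forest", "crop", "crop", "crop", "water", "forest"]

agreement, kappa = map_agreement(map_from_training_set_1, map_from_training_set_2)
print(f"overall agreement: {agreement:.3f}, kappa: {kappa:.3f}")
```

Kappa is used here only as one familiar agreement measure; other measures could be substituted without changing the structure of the comparison.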

This research aims to address these questions. The research can be set up for some regions in Africa or Europe depending on data availability and interests.


This thesis is linked to the ESA funded project, in which WU is a partner:

Objectives


  • Generating remote sensing based predictions of land use/land cover using different sets of training data
  • Assessing sources of uncertainty in the predictions related to training data, data availability, and landscape
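
The kind of experiment these objectives describe can be sketched on synthetic data: train the same classifier on two versions of a training set and compare the resulting predictions. The example below is only a toy under stated assumptions. The two-band "pixels", the class names and means, the 40% mislabelling rate, and the nearest-centroid classifier are all hypothetical stand-ins; an actual study would use real imagery, real reference data, and a classifier of the student's choice.

```python
import random


def fit_centroids(samples):
    """Mean feature vector per class from (features, label) training samples."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2 for f, c in zip(features, centroids[label])),
    )


def simulate_pixels(rng, n_per_class):
    """Synthetic two-band pixels for two made-up classes (assumed means and spread)."""
    class_means = {"forest": (0.2, 0.6), "cropland": (0.5, 0.4)}
    pixels = []
    for label, (m1, m2) in class_means.items():
        for _ in range(n_per_class):
            pixels.append(((rng.gauss(m1, 0.08), rng.gauss(m2, 0.08)), label))
    return pixels


def mislabel(rng, samples, true_label, wrong_label, fraction):
    """Relabel a random fraction of one class to mimic systematic annotation error."""
    out = []
    for features, label in samples:
        if label == true_label and rng.random() < fraction:
            label = wrong_label
        out.append((features, label))
    return out


def accuracy(predictions, samples):
    """Share of predictions that match the reference labels."""
    return sum(p == label for p, (_, label) in zip(predictions, samples)) / len(samples)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
train_clean = simulate_pixels(rng, 200)
train_noisy = mislabel(rng, train_clean, "forest", "cropland", 0.4)
test_pixels = simulate_pixels(rng, 500)

centroids_clean = fit_centroids(train_clean)
centroids_noisy = fit_centroids(train_noisy)
pred_clean = [predict(centroids_clean, f) for f, _ in test_pixels]
pred_noisy = [predict(centroids_noisy, f) for f, _ in test_pixels]

accuracy_clean = accuracy(pred_clean, test_pixels)
accuracy_noisy = accuracy(pred_noisy, test_pixels)
# Share of test pixels that receive the same class from both models.
agreement = sum(a == b for a, b in zip(pred_clean, pred_noisy)) / len(test_pixels)
print(accuracy_clean, accuracy_noisy, agreement)
```

Because the imagery (here, the simulated test pixels) and the classifier are held fixed, any difference between the two sets of predictions in this toy setup comes from the training data alone, which is the kind of attribution the research questions ask about.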


Literature

  • Gong, P., et al. (2019). Stable classification with limited sample: transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Science Bulletin, 64, 370-373
  • Pflugmacher, D., et al. (2011). Comparison and assessment of coarse resolution land cover maps for Northern Eurasia. Remote Sensing of Environment, 115, 3539-3553
  • Zhang, X., et al. (2019). Fine Land-Cover Mapping in China Using Landsat Datacube and an Operational SPECLib-Based Approach. Remote Sensing, 11, 1056
  • Millard, K., & Richardson, M. (2015). On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping. Remote Sensing, 7, 8489-8515
  • Corcoran, J.M., Knight, J.F., & Gallant, A.L. (2013). Influence of Multi-Source and Multi-Temporal Remotely Sensed and Ancillary Data on the Accuracy of Random Forest Classification of Wetlands in Northern Minnesota. Remote Sensing, 5, 3212-3238

Requirements


  • Advanced remote sensing
  • Geoscripting

Theme(s): Integrated Land Monitoring