Thesis subject

MSc thesis topic: Garbage in, garbage out? Understanding the impact of training datasets on remote sensing-based monitoring

Remote sensing-based land monitoring frequently uses training and calibration datasets to predict different aspects of the Earth's surface, e.g., forests, land use, land cover, and soil types.
Such predictions are essential for policymaking and policy implementation, including the implementation of the Sustainable Development Goals.

Many scientific publications recommend a close look at the quality of the training datasets used for remote sensing-based monitoring. Despite these recommendations, the impact of training datasets on remote sensing-based monitoring remains understudied.

This issue is particularly important because maps can now be produced with relative ease thanks to advances in large-area processing, such as cloud computing with Google Earth Engine. For example, the world's first global land cover map at 10 m resolution was produced using Google Earth Engine (Gong et al. 2019); another example is the WorldCover map by ESA (https://esa-worldcover.org). Although progress has been made in improving remote sensing-based predictions, some products still disagree with one another on a large scale.

Even when the same remote sensing imagery is used and the same variable (e.g., land cover) is mapped, the predictions can be very different. Such differences can be associated with the quality of the training datasets, as the predictions are simply the best guess of the variable of interest given the input data. How much of these differences can be attributed to training data? How do the landscape and data availability influence the differences in remote sensing predictions?

This research aims to address these questions. The research can be set up for some regions in Europe depending on data availability and interests.

Objectives

Simulate land use/land cover mapping to study the effect of training data error, sampling design, and sample size.
Generate remote sensing-based predictions of land use/land cover using different sets of training data.
Assess the sources of uncertainty in the predictions related to training data, data availability, and landscape.
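
As a rough illustration of the first objective, the following Python sketch shows how training set size and label error could be varied in a controlled simulation. Everything in it is an assumption made for illustration: the two synthetic classes, the single spectral index with class means 0.3 and 0.7, the nearest-centroid classifier, and all function names are hypothetical and not part of the thesis design. A real study would use actual imagery, reference data, and a more capable classifier.

```python
import random
import statistics


def make_samples(n, rng):
    """Draw n synthetic 'pixels' as (feature, label) pairs for two classes."""
    samples = []
    for _ in range(n):
        label = rng.randrange(2)            # 0 and 1 stand for two hypothetical land cover classes
        mean = 0.3 if label == 0 else 0.7   # assumed class means of one spectral index
        samples.append((rng.gauss(mean, 0.1), label))
    return samples


def flip_labels(samples, error_rate, rng):
    """Simulate training data error by flipping each label with probability error_rate."""
    return [(x, 1 - y) if rng.random() < error_rate else (x, y) for x, y in samples]


def train_centroids(samples):
    """Fit a nearest-centroid classifier: one mean feature value per class."""
    centroids = {}
    for label in (0, 1):
        values = [x for x, y in samples if y == label]
        # Fall back to the midpoint if a class is missing from a small sample.
        centroids[label] = statistics.fmean(values) if values else 0.5
    return centroids


def predict(centroids, x):
    """Assign the class whose centroid is closest to the feature value."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, samples):
    """Share of samples whose predicted class matches the true label."""
    hits = sum(predict(centroids, x) == y for x, y in samples)
    return hits / len(samples)


def run_experiment(train_size, error_rate, repeats=30, seed=42):
    """Mean accuracy on an error-free test set over repeated training draws."""
    rng = random.Random(seed)
    test = make_samples(2000, rng)          # test labels are kept error-free
    scores = []
    for _ in range(repeats):
        train = flip_labels(make_samples(train_size, rng), error_rate, rng)
        scores.append(accuracy(train_centroids(train), test))
    return statistics.fmean(scores)


if __name__ == "__main__":
    for size in (10, 50, 200):
        for rate in (0.0, 0.2, 0.5):
            print(size, rate, round(run_experiment(size, rate), 3))
```

Calling run_experiment over a grid of train_size and error_rate values gives a simple sensitivity table. The label errors simulated here are random and symmetric; systematic confusion between specific classes is a separate case and may behave differently.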

Literature

Gong, P., et al. (2019). Stable classification with limited sample: transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Science Bulletin, 64, 370-373.
Pflugmacher, D., et al. (2011). Comparison and assessment of coarse resolution land cover maps for Northern Eurasia. Remote Sensing of Environment, 115, 3539-3553.
Zhang, X., et al. (2019). Fine Land-Cover Mapping in China Using Landsat Datacube and an Operational SPECLib-Based Approach. Remote Sensing, 11, 1056.
Millard, K., & Richardson, M. (2015). On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping. Remote Sensing, 7, 8489-8515.
Corcoran, J.M., Knight, J.F., & Gallant, A.L. (2013). Influence of Multi-Source and Multi-Temporal Remotely Sensed and Ancillary Data on the Accuracy of Random Forest Classification of Wetlands in Northern Minnesota. Remote Sensing, 5, 3212-3238.
https://www.mdpi.com/2072-4292/12/6/1034/htm

Expected reading list before starting the thesis research

Same as the literature listed above.

Requirements

Advanced remote sensing
Geoscripting

Theme(s): Integrated Land Monitoring