Assessment of workflow feature selection on forest LAI prediction with Sentinel-2A MSI, Landsat 7 ETM+ and Landsat 8 OLI

Brede, Benjamin; Verrelst, Jochem; Gastellu-Etchegorry, Jean Philippe; Clevers, Jan G.P.W.; Goudzwaard, Leo; Ouden, Jan den; Verbesselt, Jan; Herold, Martin


The European Space Agency (ESA)'s Sentinel-2A (S2A) mission is providing time series that allow the characterisation of dynamic vegetation, especially when combined with the National Aeronautics and Space Administration (NASA)/United States Geological Survey (USGS) Landsat 7 (L7) and Landsat 8 (L8) missions. Hybrid retrieval workflows combining non-parametric Machine Learning Regression Algorithms (MLRAs) and vegetation Radiative Transfer Models (RTMs) have been proposed as fast and accurate methods to infer biophysical parameters such as Leaf Area Index (LAI) from these data streams. However, the exact design of optimal retrieval workflows is rarely discussed. In this study, the impact of five retrieval workflow features on the LAI prediction performance of MultiSpectral Instrument (MSI), Enhanced Thematic Mapper Plus (ETM+) and Operational Land Imager (OLI) observations was analysed over a Dutch beech forest site for a one-year period. The retrieval workflow features were (1) the addition of prior knowledge of leaf chemistry (two alternatives), (2) the choice of RTM (two alternatives), (3) the addition of Gaussian noise to RTM-produced training data (four and five alternatives), (4) the possibility of using Sun Zenith Angle (SZA) as an additional MLRA training feature (two alternatives), and (5) the choice of MLRA (six alternatives). The features were varied in a full grid, resulting in 960 inversion models, in order to find the overall impact on performance as well as possible interactions among the features. A combination of a Terrestrial Laser Scanning (TLS) time series with litter-trap derived LAI served as independent validation. The addition of absolute noise had the most significant impact on prediction performance. It improved the median prediction Root Mean Square Error (RMSE) by 1.08 m² m⁻² when 5% noise was added compared to inversions with 0% absolute noise.
The choice of the MLRA was second most important in terms of median prediction performance, which differed by 0.52 m² m⁻² between the best and worst model. The best inversion model achieved an RMSE of 0.91 m² m⁻² and explained 84.9% of the variance of the reference time series. The results underline the need to explicitly describe the noise model used in future studies. Similar studies should be conducted in other study areas, covering both forest and crop systems, in order to test the noise model as an integral part of hybrid retrieval workflows.
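
The full-grid design described in the abstract can be illustrated with a short sketch that enumerates every combination of workflow features and checks that the count matches the 960 inversion models reported. Only the number of alternatives per feature is taken from the abstract. All labels below (`rtm_a`, `mlra_0`, `level_0`, and so on) are hypothetical placeholders. Treating "four and five alternatives" as two separate noise settings with four and five levels is an assumption, chosen because 2 × 2 × 4 × 5 × 2 × 6 = 960.

```python
from itertools import product

# Hypothetical labels: only the NUMBER of alternatives per feature
# comes from the abstract, not the names or values themselves.
GRID = {
    "leaf_chemistry_prior": ["without", "with"],             # 2 alternatives
    "rtm": ["rtm_a", "rtm_b"],                               # 2 alternatives
    # Assumption: two noise settings with 4 and 5 levels respectively.
    "noise_setting_1": [f"level_{i}" for i in range(4)],     # 4 alternatives
    "noise_setting_2": [f"level_{i}" for i in range(5)],     # 5 alternatives
    "sza_as_feature": [False, True],                         # 2 alternatives
    "mlra": [f"mlra_{i}" for i in range(6)],                 # 6 alternatives
}


def enumerate_workflows(grid):
    """Return one dict per combination of feature alternatives (full grid)."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*grid.values())]


workflows = enumerate_workflows(GRID)
print(len(workflows))  # 960
```

Each dictionary in `workflows` would then configure one inversion model. Because every combination is present, the effect of a single feature and its interactions with the others can be compared across the whole grid.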