Publications

Chemometric pre-processing can negatively affect the performance of near-infrared spectroscopy models for fruit quality prediction

Mishra, Puneet; Rutledge, Douglas N.; Roger, Jean Michel; Wali, Khan; Khan, Haris Ahmad

Summary

Chemometric pre-processing of spectral data is widely used to enhance the predictive performance of near-infrared (NIR) models of fresh fruit quality. In NIR data analysis, pre-processing is typically used to remove scattering effects and thereby enhance the absorption components related to chemical properties. However, in fresh fruit, both the scattering and absorption properties are of key interest, as they jointly explain the physicochemical state of the fruit. Pre-processing that reduces the scattering information in the spectra may therefore lead to poorly performing models. This study tests two hypotheses about the effect of pre-processing on NIR spectra of fresh fruit. The first is that pre-processing NIR spectra with scatter correction techniques can reduce the predictive performance of models, because scatter correction can remove useful scattering information correlated with the property of interest. The second is that deep learning (DL) can model raw absorbance data (a mix of scattering and absorption) much more efficiently than partial least squares (PLS) regression. To test the hypotheses, a real NIR data set for dry matter (DM) prediction in mango fruit was used. The data set comprised 11,420 NIR spectra with reference DM measurements, used for model training and independent testing. The chemometric pre-processing methods explored were standard normal variate (SNV), variable sorting for normalization (VSN), the Savitzky-Golay based 2nd derivative, and their combinations. Two modelling approaches, PLS regression and DL, were used to evaluate the effect of pre-processing. The best root mean squared error of prediction (RMSEP) for both the PLS and DL models was obtained with the raw absorbance data; spectral pre-processing in general decreased the performance of both. Further, the DL model attained the lowest RMSEP of 0.76%, which was 13% lower than that of PLS regression on the raw absorbance data. Pre-processing approaches should therefore be used with care when analysing NIR data of fresh fruit.
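
To make the first hypothesis concrete, the following is a minimal sketch, not the authors' code, of how SNV treats a spectrum and how RMSEP is computed. The function names `snv` and `rmsep` and the five-point toy spectrum are illustrative assumptions; the scaled-and-offset copy is only a crude stand-in for a scatter effect, not real mango data. SNV centres and scales each spectrum by its own mean and standard deviation, so any additive offset or multiplicative scaling, which may itself carry information about the fruit, is removed.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum
    by its own mean and sample standard deviation."""
    mean = statistics.mean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(value - mean) / std for value in spectrum]


def rmsep(reference, predicted):
    """Root mean squared error of prediction on a test set."""
    squared_errors = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy absorbance spectrum (illustrative values only).
base = [0.20, 0.35, 0.50, 0.42, 0.30]

# Assumption: mimic a scatter-like effect as a gain plus a baseline offset.
scattered = [1.5 * a + 0.1 for a in base]

snv_base = snv(base)
snv_scattered = snv(scattered)

# After SNV the two spectra coincide: the gain and offset are gone.
max_diff = max(abs(a - b) for a, b in zip(snv_base, snv_scattered))
print(f"largest difference after SNV: {max_diff:.2e}")
```

In this toy case the two spectra differ before SNV but become numerically identical after it, which is the mechanism by which scatter correction could discard information a model might otherwise use. Whether that loss matters for a given data set is an empirical question, assessed in the study through RMSEP on an independent test set.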