
Colloquium
Leveraging data science and machine learning to analyse and predict main crop rotation patterns
By Wessel Eshuis
Abstract
Crop rotations play a fundamental role in effective agricultural management. This thesis examines seven-year crop rotation patterns across Dutch agricultural fields, using data from the national field parcel dataset. The thesis has two objectives. The first is to recognize, characterize and spatially map crop rotation patterns at both national and regional scales. The second uses this knowledge as a foundation for evaluating the performance of a newly developed transformer-encoder model, which has been trained on the national field parcel dataset to predict future crops. The model is intended to form the basis of a future AI-driven decision support system that can simulate the impact of policy changes on agricultural practices, such as the influence of nitrogen regulations on the use of cover crops in farming systems.
Hierarchical clustering revealed 18 major distinct crop rotation patterns. The clustering took a Hamming distance matrix as input, which quantifies how much each pair of crop sequences differs and so allows distinct patterns in crop rotation practices to emerge. Through spatial analysis, the identified major crop rotation clusters were examined in relation to key variables, including soil texture, crop diversity and the use of cover crops. This provides insight into the underlying factors that influence crop rotation patterns, as well as the effect of these patterns on the agricultural landscape.
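To make the clustering step concrete, the following is a minimal sketch in Python, not the implementation used in the thesis. It assumes a normalized Hamming distance (the fraction of years in which two sequences differ) and average linkage; the abstract does not state which linkage criterion was used. The crop sequences and the function names `hamming`, `distance_matrix` and `agglomerative` are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of positions (years) at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming(seqs[i], seqs[j])
    return d


def agglomerative(d, n_clusters):
    """Naive average-linkage clustering (an assumed linkage choice).

    Starts with one cluster per sequence and repeatedly merges the two
    clusters with the smallest mean pairwise distance until n_clusters remain.
    Returns a list of lists of sequence indices.
    """
    clusters = [[i] for i in range(len(d))]

    def avg(c1, c2):
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical seven-year crop sequences, one per field parcel.
sequences = [
    ["potato", "wheat", "sugar beet", "wheat", "potato", "wheat", "sugar beet"],
    ["potato", "wheat", "sugar beet", "wheat", "potato", "wheat", "onion"],
    ["grass"] * 7,
    ["grass", "grass", "maize", "grass", "grass", "grass", "grass"],
    ["maize"] * 7,
]

d = distance_matrix(sequences)
clusters = agglomerative(d, n_clusters=3)
print(clusters)
```

In this toy example the two arable rotations and the two grass-dominated rotations each end up in the same cluster. The naive merge loop is only suitable for small inputs; a national-scale dataset would need an optimized library implementation.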
After the crop rotation patterns had been validated, a transformer-encoder model was developed. This deep learning model has been trained to learn and recognize crop rotation patterns, enabling it to predict likely crop choices for the following season based solely on previously cultivated crops. The model achieves a top-3 prediction accuracy exceeding 80%. Its performance was evaluated against three non-machine-learning prediction approaches, using the Kullback-Leibler divergence to measure how far each predicted distribution diverges from the reference distribution. The model outperformed two of these approaches and demonstrated the potential to surpass the third. The results show that the model can predict across varying sequence lengths and for rare crop sequences, highlighting its robustness compared to the alternative predictive methods.
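The two evaluation measures mentioned above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the thesis's evaluation code: the divergence is taken as KL(reference || predicted) in nats, zero predicted probabilities are floored at a small `eps`, and all distributions, crop names and function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for distributions given as {crop: probability} dicts.

    p is the reference distribution, q the predicted one. Probabilities
    missing from q are floored at eps to avoid division by zero (an
    assumed smoothing choice).
    """
    total = 0.0
    for crop, p_i in p.items():
        if p_i > 0:
            total += p_i * math.log(p_i / max(q.get(crop, 0.0), eps))
    return total


def top_k_accuracy(predictions, actual_crops, k=3):
    """Share of cases where the actual crop is among the k most probable predictions."""
    hits = 0
    for predicted, actual in zip(predictions, actual_crops):
        ranked = sorted(predicted, key=predicted.get, reverse=True)[:k]
        hits += actual in ranked
    return hits / len(actual_crops)


# Hypothetical next-crop distributions for one preceding crop sequence.
reference = {"wheat": 0.5, "potato": 0.3, "sugar beet": 0.2}
predicted_close = {"wheat": 0.45, "potato": 0.35, "sugar beet": 0.20}
predicted_far = {"wheat": 0.2, "potato": 0.2, "sugar beet": 0.6}

kl_close = kl_divergence(reference, predicted_close)
kl_far = kl_divergence(reference, predicted_far)
print(f"KL close: {kl_close:.4f}, KL far: {kl_far:.4f}")

# Top-1 accuracy over two hypothetical parcels.
acc = top_k_accuracy([predicted_close, predicted_far], ["wheat", "potato"], k=1)
print(f"top-1 accuracy: {acc}")
```

A lower divergence means the predicted distribution lies closer to the reference, so in this toy case `predicted_close` would rank above `predicted_far`.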
Keywords: Crop rotation; Hamming distance; Hierarchical clustering; KL-divergence; Transformer-encoder; AI-driven decision support system