Forecasting chronic mastitis using automatic milking system sensor data and gradient-boosting classifiers

Bonestroo, John; Voort, Mariska van der; Hogeveen, Henk; Emanuelson, Ulf; Klaas, Ilka Christine; Fall, Nils


Although most mastitis losses per case in dairy production are estimated to be caused by clinical cases, subclinical cases, especially chronic ones, can also be problematic because of milk production losses and the risk of pathogen transmission. Knowing at an early stage which subclinical mastitis cases will become chronic would help in intervening in these cases. Automatic milking systems (AMS) can collect data on mastitis indicators such as conductivity, somatic cell count (SCC), and blood in the milk at each milking. The aim of this study was to develop a sensor-based prediction model that uses SCC, conductivity, blood in the milk, parity, milk diversion, time interval between milkings, milk yield, and days in milk (DIM) to forecast chronicity in subclinical mastitis cases after an initial increase in SCC. We used sensor data from 14 European and North American dairy farms (with herd sizes ranging from 55 to 638 lactating cows and herd mean parities between 2.00 and 3.19) equipped with an AMS and an online cell counter measuring SCC. Typically, a threshold of 200,000 SCC/ml has been used to distinguish cows with subclinical mastitis from healthy cows. We used gradient-boosting trees and sensor data to forecast whether the SCC would decrease structurally below 200,000 SCC/ml in the 50 days after the day on which the prediction was made. Data from the 30 or 15 days prior to the day on which the forecast was made were used as input. The model was trained on data from seven randomly selected dairy farms in the dataset, and the data from the remaining seven farms were used to estimate predictive performance. These results were compared with two approaches that simulate how farmers would diagnose chronic mastitis using a simple prediction rule based either on close-to-daily SCC (frequent sampling approach) or on less frequent, monthly SCC (monthly sampling approach).
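The farm-level split described above keeps every record from a given farm on one side of the train/test boundary, so performance is estimated on herds the model has never seen. The following is a minimal sketch of that idea only, not the authors' code: the farm labels, the record layout, the function name `split_farms`, and the seed are all hypothetical.

```python
import random


def split_farms(farm_ids, n_train, seed=0):
    """Randomly assign whole farms to a training set or a test set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(farm_ids)
    rng.shuffle(shuffled)
    train = set(shuffled[:n_train])
    test = set(shuffled[n_train:])
    return train, test


# 14 hypothetical farm labels (the real farm identifiers are not given).
farm_ids = [f"farm_{i:02d}" for i in range(1, 15)]
train_farms, test_farms = split_farms(farm_ids, n_train=7, seed=42)

# Hypothetical records: (farm_id, features, label). All values are made up.
records = [(farm, {"scc": 250_000, "parity": 2}, 1) for farm in farm_ids]

# Rows follow their farm, so no farm contributes to both sets.
train_rows = [r for r in records if r[0] in train_farms]
test_rows = [r for r in records if r[0] in test_farms]
```

Splitting by farm rather than by row avoids leaking herd-specific patterns from the training data into the evaluation data.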
We used accuracy, the Matthews correlation coefficient (MCC), and the area under the curve (AUC) as metrics to assess the forecasting performance of the chronic mastitis prediction model. On average, the forecast model using 30 days of sensor data prior to the day of prediction outperformed these approaches in accuracy (chronic mastitis prediction model: 0.888, frequent sampling approach: 0.848, and monthly sampling approach: 0.865), MCC (chronic mastitis prediction model: 0.712, frequent sampling approach: 0.630, and monthly sampling approach: 0.552), and AUC (chronic mastitis prediction model: 0.964 and frequent sampling approach: 0.941). The results also indicate that shortening the input requirement from 30 days of prior sensor data to 15 days has a limited effect on the performance of the model. Overall, this study shows that future chronic mastitis status can be predicted with high accuracy using past sensor data and machine learning models.
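For readers less familiar with the three metrics, the sketch below computes accuracy, MCC, and AUC for a binary classifier using their standard definitions. It is an illustration only: the labels and scores are invented toy values, not study data, and the 0.5 decision threshold and all function names are assumptions.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count one half). Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = chronic, 0 = not chronic (invented values).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.3, 0.35, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
acc_value = accuracy(*counts)
mcc_value = mcc(*counts)
auc_value = auc(y_true, scores)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it is less flattering when one class dominates. AUC is computed from the continuous scores and therefore does not depend on the chosen threshold.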