Machine learning to further improve the decision which boar ejaculates to process into artificial insemination doses

Kamphuis, Claudia; Duenk, Pascal; Veerkamp, Roel Franciscus; Visser, Bram; Singh, Gurnoor; Nigsch, Annette; Mol, Rudi Maria De; Broekhuijse, Marleen Leonarda Wilhelmina Johanna


Current artificial insemination (AI) laboratory practice is to assess the semen quality of each boar ejaculate to decide which ejaculates to process into AI doses. This decision is aided by two motility parameters that are used worldwide and obtained through computer-assisted semen analysis (CASA). This decision process, however, still results in AI doses with variable and sometimes suboptimal fertility outcomes (e.g., small litter size). The hypothesis was that the decision on which ejaculates to process into AI doses can be improved by adding more data from CASA systems and data from other sources, in combination with a data-driven model. The available data consisted of ejaculates that had passed the initial decision and thus were processed into AI doses and used to inseminate sows. Data were divided into a training set (6793 records) and a validation set (1191 records) for model development, and an independent test set (1434 records) for performance assessment. Gradient Boosting Machine (GBM) models were developed to predict four fertility phenotypes of interest (gestation length, total number born, number born alive, and number of stillborn piglets). Each fertility phenotype was considered both as a numeric and as a binary outcome parameter, for a total of eight fertility phenotypes. Data used to further improve the decision process originated from four sources: 1) CASA information, 2) boar ejaculate information, 3) breeding value estimations, and 4) weather information. These data were used to create seven prediction sets, where each new set added parameters to those included in the previous set. On the test data, the GBM models predicted the fertility phenotypes with low correlations (for numeric phenotypes) and low area under the curve values (for binary phenotypes). Hence, the results demonstrated that a combination of more data and GBM did not enable further improvement of the AI dose quality checks, resulting in the rejection of our hypothesis.
However, our study revealed parameters affecting boar ejaculate fertility that are not used in today's decision process. These parameters (listed in the top 10 in at least four GBM models) included one parameter associated with boar ejaculate information, two with breeding value estimations, five with CASA information, and one with weather information. These parameters should therefore be investigated further for their potential value in assessing the quality of boar ejaculates during routine daily processing of AI doses.
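
As an illustration of the evaluation design described above, the following minimal Python sketch shows how nested prediction sets could be assembled and how the two performance measures (correlation for numeric phenotypes, area under the curve for binary phenotypes) can be computed. It is a sketch under stated assumptions, not the authors' implementation: the function names and the parameter names in the usage lines are hypothetical, the correlation is assumed here to be Pearson's, and the GBM fitting step itself is omitted.

```python
from itertools import accumulate
from math import sqrt


def cumulative_sets(groups):
    """Build nested prediction sets: set k contains every parameter
    from groups 1..k, so each set extends the previous one."""
    return list(accumulate(groups, lambda acc, g: acc + g))


def pearson(observed, predicted):
    """Pearson correlation between observed and predicted numeric values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    vo = sum((o - mo) ** 2 for o in observed)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(vo * vp)


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    positive record receives a higher score than a negative record
    (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical parameter groups, one per data source (names are assumptions).
groups = [["casa_motility"], ["ejaculate_volume"],
          ["breeding_value"], ["temperature"]]
prediction_sets = cumulative_sets(groups)
```

In such a setup, a model would be fitted on the training records for each prediction set and each phenotype, and `pearson` or `auc` would then be applied to its predictions on the independent test records.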