DNA sequence and shape are predictive for meiotic crossovers throughout the plant kingdom

Demirci, Sevgin; Peters, Sander A.; Ridder, Dick de; Dijk, Aalt D.J. van


A better understanding of the genomic features that influence the location of meiotic crossovers (COs) in plant species is of both fundamental importance and practical relevance for plant breeding. Using CO positions of sufficiently high resolution from four plant species [Arabidopsis thaliana, Solanum lycopersicum (tomato), Zea mays (maize) and Oryza sativa (rice)], we trained machine-learning models to predict susceptibility to CO formation. Our results show that CO occurrence within various plant genomes can be predicted from DNA sequence and shape features. Several features related to genome content and to genomic accessibility were consistently either positively or negatively related to COs in all four species. Other features were predictive only in specific species. Gene annotation-related features were especially predictive for maize, whereas in tomato and Arabidopsis, propeller twist and helical twist (DNA shape features) and AT/TA dinucleotides were the most important. In rice, high roll (another DNA shape feature) and low CA dinucleotide frequency in particular were associated with CO occurrence. The accuracy of our models was sufficient for Arabidopsis and rice (area under the receiver operating characteristic curve, AUROC > 0.5) and high for tomato and maize (AUROC ≫ 0.5), demonstrating that DNA sequence and shape are predictive for meiotic COs throughout the plant kingdom.
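
To make two of the quantities named above concrete, the following is a minimal Python sketch of a dinucleotide-frequency feature and of a rank-based AUROC. An AUROC of 0.5 corresponds to random ranking, which is why values above 0.5 indicate predictive signal. This is not the authors' pipeline: the function names `dinucleotide_frequencies` and `auroc`, the toy sequences, and the use of a single AT+TA score in place of a trained model are all assumptions made for illustration.

```python
from itertools import product


def dinucleotide_frequencies(seq):
    """Relative frequency of each of the 16 dinucleotides in seq (hypothetical helper)."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product("ACGT", repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other ambiguity codes
            counts[pair] += 1
            total += 1
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy windows, invented for illustration: label 1 = CO region, 0 = non-CO region.
    windows = [("ATATATTAAT", 1), ("TTATAAATGC", 1), ("GCGCCGGGCC", 0), ("CACAGGCACC", 0)]
    # Assumption: use AT + TA frequency directly as a score instead of a trained model.
    scores = []
    for seq, _ in windows:
        freqs = dinucleotide_frequencies(seq)
        scores.append(freqs["AT"] + freqs["TA"])
    labels = [y for _, y in windows]
    print("AUROC on toy data:", auroc(scores, labels))
```

The pairwise AUROC is quadratic in the number of examples, which is fine for a sketch; on real data a sorted-rank implementation or an established library routine would be the more practical choice.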