EUCARPIA Biometrics

We were glad to welcome you to the XVIth Meeting of the EUCARPIA Section Biometrics in Plant Breeding. This meeting hosted scientific presentations on the development and application of quantitative methods and strategies in plant breeding. A major topic was the use of genome-wide marker data to predict phenotypic performance, coined “genomic prediction”, in plants and crops with invited speakers from human and animal genetics. Furthermore, the meeting provided great opportunities to talk to leading experts and to meet researchers and breeders of other companies and organizations.

9-11 September, Wageningen, The Netherlands




    Abstracts of invited speakers

    Peter Visscher (Centre for Neurogenetics & Statistical Genomics, University of Queensland, Australia) - Genomics and big data in human populations: combining genetics and epigenetics to predict phenotypes


    Driven by advances in genome technologies, the last 8 years have witnessed a revolution in our understanding of complex trait variation in human populations. Results from genome-wide association studies and whole-genome exome studies have shown that the mutational target in the genome for most traits appears to be very large, such that many genes are involved in explaining genetic variation. Genetic architecture, the joint distribution of the effect size and frequency of variants that segregate in the population, is becoming clearer and differs between traits. We will show new results from disparate complex traits including height, schizophrenia and gene methylation, to illustrate polygenicity and the power of experimental sample size. In addition, we will show emerging results that epigenetic information can be used to make predictions of complex traits and that gene methylation can be a predictor of past environmental exposures.

    Neil Hausmann (DuPont Pioneer, United States) - Future Breeding Systems: view from DuPont Pioneer

    Future breeding systems in DuPont Pioneer will be built upon development and deployment of predictive analytics capabilities explicitly integrating genomic, environmental, crop management and trait phenotypic knowledge. Improved understanding of the genetic architecture of traits, leveraging available sources of genetic diversity, will enhance the breeder’s ability to utilise a range of model-based prediction methodologies to support the continually increasing scale of breeding programs. The accuracy of these predictions, especially when faced with complex traits displaying high levels of genotype by environment variability, will depend upon successful integration of quantitative genetics methodology, pedigree understanding, computer simulation and functional gene-to-phenotype models. The future breeding system requires an adaptive field evaluation infrastructure that meets the needs of wide-area testing while providing the detailed phenotyping required for model parametrization. Examples will be drawn primarily from our experience gained through commercial maize breeding.

    John Hickey (University of Edinburgh, United Kingdom) - Sequence to Phenotype: Allocation of Resources


    Background.  Genomic selection is increasingly valued within the plant breeding community. To implement genomic selection, large investments are needed in genomic data (markers and/or sequence) and in phenotypic data on which to train prediction equations. Choices about distributing these resources affect the return on investment.

    Results.  A simulation was conducted to evaluate the long-term benefit of three alternative breeding program designs: (i) a classical plant breeding program design; (ii) a minor modification to the classical design in which genomic prediction was used to increase the accuracy of preliminary yield trials; and (iii) a complete reorganization of the breeding program into a population improvement component driven by genomic selection and a product development component that was similar to (i).

    Conclusions.  The different breeding program designs gave different returns on investment. Complete reorganization of plant breeding programs into population improvement components driven by genomic selection and product development components was promising but its benefit was affected by costs.

    Emma Huang (CSIRO Computational Informatics and the Food Futures National Research Flagship, Australia) - Meta-alleles in multiparental populations


    Multiparental populations have become increasingly popular in plant breeding due to their high genotypic and phenotypic diversity. In particular, MAGIC populations, which mix the genomes of multiple founders through several generations of recombination, offer relatively high resolution and power for investigation of many traits simultaneously.

    Typically, models for QTL mapping in such populations follow two approaches, testing association either with the observed marker genotypes or the unobserved founder genotypes. If there is a single causal variant and it is genotyped, or is in strong linkage disequilibrium with a genotyped marker, then the first, simpler model is the most powerful possible test. The second, full model, allows each founder of the population to have a different effect, thereby allowing for multiple causal variants. However, it may be over-specified since it is unlikely that all founders have different effects.

    Models intermediate in complexity that elucidate the number of distinct functional alleles should better represent the true genetic architecture of the trait, particularly in testing for interactions, where the number of effects in the full model can quickly outnumber the size of the population.

    We consider here three approaches to collapsing founder alleles into ‘meta-alleles’. The first, which clusters haplotypes in sliding windows according to genomic similarity, was proposed by Leroux et al. (2014). This data-driven approach was shown to have highest impact in a scenario with a huge number of medium/small-size families. We propose two alternative approaches with biological interpretations of the meta-alleles, which may be more appropriate for MAGIC populations. One determines the set of distinct isoforms of each protein encoded by the founders of the population. These “protein alleles” are used to cluster the founders. The other clusters founders based on time to the most recent common ancestor.
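
    Whatever rule is used to cluster the founders, the collapsing step itself can be illustrated with a minimal sketch. It assumes that founder probabilities at a locus are already available for each line and that a founder-to-cluster assignment has been chosen; the function name, the probabilities and the clustering below are hypothetical and are not taken from the talk.

```python
from typing import List, Sequence


def collapse_founders(
    founder_probs: Sequence[Sequence[float]],
    founder_to_meta: Sequence[int],
) -> List[List[float]]:
    """Sum founder probabilities within each meta-allele cluster.

    founder_probs[i][f] is the (assumed) probability that line i carries
    the allele of founder f at the locus under study.
    founder_to_meta[f] is the index of the meta-allele that founder f
    has been assigned to by some clustering rule.
    """
    n_meta = max(founder_to_meta) + 1
    collapsed = []
    for row in founder_probs:
        if len(row) != len(founder_to_meta):
            raise ValueError("each row needs one probability per founder")
        meta_row = [0.0] * n_meta
        for prob, meta in zip(row, founder_to_meta):
            meta_row[meta] += prob
        collapsed.append(meta_row)
    return collapsed


# Hypothetical four-founder example (founders A, B, C, D), two lines, one locus.
founder_probs = [
    [0.75, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
# Assumed clustering: A and B share one meta-allele, C and D share another.
founder_to_meta = [0, 0, 1, 1]

meta_probs = collapse_founders(founder_probs, founder_to_meta)
print(meta_probs)
```

    The collapsed matrix has one column per meta-allele instead of one per founder, so a model fitted on it sits between the simple marker model and the full founder model in the number of effects it estimates.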

    We compare all three approaches to the simple and full models through application to a four-parent wheat MAGIC population and a 19-parent Arabidopsis MAGIC population. Further, we perform simulations based on the Arabidopsis population to quantify the gains achievable through use of these methods.

    Jens Riis-Jacobsen (CIMMYT) - Accelerate genetic gain by taking advantage of additional data sources and integrated data analysis – case studies from maize and wheat breeding at CIMMYT


    Background: Exploitation of plant genetic resources depends on germplasm-related data being transformed into useful information that supports decision making and enhances genetic gain. Traditional data sources in breeding work are genealogy and phenotypic data, but in recent years additional sources such as genotypic data, environmental data, and different sensor data have become available at a low cost. Nevertheless, so far only the largest breeding companies have managed to take advantage of the new sources of data in integrated informatics and analytics platforms, and the majority of organizations involved in plant breeding are struggling with how to harness this potential. Taking the maize and wheat breeding programs at CIMMYT as the point of departure, this paper analyses how a small to medium sized breeding institution can take advantage of new data sources, what benefits it may obtain, and what some of the challenges involved are.


    ·         Genealogy and phenotypic data remain the foundation data for crop genetic improvement, and with available tools it is possible to set up the core elements of a future integrated breeding information system

    ·         Genotyping, climate, and remote sensing data can make valuable contributions in a plant breeding program as has been demonstrated with ad hoc studies, but tools that facilitate mainstreaming of this in plant breeding programs are not generally available

    ·         While the informatics and biometric challenges in an integrated breeding platform are being addressed, plant breeding institutions will still be faced with challenges related to establishing a multidisciplinary team as well as change management capabilities that can implement the solutions

    Conclusions: New data sources and new analytical capabilities like high throughput phenotyping can accelerate genetic gain, and plant breeding programs that incorporate these may benefit from added productivity. Informatics and biometric solutions are increasingly available, which will lower the barriers for using integrated approaches. However, the full potential will only be realized when breeding activities and investments are reorganized, and smaller breeding programs may also struggle to access the broader set of competencies required.

    Alison Smith (University of Wollongong, Australia) - Experimental designs for expensive multi-phase traits

    The importance of sound experimental designs for plant breeding trials cannot be overestimated. They are crucial to ensure valid inference and accurate prediction of genetic effects, whether they be effects for traditional or genomic selection or the identification of QTL. Many key traits involve multi-phase experiments, where grain samples are taken from a field experiment (Phase I) then processed further in one or more laboratory experiments (Phase II and higher). Typically the laboratory phases are costly relative to the field phase and this necessitates a limit on the total number of samples that can be tested. Historically this has been achieved by sacrificing field replication and testing a single composite sample for each variety, obtained by combining grain from all field replicates. Typically no replication or randomisation is employed in the laboratory phases. In this talk we describe the approach of Smith et al. (2014) in which replication is achieved in all phases of the experiment. In terms of field replication, some varieties are tested as composite samples and some as individual replicate samples. Replication in the laboratory is achieved by splitting a relatively small number of field samples into sub-samples for separate processing. Model-based design techniques are used to obtain efficient designs for the laboratory phases, conditional upon the field design. Unlike the historical approach, this method allows the application of an efficient statistical analysis to the resultant data so that accurate predictions of genetic effects may be obtained.

    The approach will be illustrated using an Australian wheat quality project that involved a series of field trials and subsequent measurement of a range of flour, dough and end-product traits.

    A major challenge with this project was to develop experimental designs and protocols that were not only statistically valid, but also satisfied strict budgetary constraints and were pragmatic, in the sense of complying with standard laboratory practice. We show how all of these issues were successfully addressed using the approach of Smith et al. (2014).

    Andres Gordillo (KWS) - Genomic selection strategies and validation in hybrid maize and rye

    Genomic selection (GS) schemes can be classified according to the relationship between the training population (TP) and prediction population (PredP). GS schemes implemented in AgReliant Genetics’ maize breeding program and KWS LOCHOW’s rye breeding programs exemplify contrasting types of PredPs and different relationships between TP and PredP. In the hybrid maize GS scheme, large numbers of doubled haploid lines are developed within biparental populations. A portion of the lines is used as TP to predict untested lines from the same biparental population. Multi-environment validation experiments showed that prediction ability for grain yield is somewhat higher for lines in the TP and slightly lower for untested lines compared to phenotypic selection. Index selection using phenotypic and genomic grain yield values was consistently better than phenotypic or genomic selection alone (Krchov et al., 2015). Herein, GS allows predicting lines with insufficient seed set that otherwise would be delayed in their testcross evaluation by one year, which saves time and simplifies logistics. In the hybrid rye GS scheme, the TP includes independent populations tested in previous years. Validation experiments indicated that the prediction ability for grain yield was considerably higher for phenotypic selection than for GS when predicted lines were not included in the TP and GS was equivalent to phenotypic selection when predicted lines were included in the TP. This indicates a strong influence of the genetic background on estimates of marker-allele effects. Index selection using phenotypic and genomic grain yield values leads consistently to higher prediction abilities than phenotypic or genomic selection alone. Predictions based on TPs from previous selection cycles indicate a drop in prediction ability in the years 2013 and 2014. 

    Validation experiments show that prediction abilities across selection cycles depend on the proportion of parents and uncles of predicted lines in the TP. Per-se selection and variation in agronomic traits indirectly influencing yield may affect prediction ability. Using data from the evaluation of lines in multiple years and appropriate modelling of SNP by year effects as well as including parents of the PredP in the TP may be key to increase prediction abilities of GS across selection cycles.

    Hans Peter Piepho (Biostatistics Unit, Universität Hohenheim) - The generation of efficient row-column designs for field trials

    When generating experimental designs for field trials laid out on a rectangular grid of plots, it is useful to allow for blocking in both rows and columns. A potential problem with such designs is that occasionally treatment replications may be clustered in the field layout. This talk reviews strategies to avoid such clustering. When the design is resolvable, separation can be enhanced by latinizing or t-latinizing the design. When the design is non-resolvable, latinization is not possible, and randomized classical row-column designs may occasionally involve clustered placement of several replications of a treatment. In this talk we illustrate how a spatial variance-covariance structure can be used to achieve a more even distribution of treatments across the field and how such designs compare with classical row-column designs in terms of efficiency. We consider both equally and unequally replicated designs, including partially replicated designs.
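
    To make the clustering problem concrete, the sketch below counts how often two replications of the same treatment occupy neighbouring plots in a row-column layout. It is only a simple diagnostic under assumed conventions (the function name, the adjacency rule and the two example layouts are hypothetical); it is not the spatial variance-covariance approach described in the talk.

```python
from typing import Sequence


def count_adjacent_replicates(layout: Sequence[Sequence[str]]) -> int:
    """Count pairs of row- or column-adjacent plots with the same treatment.

    layout[r][c] is the treatment label on the plot in row r, column c.
    Diagonal neighbours are ignored in this simple diagnostic.
    """
    n_rows = len(layout)
    count = 0
    for r in range(n_rows):
        n_cols = len(layout[r])
        for c in range(n_cols):
            # Neighbour to the right, in the same row.
            if c + 1 < n_cols and layout[r][c] == layout[r][c + 1]:
                count += 1
            # Neighbour below, in the same column.
            if (
                r + 1 < n_rows
                and c < len(layout[r + 1])
                and layout[r][c] == layout[r + 1][c]
            ):
                count += 1
    return count


# Hypothetical 3 x 3 layouts with treatments A, B and C, three replications each.
clustered = [
    ["A", "A", "B"],
    ["C", "B", "B"],
    ["C", "A", "C"],
]
spread = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

print(count_adjacent_replicates(clustered))
print(count_adjacent_replicates(spread))
```

    A count like this could be used to screen candidate randomizations, with lower values indicating that replications of a treatment are better separated in the field.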

    Luc Janss (Aarhus University) - Genomic analysis in tetraploid potato using genotyping-by-sequencing

    Standard genotyping systems are not suitable for tetraploids. The standard systems only distinguish one heterozygote, while tetraploids have three. We consider here the use of genotyping-by-sequencing (GBS) to genotype tetraploid potato. GBS is used in a quantitative way to estimate the allele-dose of a genotype. These allele-dose estimates have estimation error from using finite read-depth on a genotype, which equals in general (1-1/P)/D, where P is the ploidy level, and D the read-depth. Tetraploids have a lower homozygosity rate, which reduces the genetic variance by a factor 2 compared to diploids. This was indeed reflected in the GBS estimated genotypes in our potato genotypes. Genomic heritability estimates for some traits are presented using genomic relationship matrices based on GBS estimated genotypes corrected for error from low read-depth.
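
    As a small worked illustration, the expression (1-1/P)/D quoted above can be evaluated for a few read-depths. The function name and the chosen read-depths are hypothetical, and the abstract does not state the scale on which the allele-dose is expressed, so the numbers below only show how the quantity behaves as D grows.

```python
def allele_dose_error(ploidy: int, read_depth: float) -> float:
    """Evaluate (1 - 1/P) / D, the estimation error quoted in the abstract.

    ploidy is P (2 for diploids, 4 for tetraploids) and read_depth is D.
    """
    if ploidy < 1:
        raise ValueError("ploidy must be at least 1")
    if read_depth <= 0:
        raise ValueError("read depth must be positive")
    return (1.0 - 1.0 / ploidy) / read_depth


# Illustrative read-depths only; these are not values from the study.
for depth in (5, 10, 50):
    diploid = allele_dose_error(2, depth)
    tetraploid = allele_dose_error(4, depth)
    print(f"D={depth:>2}  diploid={diploid:.4f}  tetraploid={tetraploid:.4f}")
```

    For a fixed read-depth the expression is 1.5 times larger for a tetraploid than for a diploid (0.75/D against 0.5/D), and in both cases it shrinks in proportion to 1/D.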

    Dave Marshall (The James Hutton Institute) - The data challenges from the application of high throughput technologies in plant breeding and genetics

    As new high-throughput technologies for sequencing, genotyping and phenotyping begin to impact on plant genetics, genetic diversity and breeding applications, there is increasingly a need to develop and deploy the computational tools and infrastructure to deal with the resulting high volumes of data at every stage in the pipeline, from generation to short- and long-term storage, as well as to support interactions with analysis and visualization tools. Some of these challenges are simply ones of resourcing, i.e. providing sufficient computing power either locally or through cloud solutions. However, many existing software tools will not scale easily, and in many plant breeding applications there may be a need to pass from data generation to analysis and decision-making in very short time frames. At local, national and international scales there are a number of developments, such as the US-funded iPlant project, which is now expanding into Europe to support analysis, and the developing Plant Breeding API project, which is working towards the provision of a common API that can be used to integrate a variety of software tools and data sources in the domain of plant breeding and genetics. In this talk I shall discuss some of the major challenges that we face together with current or developing solutions where they exist.

    Marco van Schriek (Keygene) - Exploitation of digital phenotype markers for prediction of Brassica napus field seed yield


    Controlled phenotyping platforms or conventional pot trials are often used to compare plant performance under water-stressed and non-stressed conditions. Studies which compare stress reactions of plants under these controlled conditions with performance in the field are rare. Correlations between pot experiments and field trials are essential in order to identify and exploit morphological or physiological selection criteria for practical breeding approaches. To this end, a selection of diverse winter oilseed rape cultivars known to show variable stress responses in the field was screened. All cultivars were grown in irrigated and non-irrigated field trials at multiple locations in Germany. The same cultivars were also grown under water-stressed and non-stressed conditions in two controlled experiments. The first was a container experiment performed over a complete growing season so that seed yield could be measured. In the second experiment, the cultivars were tested by digital phenotyping using the PhenoFab system. I will present correlations between early digital phenotypes and field yield-relevant parameters observed in this study.