Breeding Data: Statistical Advances in Modern Plant Breeding

Plant breeding generates large volumes of data, such as phenotype, sequence and pedigree data. Statistics offers ways to exploit these data, both to accelerate the breeding process and to understand the underlying biological mechanisms.

Many scientific questions lie at the intersection of statistics, genetics and agriculture. How to model genetic versus non-genetic variation in large field trials? How to detect QTL for complex traits? How to use genome-wide molecular markers to predict the phenotype of unphenotyped individuals? How to model complex inheritance patterns in polyploids? How to integrate plant physiology and big phenotype data with genetics?

This one-day symposium is an opportunity to discuss these questions with input from leading researchers.

Organised by Mathematical and Statistical Methods - Biometris

Date: Tue 16 October 2018, 09:00 to 17:30

Venue: Orion, building number 103
Room: C1005
Price: Free

Speakers | Topics

Hans-Peter Piepho | Experimental design

Experimental design and heritability for field trials in plant breeding

In this talk I will first review experimental designs as currently used for field trials in plant breeding. Subsequently, I will consider how data from individual trials using these designs are analysed and how data are integrated across environments. Particular attention will be paid to a comparison of the two commonly used options, i.e., single-stage analysis and stage-wise analysis. The talk will end with a review of methods to estimate heritability for multi-environment trials and a discussion of how these methods do, or do not, reflect the experimental design. The main objective of the talk is to convince the audience that (i) good experimental design comes before analysis, and no statistical analysis, no matter how sophisticated, can make up for poor design; (ii) stage-wise analysis, if done properly, i.e., if it properly takes the experimental design into account, is perfectly fine; and (iii) standard equations for heritability are almost always inappropriate for multi-environment trials occurring in breeding practice, but there are simple alternatives that work and have a good theoretical foundation.


Hans-Peter Piepho was appointed Professor of Biostatistics at the University of Hohenheim, Stuttgart, Germany in 2001. He has been working as an applied statistician in agricultural research for almost 20 years. His main interests are related to statistical procedures needed in plant genetics, plant breeding and cultivar testing. Recent interests include marker-assisted breeding (genomic prediction), spatial methods for field trials, experimental design for various applications including RNAseq and series of experiments, and network meta-analysis.


Damesa, T., Worku, M., Möhring, J., Piepho, H.P. (2017): One step at a time: Stage-wise analysis of a series of experiments. Agronomy Journal 109, 845-857.

Piepho, H.P., Michel, V., Williams, E.R. (2016): Nonresolvable row-column designs with an even distribution of treatment replications. Journal of Agricultural, Biological and Environmental Statistics 21, 227-242.

Piepho, H.P., Möhring, J. (2007): Computing heritability and selection response from unbalanced plant breeding trials. Genetics 177, 1881-1888.

Williams, E.R., Piepho, H.P., Whitaker, D. (2011): Augmented p-rep designs. Biometrical Journal 53, 19-27.

Andres Legarra | Insight from Animal Breeding

The Four Horsemen Of Genomicalypse: Fuzzy notions in genomic selection

Here I present a non-Final Judgment on the theory supporting Genomic Selection. More than 2000 publications concern genomic selection, yet in my view many questions remain unsolved. In particular, in spite of great efforts, current theory is unable to explain and predict accuracy from a priori parameters such as pedigrees of to-be-genotyped individuals, heritability, and LD statistics. The role of close vs. distant relationships is particularly poorly understood.

In order to understand how genomic selection works, a few concepts need to be clarified. Although these concepts are present in most researchers’ heads, the lack of a clear formalization creates barriers to sharing, understanding and development. This is my brief, personal review of four such topics.


Andres Legarra is Research Director at the INRA centre of Toulouse, France, in the Animal Genetics department. He works on genetic improvement of livestock, with a strong emphasis on genetic evaluation using phenotypes, pedigree and markers. He also enjoys making incursions into pure quantitative genetics.

Marco Grzegorczyk | Graphical models / Systems Biology

Learning the circadian clock network in A. thaliana from gene expression data

The global challenge of guaranteeing food security for an expanding human population has led to an increased interest in understanding the molecular processes underlying biomass production in plants. A potential long-term application is to improve the yield of crops. The process of photosynthesis allows plants to utilize sunlight to produce essential carbohydrates during the day. However, the Earth's rotation predictably removes sunlight, and hence the opportunity for photosynthesis, for a significant part of each day. Plants therefore need to orchestrate the accumulation, utilization and storage of photosynthetic products in the form of starch over the daily cycle to avoid periods of starvation, and thus optimize growth rates. To adapt better to the 24 h period of the solar day, plants have evolved biological clocks – an endogenous circadian timing system that controls daily rhythms in transcriptional regulation and its control of metabolism. Hence, a challenge for the plant systems biology community is the further elucidation of the detailed structure of the circadian clock gene regulatory network.

In my presentation, I will describe how dynamic Bayesian network (DBN) models can help to predict regulatory networks from gene expression data. In the last decade, DBN models have become a very popular class of models for network learning in systems biology research. The main advantage of DBNs is that they are flexible models, which can be easily modified and adapted to take special features of the underlying regulatory system into account. I will give an introduction to DBN models and an overview of different advanced DBN models. After discussing their relative merits and shortcomings, I will apply DBN models to gene expression data from the model plant Arabidopsis thaliana to predict the structure of the underlying circadian clock network.


Since 2013, Dr. Marco Andreas Grzegorczyk has been an assistant professor in the research unit ‘Statistics and Stochastic Studies’ at the Bernoulli Institute of Groningen University (Netherlands). His main research interest is in developing advanced novel Bayesian network models for learning gene-regulatory networks and protein pathways from postgenomic data. Inferring the unknown topologies of cellular networks is one of the ultimate objectives in the topical field of systems biology. In the last decade he has developed various tailor-made network reconstruction methods for different applications, including the circadian clock network in the plant Arabidopsis thaliana.

Ian Mackay | Multiparental populations

MAGIC populations: beyond trait mapping

Multi-parent advanced generation intercross (MAGIC) populations have been developed in many crops and are widely used in trait mapping. Their method of construction reduces population structure and variation in kinship among the recombinant inbred lines derived from the population, while still capturing a substantial proportion of the genetic variation in the population from which the founders were selected. These properties enable MAGIC populations to be used in applications other than trait mapping. We discuss the use of MAGIC populations directly in breeding as source material for genomic selection schemes; in organic and participatory breeding programmes; and in the study of heterosis and transgressive segregation.


Ian’s principal research interest is in quantitative genetics applied to plant breeding. He has published on experimental design, selection methods, improved approaches to trait mapping, and genomic selection. In December 2017, Ian established IMplant Consultancy Ltd., consulting in quantitative genetics and breeding. Before this, he worked at NIAB (Cambridge, UK) for 12 years, ran the Statistical Genetics department of drug discovery company Oxagen Ltd. for six years, and worked as a commercial plant breeder for 19 years, including nine as cofounder and research director of the company Lion Seeds Ltd. He is professor of plant breeding at SRUC, Edinburgh.

Patricio Munoz | Polyploids

Effect of Allele Dosage in Autotetraploid Genomics. Blueberry as a model

Recognized as a process of plant evolution, polyploidization impacts the diversification of natural populations and the speed at which plant breeding can concentrate favorable alleles and so make progress. Polyploidization occurs in many important crops (e.g. alfalfa, blueberry, potato); however, its complexity has prevented the use of proper models for genomic analysis. In this talk I will illustrate, using a population of blueberry as a model, the effect that dosage can have on the capacity to discover QTLs in a GWAS, as well as its effect on the prediction ability of GS models. Currently available software to call dosage and the most important challenges of applying genomics in polyploids will be discussed.


After graduating as a forest engineer (2004), Patricio worked for Forestal Mininco S.A. (Chile) as a breeding assistant for a couple of years. He subsequently obtained an MS in quantitative genetics (2009) and a PhD in molecular breeding (2012) at the University of Florida (USA), both focused on breeding challenges. He then led the Forage Breeding and Genomics Lab from June 2013 until January 2017, after which he started leading the Blueberry Breeding and Genomics Lab program.

Charlie Messina | Crop growth models

Crop Science: Scientific foundation for prediction in breeding and agronomy

Crop science underpins significant technological developments that are instrumental for the improvement of the life of the world’s poor and society in general. Crop growth models (CGM) are the quantitative synthesis of scientific understanding of crop growth and development. As such, they enable prediction to prevent and address problems in breeding and agronomy. While CGMs vary considerably in structural complexity, simple mechanistic models (SMM) have proved useful for applications in breeding. The framework of an SMM is applied to the study of the long-term consequences of selection in maize on adaptive traits, and the physiological underpinning of improved drought tolerance. An example is provided for the use of SMMs as algorithms to leverage trait understanding in genomic prediction in maize.


Dr. Charlie Messina is a senior research scientist at Corteva Agriscience. His major research interests include the fusion of biological models and statistical learning methodologies to increase the limits of predictability and response to environmental change in agricultural systems.

Call for abstracts:

You are invited to submit an abstract (200 words) of your research that falls within one of the following topics:

  • Experimental design
  • QTL detection
  • Genetic diversity
  • Genomic prediction
  • Imputation
  • Multiparental populations
  • Polyploids
  • Crop growth models
  • Phenotyping platforms
  • Biological networks

The selected abstracts will be presented as posters and flash presentations.

Submit your abstract via email to

Free registration

Registration includes lunch.

To help us better organise the event, we kindly ask you to register only if you are planning to attend.

