Dataset QTLMAS2009

The dataset consists of 2,025 individuals from two generations. All individuals have complete marker information. There are 453 SNP marker loci, randomly distributed over 5 chromosomes, each approximately 1 Morgan in length. The first 25 individuals are the parents: 20 females and 5 males. The remaining 2,000 individuals are offspring in 100 full-sib (FS) families, one for each combination of a male and a female parent. Each FS family has 20 offspring.
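As a quick sanity check on the design described above, the counts can be reproduced with a few lines of Python. This is only a sketch: the parent labels are hypothetical placeholders and are not the Individual_ID values used in the files.

```python
from itertools import product

N_DAMS, N_SIRES, FAMILY_SIZE = 20, 5, 20

# Hypothetical labels; the real Individual_ID values may differ.
dams = [f"dam{i}" for i in range(1, N_DAMS + 1)]
sires = [f"sire{j}" for j in range(1, N_SIRES + 1)]

# One full-sib family per (dam, sire) combination.
families = list(product(dams, sires))

n_parents = len(dams) + len(sires)
n_offspring = len(families) * FAMILY_SIZE
n_total = n_parents + n_offspring

print(len(families), n_offspring, n_total)  # 100 2000 2025
```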

Fifty FS families have been phenotyped; the other 50 FS families have no phenotypes. Phenotypes were recorded at multiple time points. The phenotyped FS families were chosen such that each female parent has at least 40 phenotyped offspring while each male parent has 100 phenotyped offspring.

The dataset is divided into four files.

Phenotype file

Phenotypes.csv contains 6 columns corresponding to:

  • Individual_ID, Trait_value_time0, Trait_value_time132, Trait_value_time265, Trait_value_time397, Trait_value_time530

The 5 trait values are measurements of yield at 5 different times in the production period. These yield values can be interpreted as, for example, weight during the growth of an animal or biomass during the growth of a crop. (Additional information: the asymptotic yield values of individuals range from 14 to 66.)
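The column layout above could be read into a per-individual mapping from time point to trait value as sketched below. Several things here are assumptions rather than facts about the file: that fields are comma-separated, that the file only lists phenotyped individuals with numeric values, and whether a header row is present (controlled by the hypothetical has_header argument). The sample rows are made up for illustration.

```python
import csv
import io

# Time points taken from the column names of Phenotypes.csv.
TIME_POINTS = [0, 132, 265, 397, 530]

def read_phenotypes(handle, has_header=False):
    """Return {individual_id: {time: value}} from a Phenotypes.csv-like file."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    records = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        ind_id = row[0].strip()
        values = [float(v) for v in row[1:6]]
        records[ind_id] = dict(zip(TIME_POINTS, values))
    return records

# Made-up rows for illustration only; not taken from the dataset.
sample = io.StringIO("26,0.5,5.0,20.0,30.0,32.0\n27,0.4,4.0,15.0,25.0,28.0\n")
phenotypes = read_phenotypes(sample)
print(phenotypes["26"][530])  # 32.0
```

With the real file, the same function would be called as read_phenotypes(open("Phenotypes.csv", newline="")), adjusting has_header as needed.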

Pedigree file

Pedigree.csv contains 3 columns corresponding to:

  • Individual_ID, Parent1_ID, Parent2_ID

The first column identifies the individual. The second and third columns identify the female and male parent of each individual, respectively.
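Because each FS family is defined by one female and one male parent, the pedigree can be grouped into families by the pair (Parent1_ID, Parent2_ID). The sketch below assumes comma-separated fields, no header row, and that the 25 parents have their own parents coded as "0"; that missing-parent code is an assumption (exposed as the hypothetical founder_code argument), as are the sample rows.

```python
import csv
import io
from collections import defaultdict

def full_sib_families(handle, founder_code="0"):
    """Return {(dam_id, sire_id): [offspring_ids]} from a Pedigree.csv-like file."""
    families = defaultdict(list)
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        ind_id, dam, sire = (field.strip() for field in row[:3])
        if dam == founder_code or sire == founder_code:
            continue  # parent generation: no recorded parents (assumed coding)
        families[(dam, sire)].append(ind_id)
    return dict(families)

# Made-up rows for illustration only; not taken from the dataset.
sample = io.StringIO("1,0,0\n21,0,0\n26,1,21\n27,1,21\n")
fams = full_sib_families(sample)
print(fams)  # {('1', '21'): ['26', '27']}
```

On the full pedigree, this grouping would be expected to give 100 families of 20 offspring each.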

Haplotype file

Haplotypes.csv contains 454 columns holding the phased haplotypes:

  • Individual_ID, M1, M2, … , M453 (M1 = marker 1, M2 = marker 2, etc.)
  • The file contains 2 haplotypes per individual on separate lines (line 1 = maternal; line 2 = paternal)
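Many analyses work with genotypes rather than phased haplotypes. A minimal sketch of collapsing the two lines per individual into allele counts follows. The allele coding is not stated here, so the code "1" for the counted allele is an assumption (the hypothetical counted_allele argument), as are the comma separator, the absence of a header row, and the made-up sample rows.

```python
import csv
import io

def read_genotypes(handle, counted_allele="1"):
    """Return {individual_id: [copies of counted_allele per marker]}."""
    rows = [r for r in csv.reader(handle) if r]
    genotypes = {}
    # Consecutive pairs of lines: maternal haplotype first, then paternal.
    for maternal, paternal in zip(rows[0::2], rows[1::2]):
        mat_id, pat_id = maternal[0].strip(), paternal[0].strip()
        if mat_id != pat_id:
            raise ValueError(f"Unpaired haplotype lines: {mat_id} vs {pat_id}")
        genotypes[mat_id] = [
            (m.strip() == counted_allele) + (p.strip() == counted_allele)
            for m, p in zip(maternal[1:], paternal[1:])
        ]
    return genotypes

# Made-up rows (3 markers instead of 453) for illustration only.
sample = io.StringIO("26,1,2,1\n26,1,1,2\n27,2,2,1\n27,2,1,1\n")
geno = read_genotypes(sample)
print(geno["26"])  # [2, 1, 1]
print(geno["27"])  # [0, 1, 2]
```

Keeping the maternal and paternal lines separate instead of summing them preserves the phase information, which linkage-based methods may need.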

Map file

Map.csv contains 3 columns corresponding to:

  • Marker_ID, Chromosome_ID, Position_M
  • The marker positions are given in Morgans
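The map can be organised per chromosome, with positions converted from Morgans to centimorgans (1 Morgan = 100 cM), as in the sketch below. It assumes comma-separated fields and no header row; the marker labels and positions in the sample are made up.

```python
import csv
import io
from collections import defaultdict

def read_map(handle):
    """Return {chromosome_id: [(marker_id, position_cM), ...]} sorted by position."""
    by_chrom = defaultdict(list)
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        marker, chrom, pos_morgan = (field.strip() for field in row[:3])
        # Convert Morgans to centimorgans.
        by_chrom[chrom].append((marker, float(pos_morgan) * 100.0))
    return {c: sorted(m, key=lambda item: item[1]) for c, m in by_chrom.items()}

# Made-up rows for illustration only; not taken from the dataset.
sample = io.StringIO("M2,1,0.25\nM1,1,0.0\nM3,2,0.5\n")
marker_map = read_map(sample)
print(marker_map["1"])  # [('M1', 0.0), ('M2', 25.0)]
```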


The phenotyped FS families can be used to detect QTL and/or to train a model for genomic selection. The remaining (non-phenotyped) FS families form the validation set, for which breeding values are to be predicted using genomic selection methods.


QTL detection

The dataset was simulated to allow the 50 phenotyped FS families to be used for QTL detection (by association, linkage, or a combination thereof). For comparison of results, we ask you to report the estimated positions and explained variances of the QTL that affect the phenotypic trait in this population.

Breeding Value Prediction

No phenotypes are given for 50 FS families, so that these can be used for prediction of breeding values from marker data. For comparison of results, we ask you to report the predicted breeding values of all 1,000 non-phenotyped FS individuals at time 600, which lies beyond the last recorded time point (530).


