Publications

Comparing genomic prediction and GWAS with sequence information vs HD or 50k SNP chips

Veerkamp, R.F.; Binsbergen, R. van; Calus, M.P.L.; Schrooten, C.; Bouwman, A.C.

Abstract

Earlier work showed that using whole genome sequence information did not improve the accuracy of genomic prediction, because there are too many SNPs in close LD to pinpoint the functional SNP accurately. In this study we therefore compared single SNP GWAS results obtained using imputed whole genome sequence information (from run4 of the 1000 bull genomes project) with GWAS results obtained using either the 50k or the 777k HD SNP chip. For the analysis, (imputed) HD genotypes were available on 5,549 Dutch bulls, of which 3,416 were used for GWAS and subsequent training. Single SNP GWAS was performed using the genomic relationship matrix (based on HD) to account for population structure. In the sequence information 28,076,109 SNPs were imputed, but 10,258,688 were monomorphic in our training population. For protein yield 2,241 SNPs were significant (-log10(p) > 5), of which 28 were present on the 50k chip and 160 on the HD chip. For somatic cell score the equivalent numbers of SNPs were 1,545, 90 and 7 using sequence, HD and 50k, respectively, and for 'interval first to last insemination' the equivalent numbers were 952, 27 and 4 SNPs. When all SNPs were fitted together (using Bayesian Variable Selection), the HD and 50k SNPs gave clear evidence for QTL, but with sequence information the signal was spread across many SNPs in high LD. In conclusion, although more significant SNPs were found using sequence information, relatively few new regions were identified, and every significant SNP was accompanied by several others in high LD. Therefore, to benefit from sequence data in genomic selection, more sophisticated methodology is required than is currently used for genomic prediction.
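The counting steps described in the abstract (dropping monomorphic SNPs, applying the -log10(p) > 5 threshold, and checking which significant SNPs are also on a chip) can be illustrated with a minimal sketch. This is not the authors' pipeline: the SNP names, genotypes, p-values and chip content below are hypothetical toy data, and the helper functions `is_monomorphic` and `significant_snps` are names made up for this illustration. Only the two SNP counts at the end are taken from the abstract.

```python
def is_monomorphic(genotypes):
    """True if all non-missing genotypes (allele counts 0/1/2) are identical."""
    observed = {g for g in genotypes if g is not None}
    return len(observed) <= 1


def significant_snps(p_values, threshold=5.0):
    """Return SNP ids with -log10(p) > threshold, i.e. p < 10**(-threshold)."""
    cutoff = 10.0 ** (-threshold)
    return [snp for snp, p in p_values.items() if p < cutoff]


# Hypothetical toy genotypes: one list of allele counts per SNP.
genotypes = {
    "snp_a": [0, 1, 2, 1],
    "snp_b": [0, 0, 0, 0],  # monomorphic in this toy population
    "snp_c": [2, 2, 1, 0],
}

# Hypothetical single SNP GWAS p-values for the segregating SNPs.
p_values = {"snp_a": 1e-7, "snp_c": 1e-3}

# Hypothetical set of SNPs that are also present on a chip.
chip_snps = {"snp_c"}

polymorphic = [snp for snp, g in genotypes.items() if not is_monomorphic(g)]
hits = significant_snps({snp: p_values[snp] for snp in polymorphic})
hits_on_chip = [snp for snp in hits if snp in chip_snps]

# Counts reported in the abstract: imputed SNPs minus monomorphic SNPs
# gives the number segregating in the training population.
n_imputed = 28_076_109
n_monomorphic = 10_258_688
n_segregating = n_imputed - n_monomorphic

print(polymorphic, hits, hits_on_chip, n_segregating)
```

With the counts from the abstract, the subtraction leaves 17,817,421 SNPs segregating in the training population; the per-trait numbers of significant SNPs on each chip correspond to the `hits_on_chip` step applied to real GWAS output.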