Homoplasy corrected estimation of genetic similarity from AFLP bands, and the effect of the number of bands on the precision of estimation

Gort, G.; Hintum, T.J.L. van; Eeuwijk, F.A. van


AFLP is a DNA fingerprinting technique that yields binary band presence–absence patterns, called profiles, with known or unknown band positions. We model AFLP as a procedure that samples fragments whose lengths are drawn from a distribution; bands represent fragments of specific lengths. We focus on estimating pairwise genetic similarity, defined as the average fraction of common fragments, from AFLP data. The usual estimators are the Dice (D) and Jaccard coefficients. D overestimates genetic similarity because identical bands in a pair of profiles may correspond to different fragments (homoplasy). A further complication is that different fragments of equal length within a single profile appear as a single band, which we call collision. The bias of D increases as the number of bands increases and as genetic similarity decreases. We propose two homoplasy- and collision-corrected estimators of genetic similarity. The first is a modification of D in which band counts are replaced by estimated fragment counts. The second is a maximum likelihood estimator, applicable only if band positions are available. Properties of the estimators are studied by simulation. Standard errors and confidence intervals are obtained by bootstrapping for the first estimator and from likelihood theory for the second. Both estimators are nearly unbiased and, in most practical cases, have a smaller standard error than D. The likelihood-based estimator generally gives the highest precision. The relationship between fragment counts and precision is also studied by simulation; the usual range of band counts (50–100) appears nearly optimal. The methodology is illustrated using data from a phylogenetic study of lettuce.
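The abstract does not give the formula of the corrected estimator, so the following is only a minimal sketch of the idea under a strong, simplifying assumption: fragment lengths are uniform over m equally likely band positions, whereas the paper samples lengths from a general distribution. Under that assumption, n fragments produce on average m(1 − (1 − 1/m)^n) distinct bands, and inverting this gives an estimated fragment count. The common-fragment count is then taken as n̂_x + n̂_y − n̂_union, because a shared fragment occupies the same position in both profiles. All function names, the parameter values, and this union-based correction are hypothetical illustrations, not the authors' estimator.

```python
import math
import random


def dice(bands_x, bands_y):
    """Dice coefficient on band sets: 2|X & Y| / (|X| + |Y|)."""
    return 2 * len(bands_x & bands_y) / (len(bands_x) + len(bands_y))


def estimate_fragments(n_bands, m):
    """Invert E[bands] = m * (1 - (1 - 1/m)**n) for the fragment count n.

    ASSUMPTION: fragment lengths are uniform over m band positions.
    """
    if n_bands >= m:
        raise ValueError("all positions occupied; fragment count not estimable")
    return math.log(1 - n_bands / m) / math.log(1 - 1 / m)


def corrected_similarity(bands_x, bands_y, m):
    """Hypothetical Dice variant using estimated fragment counts instead of bands."""
    n_x = estimate_fragments(len(bands_x), m)
    n_y = estimate_fragments(len(bands_y), m)
    n_union = estimate_fragments(len(bands_x | bands_y), m)
    # Shared fragments are counted once in the union profile.
    n_common = n_x + n_y - n_union
    return 2 * n_common / (n_x + n_y)


def simulate_pair(n_fragments, similarity, m, rng):
    """Two profiles of n_fragments each, sharing round(similarity * n) fragments."""
    n_shared = round(similarity * n_fragments)
    n_own = n_fragments - n_shared
    shared = [rng.randrange(m) for _ in range(n_shared)]
    x = shared + [rng.randrange(m) for _ in range(n_own)]
    y = shared + [rng.randrange(m) for _ in range(n_own)]
    # A profile is the set of distinct lengths: collisions merge fragments
    # within a profile, homoplasy makes unrelated fragments match across profiles.
    return set(x), set(y)


if __name__ == "__main__":
    rng = random.Random(1)
    m, n, s = 400, 100, 0.5  # hypothetical: 400 positions, 100 fragments, similarity 0.5
    reps = 2000
    d_sum = c_sum = 0.0
    for _ in range(reps):
        bx, by = simulate_pair(n, s, m, rng)
        d_sum += dice(bx, by)
        c_sum += corrected_similarity(bx, by, m)
    print(f"true {s:.2f}  mean Dice {d_sum / reps:.3f}  mean corrected {c_sum / reps:.3f}")
```

Running the simulation should show the mean Dice value lying noticeably above the true similarity, while the fragment-count version lies close to it. This matches the direction of bias described in the abstract, but only for this uniform toy model.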