Genetic Similarity and Differences

Arun takes exception to the following in the Big South Asia genomics paper:

While the earliest group of samples (SPGT) is genetically very similar to the Indus_Periphery samples from the sites of Gonur and Shahr-i-Sokhta, they also differ significantly in harboring Steppe_MLBA ancestry (~22%).

What does that "similar but different" mean, he asks.

I have a couple of ideas (and I can't currently use his comment system), so here goes. All human DNA is highly similar; figures like 99.5% identical are often mentioned. Comparing similarities and differences therefore depends on the relatively few sites (millions, actually, out of billions) where differences occur. In this case those are the so-called Single Nucleotide Polymorphisms (SNPs): single sites on the genome where more than one of the four possible nucleotides is found, that is, places where Joe might have a guanine (G) and Fred might have a thymine (T).

Thus, detecting similarities to ancestral populations depends on finding clusters of SNPs originally found only, or almost only, in a single candidate ancestral population. Suppose X contains SNPs found only in candidate ancestral population A and some of those found only in candidate ancestral population B, but few or none of those found only in candidate ancestral population C. Then it is reasonable to assume that X is a mix of A and B, but not C. Suppose a later population Y includes a selection of the SNPs found in X and some found only in C. Then Y is either a mix of A, B, and C or a mix of X and C.
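For anyone who likes to see the logic spelled out, here is a toy sketch of that reasoning in Python. Everything in it is made up for illustration: the SNP labels, the populations, the 0.25 threshold, and the function names `ancestry_signal` and `inferred_sources`. Real analyses work with allele frequencies and statistical models, not simple sets, so treat this as a cartoon of the idea rather than how the paper did it.

```python
# Toy model: each candidate ancestral population is represented by a set of
# hypothetical "private" SNP labels found only in that population.
private_snps = {
    "A": {"a1", "a2", "a3", "a4"},
    "B": {"b1", "b2", "b3", "b4"},
    "C": {"c1", "c2", "c3", "c4"},
}


def ancestry_signal(sample_snps, private_snps):
    """For each candidate ancestor, return the fraction of its private SNPs
    that also appear in the sample."""
    return {
        name: len(sample_snps & markers) / len(markers)
        for name, markers in private_snps.items()
    }


def inferred_sources(sample_snps, private_snps, threshold=0.25):
    """List candidate ancestors whose private SNPs show up in the sample
    at or above an arbitrary, illustrative threshold."""
    signal = ancestry_signal(sample_snps, private_snps)
    return sorted(name for name, share in signal.items() if share >= threshold)


# X carries markers private to A and B, none private to C.
X = {"a1", "a2", "a3", "b1", "b2"}
# Y carries a selection of X's markers plus some private to C.
Y = {"a1", "a2", "b1", "b2", "c1", "c2"}

print(inferred_sources(X, private_snps))  # ['A', 'B']
print(inferred_sources(Y, private_snps))  # ['A', 'B', 'C']
```

Note that the sketch cannot tell whether Y is a mix of A, B, and C or a mix of X and C; both histories leave the same markers behind in this simple picture.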

The word similar usually means identical in some respects but not all. In this case, both populations show descent from the same two ancestral populations, but they differ in that one of them also shows descent from a third population.


