Human Chimp Genome Similarity

Short answer: only one of the numbers that’s been tossed around poses any real challenge to the standard evolutionary model, and it’s not one anyone has paid much attention to. It’s also quite likely wrong.

To compare the expected amount of genetic difference between humans and chimpanzees with what we actually see, we need to know several things: the (selectively neutral) mutation rate per generation, the average generation length in each lineage, the time when the two lineages diverged, and the effective size of the ancestral population before the split (a larger population carries more genetic diversity, so more differences get randomly sorted into the two lineages). We have estimates for all of these quantities, but most of them are quite fuzzy, so there’s a lot of uncertainty in the expected value.
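As a rough sketch of how those quantities combine, here is the standard neutral-coalescent approximation, assuming equal mutation rates in both lineages, no repeated mutations at the same site, and a constant-size ancestral population. After the split, each lineage accumulates μ differences per site per generation; before it, two gene copies in a diploid population of effective size N_e take on average 2N_e generations to coalesce, which adds about 4N_eμ. The function and parameter names are illustrative, not taken from any particular paper.

```python
def expected_divergence(mu, split_years, gen_human, gen_chimp, ancestral_ne):
    """Expected neutral per-site divergence between one human and one chimp genome.

    Simplified model (an assumption, not a published pipeline):
    mu           -- neutral mutation rate per site per generation, same in both lineages
    split_years  -- time since the two lineages separated, in years
    gen_human    -- average generation time on the human lineage, in years
    gen_chimp    -- average generation time on the chimp lineage, in years
    ancestral_ne -- effective (diploid) size of the ancestral population
    """
    # Mutations accumulated independently on each branch after the split.
    post_split = mu * split_years / gen_human + mu * split_years / gen_chimp
    # Two copies in a diploid population of size Ne coalesce after ~2*Ne
    # generations on average; both copies mutate during that time: 2 * mu * 2*Ne.
    ancestral = 4 * ancestral_ne * mu
    return post_split + ancestral
```

Because the ancestral term does not depend on the split time, a larger ancestral population can absorb part of a higher observed divergence, and shorter generation times raise the expected divergence for a given split time.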

One key value is the mutation rate. Now, there are different kinds of mutation involved here, with different mutation rates. We can break them down into three basic classes: single-base substitutions (one base mutates into another), small insertions and deletions (‘indels’), and large insertions and deletions (‘structural variation’).

Substitutions are the easiest to assess (just look for individual differing bases in otherwise identical sequence), the most common, and probably the smallest contributor to the total difference. They are also the only class of mutation for which we have a reasonably good estimate of the mutation rate without assuming common descent. In the chimp genome paper, the single-base divergence between humans and chimps was estimated at 1.23%. Given current estimates of the mutation rate, this is consistent with common ancestry, although it’s on the high end of what might be expected and implies a fairly old separation (say, 7 million years ago rather than 6) and a large ancestral population.
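The same reasoning can be run backwards: given an observed single-base divergence, what split time does it imply for a chosen mutation rate, generation time, and ancestral population size? The sketch below assumes a simple neutral model (equal rates and one average generation time for both lineages, no repeated mutations at a site, ancestral contribution of about 4N_eμ). The function name is hypothetical, and any values you plug in are your own assumptions, not the estimates behind the numbers quoted here.

```python
def implied_split_years(divergence, mu, gen_time, ancestral_ne):
    """Split time (years) implied by an observed per-site divergence.

    Simplified model (assumption): divergence = 2*mu*generations + 4*Ne*mu,
    with the same mutation rate and generation time on both lineages.
    """
    # Divergence already present in the ancestral population before the split.
    ancestral = 4 * ancestral_ne * mu
    post_split = divergence - ancestral
    if post_split <= 0:
        raise ValueError("ancestral diversity alone meets or exceeds the observed divergence")
    # Both lineages accumulate mutations after the split, hence the factor of 2.
    generations = post_split / (2 * mu)
    return generations * gen_time
```

Holding the observed divergence fixed, a larger `ancestral_ne` lowers the split time needed to reach it, so an old separation and a large ancestral population are partly interchangeable ways of accommodating a high value.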

Small indels were estimated in the chimp genome paper to cover ~1.5% of unique human sequence, more than the single-base differences even though there are fewer indel events (since a single indel can involve many bases). I don’t know of a good estimate for the rate of new indels, but the observed number of differences is in the right ballpark, judging from the number we see when comparing two humans.
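To see how indels can cover more bases than substitutions while being fewer in number, here is a toy tally over a gapped pairwise alignment. The sequences are made up for illustration, and treating each contiguous run of gap columns as one indel event is a simplifying assumption (real alignment pipelines handle adjacent or overlapping gaps more carefully).

```python
def tally_differences(seq_a, seq_b, gap="-"):
    """Count substitution events, indel events, and bases covered by indels
    in a gapped pairwise alignment (both strings must be the same length)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    subs = indel_events = indel_bases = 0
    in_gap = False
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            indel_bases += 1
            # A contiguous run of gap columns counts as a single event (assumption).
            if not in_gap:
                indel_events += 1
            in_gap = True
        else:
            in_gap = False
            if a != b:
                subs += 1
    return {"substitutions": subs, "indel_events": indel_events, "indel_bases": indel_bases}


# Hypothetical 20-column alignment: two substitutions and one 6-base deletion.
human = "ACGTACGTACGTACGTACGT"
chimp = "ACGAACGT------GTACCT"
print(tally_differences(human, chimp))
```

Here the single indel event affects six bases while the two substitutions affect two, which is the same pattern as in the genome-wide comparison, just at toy scale.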

Structural variation is the hardest to assess, and the category for which we know least about mutation rates. It may well represent the largest contribution to the overall difference (anywhere from “more than 0” to 5% strikes me as plausible), but it tells us almost nothing about common descent, since we really don’t know how much to expect. This is why I don’t find estimates of the overall divergence very interesting.

The one estimated divergence that would cause trouble for common descent is the single-base divergence from the paper @RichardBuggs quoted, which I responded to yesterday. Their value was 1.93% rather than 1.23%. While the absolute difference between the two estimates is small, the newer one is more than 50% greater than the original (about 57%). 1.23% already requires an old divergence and a large ancestral population, and 1.93% would be difficult to accommodate with plausible values. However, the higher value is also likely wrong. The paper in question notes that their program is optimized for speed rather than sensitivity, and that it is less sensitive than another commonly used program, LASTZ. When @RichardBuggs made his own estimate of the single-base divergence from public data using LASTZ, he got values of 1.12% and 1.24%, depending on how he did the comparison. Since the underlying data in the recent paper were basically the same data Richard used, it’s fair to conclude that the paper’s software is responsible for the difference. It’s a good illustration, though, of how dangerous it is to draw sweeping conclusions from imperfect estimates of divergence.
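The arithmetic behind that comparison is easy to check. The sketch below computes the relative increase, plus the increase in the post-split portion if the ancestral contribution is held fixed, which is one reason a higher total is harder to accommodate than the raw percentage suggests. The function names are hypothetical, and the 0.4% ancestral contribution used in the checks is a placeholder assumption, not an estimate from either paper.

```python
def relative_increase(new, old):
    """Fractional increase of `new` over `old` (0.5 means 50% greater)."""
    return (new - old) / old


def post_split_increase(new_div, old_div, ancestral_div):
    """Relative increase in the divergence that must accumulate after the split,
    assuming the ancestral-population contribution stays the same (assumption)."""
    return relative_increase(new_div - ancestral_div, old_div - ancestral_div)
```

For example, `relative_increase(1.93, 1.23)` is about 0.57, and subtracting any fixed ancestral contribution from both values makes the required post-split increase larger still.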
