Adam, Eve and Population Genetics: A Reply to Dr. Richard Buggs (Part 1)

Question: Why is this an accurate simulation for haplotype distributions? A Gaussian distribution is essentially pure noise except for the mean and the variance. @DennisVenema is claiming that the existence of TMRCA values > mya is a legitimate signal of ancestry, not just noise. If the observed PDF across the genome is the result of a stochastic process that makes some regions seem younger than the ground truth (as measured by TMRCA), then the noise should not be interpreted as nullifying the signal from regions at extrema that point to a very ancient TMRCA.

Does that make sense?

I don’t know what you mean by correcting for interbreeding here. If many of us are descendants of ancient interbreeding between lineages that diverged at least half a million years ago (which we are), then we are not a product of a bottleneck of size two within the last half million years. To take seriously a recent tight bottleneck, you don’t have to correct for introgression – you have to ignore it.

I’ll post more later. This is just regarding @DennisVenema’s claim re: Homo sapiens. If we care about measuring Homo sapiens population size, we have to correct for interbreeding.

The good news is that I found exactly the data we need to do this study. It includes a scan of phylogenies constructed across the entire autosomal genome. Each phylogeny is computed for a non-recombining block, which corrects for recombination-introduced artifacts. From this we can compute TMRCA and TMR4A across the whole genome, by looking at the times of the first and third coalescent nodes. Once we rescale to years with the mutation rates, we will have the distribution across the whole genome.

  1. We can test to see how well TMRCA / 4 = TMR4A.
  2. We can see the distribution of TMR4A across the genome.
  3. We can identify the outlier areas, or subset on any region of the genome.
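Before touching the real data, the first check can be sanity-tested against theory. What follows is only a minimal sketch, assuming the standard neutral coalescent with a constant population size (an assumption made for illustration, not something the data is known to follow) and taking TMR4A to mean the time at which the sampled lineages have coalesced down to four. The function names `expected_times` and `simulate_times` are hypothetical. Times are in coalescent units, so the ratio is independent of the population size and mutation rate.

```python
import random


def expected_times(n):
    """Expected (TMRCA, TMR4A) for n sampled lineages, in coalescent units.

    While k lineages remain, the expected wait to the next coalescence
    is 2 / (k * (k - 1)) under the standard constant-size coalescent.
    """
    if n < 5:
        raise ValueError("need at least 5 lineages for TMR4A to be non-zero")
    wait = {k: 2.0 / (k * (k - 1)) for k in range(2, n + 1)}
    tmr4a = sum(wait[k] for k in range(5, n + 1))  # n lineages down to 4
    tmrca = sum(wait.values())                     # n lineages down to 1
    return tmrca, tmr4a


def simulate_times(n, rng):
    """Draw one (TMRCA, TMR4A) pair for a single non-recombining block."""
    if n < 5:
        raise ValueError("need at least 5 lineages for TMR4A to be non-zero")
    t = 0.0
    tmr4a = None
    for k in range(n, 1, -1):
        # Coalescence rate with k lineages is k choose 2.
        t += rng.expovariate(k * (k - 1) / 2.0)
        if k == 5:
            tmr4a = t  # this event leaves exactly 4 lineages
    return t, tmr4a


if __name__ == "__main__":
    n = 100
    tmrca, tmr4a = expected_times(n)
    print("expected TMR4A / TMRCA:", tmr4a / tmrca)

    rng = random.Random(1)
    draws = [simulate_times(n, rng) for _ in range(20000)]
    mean_tmrca = sum(d[0] for d in draws) / len(draws)
    mean_tmr4a = sum(d[1] for d in draws) / len(draws)
    print("simulated mean TMR4A / mean TMRCA:", mean_tmr4a / mean_tmrca)
```

Under these assumptions the expected values are 2(1 - 1/n) for TMRCA and 2(1/4 - 1/n) for TMR4A, so the ratio approaches 1/4 only for large samples (about 0.24 at n = 100). The relationship holds in expectation, not block by block: within any single block the two times can differ widely from the 1:4 ratio, which is the spread the genome-wide scan would actually measure.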

The bad news is that the source data is 424 GB of compressed data. I have a plan to handle it, though, if this is a good analysis to do.

What are everyone’s thoughts (especially @glipsnort, @DennisVenema, and @RichardBuggs)? Not expecting a response till Jan, of course.

I have to admit that I haven’t looked at the precise wording of Dennis’s statements lately, but my impression was that his claims were not about “organisms arbitrarily labelled as Homo sapiens” but about “us”, which would correspond to contemporary Homo sapiens. Contemporary Homo sapiens did not go through a bottleneck of size two 200,000 years ago. My ancestors did not go through a tight bottleneck 200,000 years ago. At that time, some of them were living in Africa and some of them were living in Europe. I still don’t see why the demographic history of one branch of the structured population I descend from has anything other than purely academic interest to anyone. The fact that many (but not all) biologists have slapped different names on the different branches doesn’t strike me as relevant at all.

Josh, would you agree with this statement based on your understanding of the evidence?

I went through the first 290 posts looking for any relevant statements by @DennisVenema. I don’t see any that can be construed as saying that a bottleneck to two in Homo sapiens (as opposed to “our lineage” including interbreeding) is ruled out with certainty comparable to heliocentrism. This is getting pretty nitpicky, but if it aids the discussion then good.

I have not read every single post, but I have read with interest the various expert opinions, and for what it is worth, I give my impression.

The certainty that is displayed by one side seems to rest on the absence of modelling that would support a “bottleneck” of two (I find it hard to believe that two creatures count as a bottleneck, but this is the terminology).

The questions that are asked however point to a “lack of similar certainty” regarding what I think are important parameters in the modelling.

I (as a non-participant) am at a loss as to the physical evidence that is supposed regarding this bottleneck of a few hundred or whatever number is discussed. Am I missing something? Has someone discovered remains at a particular location to show such a group existed at some point in time? If not, is this not an inference derived from the modelling? If there is direct physical evidence, why is this not discussed at length, as it would be relevant to the certainty/uncertainty aspect of the discussion?

Yes - one can use considerable expertise and ingenuity to argue whether the population genetics model supports a particular contention or not. That’s the nature of this thread.

But the question of whether the model itself is adequately valid over such timescales, given its known limitations and the state of flux of theories of large-scale evolution, is a significant one.

“All models are wrong - some are useful”. But their utility is only measurable by the ability to validate them by independent observation under the situation for which they are being used - in this case the origin of humanity defined, at least, as our species or even across hominin species by some protagonists. That’s very different from studying the evolution of Y-chromosomes in the living population.

In this case, validation would seem to require counting fossils that are as rare as hens’ teeth - in the absence of physical evidence, the population genetics model seems to validate itself in a circular manner.

AFAIK, it’s basically the same thing, but on a longer time scale and across more of the genome than typical Y-chromosome studies.

Perhaps one of our biologist friends like @Swamidass, @glipsnort, @DennisVenema or @RichardBuggs would be able to shed more light.

Perhaps you would elaborate; what predictions can any model make on what seems to be current data used to set up the model itself?

(Chris Falter) #386

(GJDS) #387

Validation of models, as I have practiced it (and as it is commonly understood), requires a result from the model to be similar to an observation/measurement of a system independent of the model database. By that standard, I have a difficult time finding such validation for the models discussed here. Since a major point is the size and time of a bottleneck, for example, validation would be considered by using physical data of a real bottleneck. There may be other ways, and if you can identify them I would be interested to know.

This is not a question about the technicalities of the modelling procedure.