Hi Dr. Venema, @DennisVenema

I just purchased your book on the historical Adam and am trying to develop a basic intuition about population genetics (I’m a theoretical chemist by training). In particular, I’d like to get a back-of-the-envelope calculation to show why the number of SNPs in the human genome rules out a single human ancestral pair. But I’m having trouble showing that.

Imagine we start with a single human pair that reproduces to create a population of N=10,000 (to simplify matters, we can assume that all of these individuals are clones). The mutation rate produces about 100 single-point mutations in each new individual, so in the first generation of our N=10,000 population, there will be about 100 * 10,000 = 10^6 new single-point mutations (assuming no overlap). After 100 generations, that will amount to about 10^8 single-point mutations (again, assuming no overlap, which still seems reasonable given that the human genome has a length of about 6*10^9 base pairs). From what I gather, there are about 10^7 known SNPs, so we’ve easily achieved more than enough mutations in only 100 generations.
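To make the arithmetic in that paragraph easy to check, here is a minimal Python sketch. Every constant is one of the round figures quoted above rather than a measured value, and the variable and function names are just illustrative.

```python
# Back-of-the-envelope tally of new mutations, using the round figures from the post.
POPULATION_SIZE = 10_000       # N, individuals per generation
MUTATIONS_PER_BIRTH = 100      # new single-point mutations per individual
GENOME_LENGTH = 6e9            # approximate genome length quoted in the post
KNOWN_SNPS = 1e7               # approximate number of known SNPs quoted in the post


def new_mutations(generations: int) -> int:
    """Total new mutations after `generations`, assuming no two hit the same site."""
    return POPULATION_SIZE * MUTATIONS_PER_BIRTH * generations


per_generation = new_mutations(1)
after_100 = new_mutations(100)

print(f"per generation:      {per_generation:.0e}")
print(f"after 100 gens:      {after_100:.0e}")
print(f"fraction of genome:  {after_100 / GENOME_LENGTH:.3f}")
print(f"ratio to known SNPs: {after_100 / KNOWN_SNPS:.0f}")
```

The fraction of the genome touched after 100 generations comes out under 2%, which is why the no-overlap assumption seems tolerable at this level of precision.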

However, from what I understand, single-point mutations are only counted as SNPs if their variants exist in some significant fraction of the population (say, 1%). Although our N=10,000 person population now has more than enough single-point mutations, they often exist in only one individual, which would not be nearly sufficient for them to qualify as SNPs. So we now need to figure out how long it takes and how probable it is for a single-point mutation to spread to 1% of the population.

But this process is essentially just a random walk with an absorbing boundary condition: if we begin with exactly 1 individual with a single-point mutation in a population of N=10,000, we want to know how probable it is to propagate to 1% of the population and how long it will persist at that prevalence. Some quick simulations showed that there’s around a 0.1% chance that a given mutation will propagate to at least 100 individuals (1% of the population), and that a mutation that gets there will hang around at that prevalence for around 2000 generations. If I’m doing the math correctly, that means that of the 10^6 mutations produced at every generation, about 10^3 will reach a prevalence of 1% and persist for 2000 generations. That works out to a steady state of about 10^3 * 2000 = 2*10^6 SNPs, which is only a factor of 5 smaller than what we observe today.
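For concreteness, a random walk of this kind could be sketched in Python as below. The model is an assumption made purely for illustration: the number of carriers takes a fair +1/-1 step each time it changes and the mutation is lost when the count hits 0. It is not necessarily the model behind the numbers quoted above, and all function names are hypothetical. For this particular walk the classical gambler's-ruin result gives a probability of start/threshold (1/100 here) of reaching the threshold, so the simulated fraction need not match the 0.1% figure, which depends on the details of the model actually used. The last function just encodes the steady-state bookkeeping: SNPs arising per generation times the number of generations each one persists.

```python
import random

# Assumed toy model (not from the post): the number of carriers of one mutation
# takes a fair +1/-1 step each time it changes, and the mutation is lost at 0.


def reaches_threshold(start: int, threshold: int, rng: random.Random) -> bool:
    """Walk until the carrier count hits 0 (lost) or `threshold` (counts as a SNP)."""
    carriers = start
    while 0 < carriers < threshold:
        carriers += 1 if rng.random() < 0.5 else -1
    return carriers >= threshold


def estimate_reach_probability(trials: int, threshold: int = 100, seed: int = 0) -> float:
    """Fraction of new mutations (1 carrier each) that ever reach `threshold` carriers."""
    rng = random.Random(seed)
    hits = sum(reaches_threshold(1, threshold, rng) for _ in range(trials))
    return hits / trials


def steady_state_snps(new_per_generation: float, p_reach: float, persistence: float) -> float:
    """Standing SNP count = (SNPs arising per generation) * (generations each persists)."""
    return new_per_generation * p_reach * persistence


print("simulated reach probability:", estimate_reach_probability(trials=10_000))
# Figures quoted in the post: 10^6 new mutations, 0.1% reach 1%, 2000 generations.
print("steady-state SNPs:", steady_state_snps(1e6, 0.001, 2000))
```

Note that a step in this toy walk is a single change in the carrier count, not a generation, so the sketch says nothing about the 2000-generation persistence time; that figure is simply passed in as given.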

Is there something I’m doing wrong here? Is there a way to get a back-of-the-envelope result that shows an obvious conflict between a single ancestral pair and the number of modern SNPs?

Thanks,

Neil