I would love to do this, but my offer of moving on was contingent on us agreeing that it does not provide a suitable citation to support your case in chapter 3 of Adam and the Genome.
Please let me just summarise why this paper does not support your case.
The authors can conclude from their data that there was no out-of-Africa bottleneck because they have data from both inside and outside Africa and can compare the two sets of populations. It is the relative levels of diversity in the two populations that allow them to exclude an out-of-Africa bottleneck. They spell this out clearly even in the abstract (“The comparable π value in non-Africans to that in Africans indicates no severe bottleneck during the evolution of modern non-Africans”). They cannot exclude an earlier bottleneck using their data because they do not have genetic data for a population from which their African populations were derived. To do this they would, I guess, need to use ancient DNA from bones, and they were working before this was technically possible. Therefore you cannot claim that because they exclude an out-of-Africa bottleneck, they also exclude a bottleneck in the lineage leading to the African populations.
The authors estimate a long-term human effective population size of between 8,100 and 18,800. These estimates are based on present-day numbers of segregating sites in the sample sequences, and estimates of mutation rate. This method assumes a fairly constant population size over time. No historical reconstruction of effective population size at different time-points in history is given. Thus this does not exclude a bottleneck.
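To make the point about the method concrete, here is a minimal sketch of a Watterson-style estimate, which I am assuming is close in spirit to what the paper does (it is not the authors' code). The 75 variant sites and the 10,000-base region length are taken from this discussion; the sample size and mutation rate are hypothetical placeholders, so the printed number is not expected to reproduce the paper's range.

```python
def harmonic_sum(n_sequences):
    """a_n = 1 + 1/2 + ... + 1/(n - 1) for n sampled sequences (n >= 2)."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites, n_sequences):
    """Watterson's estimator of theta for the whole region: S / a_n."""
    return segregating_sites / harmonic_sum(n_sequences)


def effective_population_size(theta, mu_per_site_per_generation, region_length_bp):
    """Solve theta = 4 * Ne * mu * L for Ne (diploid, autosomal region)."""
    return theta / (4.0 * mu_per_site_per_generation * region_length_bp)


# 75 variant sites and a 10,000-base region come from the discussion.
# The sample size (100) and mutation rate (1e-8) are HYPOTHETICAL placeholders.
theta = watterson_theta(segregating_sites=75, n_sequences=100)
ne = effective_population_size(theta, mu_per_site_per_generation=1e-8,
                               region_length_bp=10_000)
print(f"theta = {theta:.2f}, long-term Ne = {ne:.0f}")
```

Note that the inputs are a count of segregating sites, a sample size, a mutation rate and a region length. The output is a single long-term number; nothing in the calculation carries information about population size at any particular time in the past.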
The authors present a coalescent analysis for this region that gave a mean estimate of the time to the most recent common ancestor (MRCA) of 1,356,000 years ago, with a 95% confidence interval between 712,000 and 2,112,000 years ago. This assumes a constant effective population size of 10,000. Using the approximation of @swamidass that the time to coalescence to 4 alleles will be a quarter of this time, this means a bottleneck could have occurred between 178,000 and 528,000 years ago. And these figures do not include an adjustment in light of the point that I have made about rapid population growth after a bottleneck giving further reductions of these dates.
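The arithmetic behind these dates can be sketched as follows. The TMRCA figures are the ones quoted above. The function `fraction_of_tmrca_to_k_lineages` is my own illustration of why "a quarter" is a reasonable approximation under the standard constant-size neutral coalescent, and the sample size of 100 used with it is a hypothetical placeholder.

```python
def fraction_of_tmrca_to_k_lineages(n_sampled, k_remaining):
    """Expected fraction of the TMRCA taken for n sampled lineages to
    coalesce down to k lineages (constant-size neutral coalescent).

    E[time from n to k lineages] is proportional to (1/k - 1/n), and
    E[TMRCA] (k = 1) is proportional to (1 - 1/n).
    """
    return (1.0 / k_remaining - 1.0 / n_sampled) / (1.0 - 1.0 / n_sampled)


# TMRCA estimates (years ago) as quoted from the coalescent analysis.
TMRCA_YEARS = {"lower_95": 712_000, "mean": 1_356_000, "upper_95": 2_112_000}

# The "quarter" approximation applied to each figure.
QUARTER = 0.25
time_to_four = {label: years * QUARTER for label, years in TMRCA_YEARS.items()}

for label, years in time_to_four.items():
    print(f"{label}: about {years:,.0f} years ago")

# HYPOTHETICAL sample size of 100 sequences: the exact fraction is a
# little under a quarter and approaches 0.25 as the sample grows.
print(fraction_of_tmrca_to_k_lineages(n_sampled=100, k_remaining=4))
```

The quarter figure is an expectation under a constant population size, so it carries the same assumption as the TMRCA estimate itself.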
Therefore it seems to me very clear that this paper does not support your case.
I am puzzled as to why you are not willing to concede this rather minor point. After all, looking back over your previous posts in this discussion, your own understanding of the paper and its methods has clearly deepened during the course of our discussions, and your position has shifted somewhat.
You initially seemed to think that Zhao et al (2000) based their conclusions about effective population size on their coalescent analyses.
[quote=“DennisVenema, post:87, topic:37039”]
Have a look at Table 5, which shows their data for the distribution of TMRCA values. This is the data and analysis they are basing their conclusions on. Bottlenecks increase the probability of coalescence (this is also how PSMC methods work). We see a distribution of TCMRA values for the alleles in the study. This is basically what a PSMC analysis does sequentially for an entire genome to get a much larger sample size.
[/quote]
I immediately showed that this was wrong, but you continued to believe this throughout most of our discussion until you finally re-read the paper.
You also thought that the method they used to calculate effective population size did not assume a fairly constant population size over time:
You were wrong on this point, so then you said:
[quote=“DennisVenema, post:97, topic:37039”]
I guess I’m asking you to look at the TMRCA data there and think about your hypothesis (a bottleneck to two in the last few hundred thousand years).
[/quote]
You then appear to have made the mistake of thinking that a bottleneck could only have happened at the TMRCA. This is clear in the quote below, where you date any potential bottleneck at the TMRCA:
[quote=“DennisVenema, post:102, topic:37039”]
That study identified 75 variants in this region that have a minimum coalescence time of over 700,000 years. The mode is 1.2 million years, and 700,000 is the lower bound of the 95% confidence interval for the combined sample. So, how did all of that variation survive a bottleneck to two? It can’t. So, how did all of that variation arise after a proposed bottleneck to two? Through new mutations. How long would that take? Even with a steady-state population of around 10,000, about 1.2 million years.
[/quote]
You then made the mistake of suggesting recombination was unlikely to have occurred much in a 10 kb region:
[quote=“DennisVenema, post:109, topic:37039, full:true”]
Richard - are you aware how closely linked those variants are? They’re at most 10,000 bases apart. Are you seriously suggesting that they passed through a bottleneck en masse in two individuals and then recombined to the forms we see now?
[/quote]
I think that Joshua’s analysis has shown you to be wrong on this point.
You also suggested that the raw data presented by Zhao et al does not form clusters that could have been derived from four ancestral haplotypes:
By “eye-balling” in Excel and by drawing a haplotype network in Splitstree, I have shown this to be wrong.
You later expressed skepticism that three mutations could occur within a timeframe of a few hundred thousand years:
[quote=“DennisVenema, post:231, topic:37039”]
I’m still not seeing how you can fit everything you need into the timeframe you’ve allowed yourself. If there is a bottleneck to 2, every haplotype in the Zhao (2000) data set has to come from your four ancestral haplotypes. Why did you decide that three mutations was an acceptable deviation from those types? How do you have time for three mutations, each interspersed with drift?
[/quote]
However, the coalescent analysis of Zhao et al clearly shows that many cumulative mutations have occurred in this region in such a time-frame.
You also suggested that the low Ne after a bottleneck would reduce the numbers of mutations available:
[quote=“DennisVenema, post:231, topic:37039”]
Don’t forget that if you lower Ne to get a faster coalescence time, you also lower the number of forward mutation events that are plausible.
[/quote]
But I argued that if the population recovered quickly, this effect would be small, and expansion to an Ne of over 10,000 would quickly allow far more mutations.
Thus, I am struggling to see how you can still think that this paper supports your case.
As we have been discussing the paper, you appear to have changed your position on when in history you believe a bottleneck has been shown to be almost certainly impossible. On page 55 of Adam and the Genome, you wrote:
"It seems our smallest effective population size over the last 18 million years was when we were already human, at around the time our ancestors left Africa…
All methods employed to date agree that the human lineage has not dipped below several thousand individuals for the last 3 million years or more – long before our lineage was even remotely called “human”."
You now seem to be saying in the current discussion that in fact you only think a bottleneck is excluded by the data in the last 200,000 years:
[quote=“DennisVenema, post:247, topic:37039”]
A few questions for you - if we take as reasonable your suggestion that Zhao (2000)'s data coalesce to four haplotypes between 300,000 - 1,000,000 years ago, how does that help your case? In Adam and the Genome I consistently discuss humans as a species arising ~200,000 years ago. So, by your calculations, Zhao (2000) supports my case - human variation in this all region of the genome cannot be reasonably explained by a bottleneck to 2 individuals within human history, as I argue in AatG. Am I missing something here?
[/quote]
Yes, you do discuss humans as a species arising ~200,000 years ago, but you also say that “the human lineage has not dipped below several thousand individuals for the last 3 million years or more – long before our lineage was even remotely called “human””. Thus, it seems to me that a bottleneck between 300,000 and a million years ago would be a direct contradiction of the claim you make on page 55 of Adam and the Genome.
All in all, it seems to me that we have made considerable progress in our discussion of Zhao et al over the past weeks. It has helped us clear up several misunderstandings of the paper and of its methods. It has helped us all to work out how to think about a bottleneck in terms of a coalescent analysis. It also appears to have helped you to change the position you expressed in Adam and the Genome (that a bottleneck could not have occurred in the last 18 or 3 million years) to a position that one could not have occurred in the last 200,000 years.
Given all this progress, it baffles me that you are not willing to now concede that Zhao et al does not support your case in Adam and the Genome, and is not therefore an appropriate citation. I honestly don’t think you have much to lose by making this admission. It is not as if you actually cited Zhao et al in your book. It is not mentioned there. Why not just admit that you were mistaken to cite it?
I also will struggle to contribute much to this discussion over the Christmas period. I will be reflecting on it though from time to time. I wish you and all other contributors and readers a very happy Christmas. Thanks for an interesting discussion so far, and helping me to try to answer my questions about bottlenecks.