Thank you so much for doing these analyses, Steve. I was hoping that my Nature Eco Evo blog would stimulate some studies that set out to explicitly test the bottleneck of two hypothesis, and this is certainly a big step in that direction.
As I begin to comment on this, I think I should say for those reading along who are not in the science world that Steve Schaffner is right at the top of the field when it comes to human genomics, and was one of the authors of the 1000 Genomes Project paper (and many other highly cited and very significant papers too). It is a real privilege for all of us who are interested in this issue to have Steve running simulations on the two-person bottleneck hypothesis, and taking the time to answer questions on it.
I would also note that the fact that we are discussing these new simulations is in itself very good backing for the point I made in my blog that more research is needed on this issue. It highlights how mistaken it is to declare that we can be as certain that there has not been a two-person bottleneck as we can be that the earth orbits the sun. After all, if I were to question the latter, no one would need to go away and run a simulation to come up with new evidence for it in order to be persuasive.
Steve, I am very interested in your analyses. I had expected allele counts at polymorphic loci to be the biggest argument I would come across against the bottleneck of two hypothesis. I was not expecting an argument from allele frequency spectra. I am delighted to come across this possible way to test the hypothesis that I had not thought of, and that was not mentioned in Dennis' book chapter.
I am still going to take a bit of convincing that this is a good approach to testing the hypothesis, however. I will explain my reasoning below. I would underline that I know you see what you have done as just a preliminary study and you yourself are well aware of the approximations and simplifications that you have had to make. I will try to explain my points as simply as I can for our readers.
1) Steve has already highlighted that this approach depends heavily on a correct estimation of mutation rates, and the model presented assumes that these do not vary with time or in different parts of the genome. This may not be the case in reality.
2) Also, as far as I can see (Steve, do correct me if I am wrong), this approach depends on the assumption of a single panmictic population over the timespan that is being examined. I think it would be fair to say that there has been substantial population substructure in Africa over that timespan and that this has varied over time. To my mind, this population substructure could well boost the number of alleles at frequencies between 0.05 and 0.2.
Let me just try to explain that in a way that is a bit more accessible to our readers. I am saying that Steve's model (at least in its current preliminary form) is making the approximation that there is one single interbreeding population that has been present in Africa throughout history, and that mating is random within that population. However, the actual history is almost certainly very different to this. The population would have been divided into smaller tribal groups which mainly bred within themselves. Within these small populations, some new mutations would have spread to all individuals and reached an allele frequency of 100%. In other tribes these mutations would not have happened at all. Thus if you treated them all as one large population, you would see an allele frequency spectrum that would depend on how many individuals you sampled from each tribe. It is more complicated than this, because every so often tribes would meet each other after a long time of separation and interbreed, or one tribe would take over another tribe and subsume it within itself. Such a complex history, over tens or hundreds of thousands of years, would be impossible to reconstruct accurately, but would distort the allele frequency spectrum away from what we would expect from a single population with random mating. It gets even more complicated if we also start including monogamy or polygamy.
3) As far as I can see, the model currently also assumes no admixture from outside of Africa. A group of people arriving in Africa from another continent would affect the allele frequency spectrum if they interbred, and if their non-African population had diverged from African populations. Obviously this could not have happened at time periods when there were no humans outside Africa. But the data under analysis are from present-day Africans, after centuries of admixture from outside Africa. Steve may be able to account for this with a more complex model that excludes alleles that are common in non-African populations, although it would be hard to be completely sure about the origins of these alleles.
4) As far as I can see, the model currently assumes no selection. Natural selection will boost the frequency of beneficial alleles (and alleles linked to an allele being selected for). Especially relevant would be alleles selected in one location and not another, and alleles under balancing selection. Steve would know better than me how to try to incorporate selection into the model, but my guess is that it would be very tricky.
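Going back to point 2 for a moment, here is a toy calculation to put rough numbers on it. This is not Steve's model, and the number of tribes, the sample sizes, and the function name `pooled_frequency` are all invented purely for illustration. It only shows that a variant fixed in one isolated tribe and absent from the others appears at an intermediate frequency in a pooled sample, and that this frequency depends on how many chromosomes are sampled from each tribe.

```python
def pooled_frequency(deme_freqs, sample_sizes):
    """Expected allele frequency in a pooled sample.

    deme_freqs   -- allele frequency within each tribe (0.0 to 1.0)
    sample_sizes -- number of chromosomes sampled from each tribe
    Both lists are hypothetical inputs; nothing here comes from real data.
    """
    if len(deme_freqs) != len(sample_sizes):
        raise ValueError("need one sample size per tribe")
    total = sum(sample_sizes)
    if total == 0:
        raise ValueError("at least one chromosome must be sampled")
    # Weight each tribe's frequency by its share of the pooled sample.
    return sum(f * n for f, n in zip(deme_freqs, sample_sizes)) / total


# Assumed scenario: 10 tribes, variant at 100% in one tribe, 0% in the rest.
deme_freqs = [1.0] + [0.0] * 9

# Even sampling: 20 chromosomes from every tribe -> 20 / 200
even = pooled_frequency(deme_freqs, [20] * 10)

# Uneven sampling: 50 from the tribe carrying the variant, 10 from each other -> 50 / 140
skewed = pooled_frequency(deme_freqs, [50] + [10] * 9)

print(f"even sampling:   {even:.3f}")
print(f"skewed sampling: {skewed:.3f}")
```

With even sampling the variant shows up at a pooled frequency of 0.1, which a single-population model would read as a fairly old, moderately common allele; with the skewed sampling the same variant appears at about 0.36. A real test would of course need drift, migration and mutation as well, which this sketch deliberately leaves out.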
Finally, could I ask, Steve, how many allelic variants did you assume in the founding couple, and what proportions of alleles did you put in them at 25% and 50%? Or did you assume that all variants arose through mutation?