Adam, Eve and Population Genetics: A Reply to Dr. Richard Buggs (Part 1)

Richard,

Just an observation as a fly on the wall, but your style of communication here may be more off-putting than you intend. I think your questions are fair, and I share your desire to see your exchange with Dennis be as productive as we all hope. But you also seem to be issuing quite a lot of demands about exactly how this exchange needs to happen. As far as I know, it’s customary to make your argument, allow your conversational partner to make theirs in rebuttal, and continue in that way as the issues are worked through. You seem to want to direct exactly how Dennis needs to proceed, as well as pressing upon him a sense of urgency that at best seems unnecessary and at worst a little churlish or rude. Maybe this is just my own take and it’s not shared by Dennis or others reading this thread, but for what it’s worth…

That said, I’m looking forward to your continued exchange.


Asking for references and clarification is about as standard and civilised as it gets in the scientific literature - hardly churlish and it is odd to think it rude.


CJDS,

Simply asking for references or clarifications is not what I’m referring to.

I think it is easy to read unintended tone into other people’s words. As a moderator, I think both Dr. Venema and Dr. Buggs are modeling the graciousness we aspire to here. One person’s clarity and forthrightness is another person’s demands. Let’s not derail the conversation by expecting anyone to defend how their “tone” should have been read.


Hi Dennis,

Thank you for such a quick response to my query, and thank you for the citation to Li and Durbin and to the 1000 genomes project.

[quote]
I am saying this is my understanding of the published literature and the relevant publicly available databases.
[/quote]

I had assumed that was the case, as this is what one expects from a scientist. This is, of course, why, as a fellow scientist, I am asking you - I hope courteously and professionally - to point me to the exact papers in the published literature, and to actual analyses of the public databases, that support the claims you are making in Adam and the Genome.

[quote]
Li and Durbin would be one paper relevant here
[/quote]

Thank you. As you know, this is the paper that presents the PSMC method. In my email to you and my blog I have explained why I do not think that the PSMC method is able to detect a short sharp population bottleneck. I assume that you are going to respond to my comments on PSMC in Part II of your response, so I will not press you further on this issue now.

[quote]
moreover the 1,000 genomes consortium papers, papers that estimate the present-day human mutation rate, and so on. For example A global reference for human genetic variation
[/quote]

I can see how the 1,000 genomes project can provide the raw data for an analysis such as the one I am asking you for clarification on - the one that you mention in the passage from your book that I quoted in my previous post (above).

However, as far as I can see, the 1,000 genomes paper does not do the calculations that you report in that passage. Unless I am missing something, the authors do not report a calculation of ancestral population sizes from the number of alleles found in present day populations. They do present several PSMC analyses (which are based on runs of heterozygosity within genomes) but they do not seem to present the calculation that you mention in the passage I quoted from Adam and the Genome. Is there another paper in which they conduct the calculations that you are telling your readers about? As I say, I am very keen to know what genes were used in these calculations and how they generated an ancestral population size of 10,000.


What I’m talking about there is a summary of the field as a whole - and the PSMC analyses in the 1,000 genomes paper is certainly one of the relevant experiments. So are LD studies. So are the Li and Durbin PSMC results. All are based on examining present-day alleles in present-day populations (of course - it couldn’t be otherwise unless we’re talking paleogenomics). Some of these analyses use a forward mutation rate (which is also the rate of fixation for neutral alleles, and most variation is neutral). I know you think PSMC studies could miss a bottleneck to two. I disagree, but you’re going to have to wait until I have time to write it up and explain why I think you’re mistaken. Part II might include PSMC, or it might not. Part of my goal here is to not just explain my reasoning to you as a biologist, but to make it accessible to a non-specialist audience. That takes more time.
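For readers who want to see why the mutation rate doubles as the fixation rate for neutral alleles, here is a minimal sketch of the standard neutral-theory arithmetic. The function name and the mutation rate value are assumptions chosen for illustration; they are not taken from any of the papers discussed in this thread.

```python
import math

def neutral_substitution_rate(n_diploid, mu):
    """Rate at which new neutral mutations go to fixation, per site per generation."""
    # New mutant copies entering the population each generation (2N gene copies).
    new_mutations = 2 * n_diploid * mu
    # A new neutral mutant starts at frequency 1/(2N), which is also its
    # probability of eventually reaching fixation.
    fixation_probability = 1 / (2 * n_diploid)
    return new_mutations * fixation_probability

MU = 1.2e-8  # assumed per-site, per-generation mutation rate, for illustration only

for n in (2, 1_500, 10_000):
    rate = neutral_substitution_rate(n, MU)
    # The population size cancels, so the rate matches MU for every n.
    print(n, rate, math.isclose(rate, MU))
```

The population size cancels out, which is why this rate can be used without knowing the ancestral population size. It says nothing by itself about how much standing variation a population of a given size should carry.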


(For now I just want to comment on this – I should have more to add when I’ve run some simulations (which may take a few days).)

The Keinan and Clark paper is not relevant to the question at hand. The new mutations they describe are indeed rare: 80% of them have frequency < 0.05%. There is no question that large numbers of very rare variants can accumulate in a large, young population. It is the alleles at moderate frequency – roughly 5% to 20% minor allele frequency – that are difficult to explain with a recent bottleneck. As the authors point out in that paper, 92% of neutral alleles at a frequency of 5% are expected to be older than 10,000 years; that is not the historical period they discuss.
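To give a feel for why moderate-frequency alleles tend to be old, here is a rough sketch using the classical Kimura and Ohta (1973) expression for the expected age of a neutral allele at frequency x in a constant-size population: -4·Ne·(x/(1-x))·ln(x) generations. This is not the calculation in the Keinan and Clark paper, which models population growth and reports a proportion rather than a mean. The effective size and generation time below are assumed values for illustration only.

```python
import math

def expected_allele_age_generations(freq, n_e):
    """Kimura-Ohta expected age (in generations) of a neutral allele currently
    at frequency `freq` in a constant-size diploid population of size `n_e`."""
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must be strictly between 0 and 1")
    return -4.0 * n_e * (freq / (1.0 - freq)) * math.log(freq)

# Assumed values for illustration only (not taken from the paper).
N_E = 10_000
GENERATION_YEARS = 25

for freq in (0.0005, 0.05, 0.20):
    gens = expected_allele_age_generations(freq, N_E)
    years = gens * GENERATION_YEARS
    print(f"freq={freq:.4f}: ~{gens:,.0f} generations, ~{years:,.0f} years")
```

Under these assumptions the expected age rises steeply with frequency, so very rare variants can be young while alleles at 5% or more are expected to be far older than the historical period in question.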


Looking forward to what you have to contribute, Steve.

Hi Dennis,

I am very happy to wait for your comments on the PSMC method and why you believe that it would detect a sudden sharp bottleneck of two. Please don’t feel under any pressure; I appreciate your attempts to make all this accessible to non-specialist audiences. That is not always an easy task.

Regarding the passage from chapter 3 of Adam and the Genome for which I am asking you to provide supporting citations, you responded in your comment above:

[quote]
What I'm talking about there is a summary of the field as a whole - and the PSMC analyses in the 1,000 genomes paper is certainly one of the relevant experiments. So are LD studies. So are the Li and Durbin PSMC results.
[/quote]

I am sorry, I am struggling to follow you here. I'm afraid I can't see how that passage is a summary of the field as a whole, and therefore I don't understand how citations of the PSMC and LD studies support it.

Here is the passage that we are discussing in its context in Adam and the Genome: I have placed it in italics, and also added some emphases in bold.

...given the importance of this question for many Christians— and the strong insistence of many apologists that the science is completely wrong— it is worth at least sketching out a few of the methods geneticists use that support the conclusion that we descend from a population that has never dipped below about 10,000 individuals. While the story of the beleaguered Tasmanian devil provides a nice way to “see” the sort of thing we would expect if in fact the human race began with just two individuals, scientists have many other methods at their disposal to measure just how large our population has been over time. One simple way is to select a few genes and measure how many alleles of that gene are present in present-day humans. Now that the Human Genome Project has been completed and we have sequenced the DNA of thousands of humans, this sort of study can be done simply using a computer. Taking into account the human mutation rate, and the mathematical probability of new mutations spreading in a population or being lost, these methods indicate an ancestral population size for humans right around that 10,000 figure. In fact, to generate the number of alleles we see in the present day from a starting point of just two individuals, one would have to postulate mutation rates far in excess of what we observe for any animal. Ah, you might say, these studies require an estimate of mutation frequencies from the distant past. What if the mutation frequency once was much higher than it is now? Couldn’t that explain the data we see now and still preserve an original founding couple? Aside from the problems this sort of mutation rate would present to any species, we have other ways of measuring ancestral population sizes that do not depend on mutation frequency. These methods thus provide an independent way to check our results using allele diversity alone. 
Let’s tackle one of these methods next: estimating ancestral population sizes using something known as “linkage disequilibrium.”

Then, after describing the LD study, you write:

[quote]
The results indicate that we come from an ancestral population of about 10,000 individuals— the same result we obtained when using allele diversity alone.
[/quote]

A little later you write:

[quote]
A more recent and sophisticated model that uses a similar approach but also incorporates mutation frequency has recently been published. This paper was significant because the model allows for determining ancestral population sizes over time using the genome of only one individual. [You then describe the PSMC method.]
[/quote]

I am therefore struggling to understand how the passage we are discussing - the one in italics above - could be a “summary of the field as a whole” including linkage disequilibrium and PSMC methods. It seems to just be about the allele frequency method. You clearly distinguish the allele frequency method from the other methods. You say that the linkage disequilibrium method is “an independent way to check our results using allele diversity alone.” You say it gives “the same result we obtained when using allele diversity alone”. You describe the PSMC methods as “A more recent and sophisticated model”.

I am sorry that I am spending so long on this point - this really is not where I had expected our discussion to go. I thought I was making a very straightforward request when I asked for a citation for the calculations in this passage. I am still hoping that you may be able to, now I have reminded you of the context of the passage. I appreciate that it may be a while since you re-read the chapter for yourself, and your recollection of what you wrote could be different from the text of the book. I know that I am sometimes surprised when I re-read something that I wrote myself after several months away from it.

Hi Richard,

Allele-based methods: 1000 genomes (including their PSMC), and understanding allele frequency distribution and mutation frequency/fixation
LD: independent of mutation frequency
"recent and sophisticated" = PSMC on single individuals (a specific case of allele methods that is somewhat distinct from the prior PSMC work)

So you’re right - it’s a summary of allele methods, including PSMC, interspersed with the discussion on LD, and then back to a special case of an allele method with the use of PSMC on single genomes. That summary doesn’t include LD. I haven’t read over that section in some time. Hopefully that clears it up.

Another thing to keep in mind is that the vast majority of scientists are not at all interested in (or likely aware of) what evangelical Christians want to “see” from their data. It wouldn’t even cross the mind of a group to publish a paper that specifically tackles the question of all humans descending uniquely from just two people. This wouldn’t even be on their radar because none of the evidence we have accumulated in the last 30+ years even remotely suggests it.

So, you’re not going to see that specifically addressed in the literature. What it takes is people who are tuned to those questions who can interpret the literature in light of those issues.


Hi Dennis

[quote]
So you're right - it's a summary of allele methods, including PSMC, interspersed with the discussion on LD, and then back to a special case of an allele method with the use of PSMC on single genomes.
[/quote]

I'm sorry, but that wasn't my reading of the passage. As I say, it seems to me that the passage in italics is about a method based on allele counts (explicitly not including PSMC). It seems to be describing the kind of study you mention in your "Part I" blog:

[quote]
So, a bottleneck to two individuals would leave an enduring mark on our genomes – and one part of that mark would be a severe reduction in the number of alleles we have - down to a maximum of four alleles at any given gene. Humans, however, have a large number of alleles for many genes – famously, there are hundreds of alleles for some genes involved in immune system function. These alleles take time to generate, because the mutation rate in humans is very low. This high allele diversity is thus the first indication that we did not pass through a severe population bottleneck, but rather a relatively mild one (estimated, as we have discussed, at about 10,000 individuals by current methods).
[/quote]

Clearly you have a study in mind that supports the passage in italics and also this paragraph from your blog. All I am requesting is that you share the reference with me. Sorry if I am starting to sound like a broken record!

Dennis, this is exactly my point! This is what my Nature Ecology and Evolution community blog is saying.

I agree it’s not on the radar, but I think we are getting ahead of ourselves if we say that none of the evidence even remotely suggests it, given that the hypothesis has not been directly tested.

This is exactly my concern with your book chapter. I think you are seeing things in the studies that are not there, as they never set out to test the bottleneck hypothesis.

So the question is: given that the scientific literature does not specifically address the question of whether or not humans have passed through a bottleneck of two, what further analyses are needed to address this question? This will take more work than just interpretation of the existing literature.

I am really glad that we seem to be finding some common ground.


I disagree here. Even if the authors themselves do not specifically address it, the data certainly do.

This also crops up in other areas - you will not find a paper where the authors specifically address the idea that the earth is 6,000 years old, for example. Why not? Because the evidence we have doesn’t even come close to 6KYA. The data absolutely are relevant to the question.

Or to put it another way, I don’t think we need more work - I think the literature is clear. I suppose what would be most convincing to you would be to have the 1000 genomes group, or Li and Durbin, etc, run a simulation to see what their PSMC results would look like on an artificial dataset that is instantaneously reduced to 2 people. I think you’d see a result that gets down at least close to Ne=2 (or 20, or 200) even if it spread that result over a longer timescale, like we see in their papers. What you’re arguing is that ~1500 and 2 are indistinguishable by their methods. I disagree. More anon.

That passage is a summary statement about allele-based methods. Why would I exclude the 1000 genomes papers (including their PSMC results)? I was primarily thinking about the 1000 genomes work when writing that section.

Hi Dennis,

[quote=“DennisVenema, post:29, topic:37039”]
Even if the authors themselves do not specifically address it, the data certainly do.
[/quote]

I agree that the genomic data presented in the existing literature are relevant, and sufficient, for an analysis to address the short sharp bottleneck hypothesis. But if the authors have not done an appropriate analysis, someone else needs to. As far as I can see this has not been done. This is what I am saying in my blog.

In my blog I refer to a website that reports such a simulation, which found that PSMC could not detect sharp sudden bottlenecks. I also sketch out reasons why this is to be expected. I look forward to discussing this with you in more detail.

I am sorry, Dennis, but I am not persuaded that this passage in your book is a summary statement that includes the PSMC method. With all due respect to you as author, a plain reading of your chapter, as I have spelt out in detail above, is that this passage refers to an allele counting method with which you then later compare the LD and PSMC approaches. You make a point in your chapter that allele counts, LD and PSMC independently give close to the same result - a population size of 10,000 individuals.

Furthermore, in your Part 1 response blog (which we are discussing here) you make a big point that heterozygosity is little affected by bottlenecks but allele counts are. You go to great length to explain why allele counts are a good way of detecting bottlenecks. You repeat the claim that the allele counting method indicates that human population sizes have never dropped below 10,000.
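The contrast between heterozygosity and allele counts can be illustrated with textbook arithmetic. This is a minimal sketch, not an analysis from the blog or the book. It assumes the standard result that expected heterozygosity falls by a factor of 1 - 1/(2N) per generation in a diploid population of size N, and that a bottleneck of N individuals can carry at most 2N distinct alleles at an autosomal locus. The function names are hypothetical.

```python
def heterozygosity_retained(sizes):
    """Expected fraction of heterozygosity remaining after passing through
    generations with the given diploid population sizes: prod(1 - 1/(2N))."""
    retained = 1.0
    for n in sizes:
        retained *= 1.0 - 1.0 / (2.0 * n)
    return retained

def max_alleles_surviving(bottleneck_size):
    """Upper bound on distinct alleles at one autosomal locus carried by
    `bottleneck_size` diploid individuals (two gene copies each)."""
    return 2 * bottleneck_size

# A single generation with only two individuals.
print(heterozygosity_retained([2]))   # 0.75
print(max_alleles_surviving(2))       # 4

# Ten generations at a size of 10,000, for comparison.
print(heterozygosity_retained([10_000] * 10))
```

On these assumptions, one generation at a size of two still leaves three quarters of the expected heterozygosity, while a locus with hundreds of alleles would be cut to four at most. The 75% figure applies only if the population expands again immediately; several generations at small size would compound the loss.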

But now you seem to be saying to me that allele counting methods are not actually specifically included in your chapter: that the passage about the allele counting method is actually a summary about all methods that use alleles in some way, including PSMC (which does not count alleles, and does not “select a few genes”). Despite my repeated requests, you have not given me any reference or citation, or a description of an analysis that you or someone else has done, where human effective population sizes have been estimated by an allele counting method.

Instead, you are pointing me to the 1000 genomes paper. This is a wonderful paper that I have often referred my students to, and I do not doubt for a moment that the 1000 genomes project provides the raw data necessary for an analysis based on allele counts, but as far as I can see, the authors have not done such an analysis.

If you are not able to give me a citation that includes use of an allele counting method, why did you spend such a large proportion of your Part I blog explaining why the allele counting method is such a good way of detecting bottlenecks? Why do you mention allele counting methods in your book?

I have to admit, I am bemused by this. I think that the allele counting method is one of the best methods available for detecting bottlenecks, and I think it is the biggest challenge to the bottleneck of two hypothesis. I think there is a really interesting discussion to be had here. It has come as a genuine surprise to me that you are not pointing me to a calculation, or a paper, or a textbook, or something else that clearly explains the derivation of a 10,000 effective population size figure.

We seem to have reached an impasse on this point. I will have to let others read through your book chapter and your blog above, and reach their own conclusions.

Richard,

Perhaps you could wait for Dennis to post the next parts of his blog response, as he’s committed to do, before declaring an impasse.


@RichardBuggs,

Surely @DennisVenema is not the only person who can do genome mathematics.

What test or study results can you offer that would indicate an answer closer to 2 than to 10,000? Certainly on a scale of difference that large, it should be relatively easy to offer some general results from your side of the divide.

Hi Tim, I’m not saying we are at an impasse on this whole issue - just on the point about what Dennis was saying in that particular, but very important, passage of his book.

I would invite you to step in and help us. As far as I can see you are not a biologist, so you can help adjudicate between us about what is the plain meaning of the passage to readers. Perhaps @TedDavis could also pitch in too, as he finds Dennis’s writing to have great clarity, as he has mentioned above. As a historian, he must be used to looking closely at the meaning of texts. I would also welcome the view of @glipsnort as a geneticist, and perhaps @Christy could step in as moderator. I would also welcome other readers to pitch in and give their opinion.

The questions I ask you are, when you read the extract from Adam and the Genome in bold below, which I show in its context:

  • Does the passage make you think that it is referring to a scientific study where a few genes have been selected and the number of alleles of those genes in current day human populations have been measured?

  • Does the passage make you think that someone has done calculations on these genes on a computer that have indicated that the ancestral population size for humans is around 10,000?

  • Does the passage make you think that this is a different method to the PSMC method?

Here is the passage that we are discussing in its context in Adam and the Genome:

...given the importance of this question for many Christians— and the strong insistence of many apologists that the science is completely wrong— it is worth at least sketching out a few of the methods geneticists use that support the conclusion that we descend from a population that has never dipped below about 10,000 individuals. While the story of the beleaguered Tasmanian devil provides a nice way to “see” the sort of thing we would expect if in fact the human race began with just two individuals, scientists have many other methods at their disposal to measure just how large our population has been over time. One simple way is to select a few genes and measure how many alleles of that gene are present in present-day humans. Now that the Human Genome Project has been completed and we have sequenced the DNA of thousands of humans, this sort of study can be done simply using a computer. Taking into account the human mutation rate, and the mathematical probability of new mutations spreading in a population or being lost, these methods indicate an ancestral population size for humans right around that 10,000 figure. In fact, to generate the number of alleles we see in the present day from a starting point of just two individuals, one would have to postulate mutation rates far in excess of what we observe for any animal. Ah, you might say, these studies require an estimate of mutation frequencies from the distant past. What if the mutation frequency once was much higher than it is now? Couldn’t that explain the data we see now and still preserve an original founding couple? Aside from the problems this sort of mutation rate would present to any species, we have other ways of measuring ancestral population sizes that do not depend on mutation frequency. These methods thus provide an independent way to check our results using allele diversity alone. 
Let’s tackle one of these methods next: estimating ancestral population sizes using something known as “linkage disequilibrium.” [Then, the text describes the LD study and continues]...The results indicate that we come from an ancestral population of about 10,000 individuals— the same result we obtained when using allele diversity alone... [Then a little later the chapter continues] A more recent and sophisticated model that uses a similar approach but also incorporates mutation frequency has recently been published. This paper was significant because the model allows for determining ancestral population sizes over time using the genome of only one individual. [It then describes the PSMC method, saying of it]... Instead of looking at a given pair of loci in many individuals, this method looks at many pairs of loci within one individual....this is in good agreement with previous, less powerful methods,
I look forward to your and other readers' answers to my questions.