Okay, got a chance to read it. I’m not sure your critique is correct. Can you help me understand? You point out two mistakes in this claim by @DennisVenema:
This means that about 25% of the time, heterozygosity is lost, and that only one allele remains in the population for a given gene. If only one allele is present, then this is a coalescence point for that gene: going forward, we will have to wait for mutations to produce new alleles, and those new alleles will coalesce back to their single ancestral allele that survived the bottleneck. In the future, as new alleles are produced from the surviving allele through mutation, the new alleles will all coalesce within a few generations of the bottleneck. Their TMRCA values will thus be almost identical… Coalescent-based methods are thus an excellent way to detect bottlenecks—even really brief ones, if they are severe enough. Even a brief, severe bottleneck will still greatly increase the chances of alleles being lost, and the telltale signature of numerous genes that coalesce within a short time frame.
To clarify here, Ne (or effective population size) is just another unit conversion. It is the reciprocal of the coalescence rate as a function of time. Coalescences are the points in the tree where a merger happens, and they are normalized appropriately by the Kingman term. We just look at when they occur, binning by time. This is the coalescence rate. One divided by that is Ne. That is why the TMRCA (of a subset of the alleles) is relevant, as that is how the date is determined.
You say he is wrong in this claim. I agree, but for a different reason than you. You write (I will deal with each point separately)…
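To make the unit conversion concrete, here is a minimal sketch (my own illustration, not the code behind any published analysis). It assumes a single genealogy given as a list of coalescence times in generations; the function names and the `copies` parameter are hypothetical. Each stretch of time is weighted by the Kingman term, the number of lineage pairs k(k-1)/2, to get a per-pair coalescence rate in a time bin. Ne is then the reciprocal of that rate, divided by 2 under the common diploid convention.

```python
from math import comb


def pairwise_coalescence_rate(coal_times, n_samples, bin_start, bin_end):
    """Per-pair coalescence rate (per generation) inside [bin_start, bin_end).

    coal_times: coalescence times (generations ago) of one genealogy.
    n_samples:  number of sampled lineages at time 0.
    """
    events = 0        # coalescences that fall inside the bin
    exposure = 0.0    # lineage-pair generations spent inside the bin
    prev = 0.0
    k = n_samples     # lineages present between prev and the next event
    for t in sorted(coal_times) + [float("inf")]:
        overlap = min(t, bin_end) - max(prev, bin_start)
        if overlap > 0:
            exposure += comb(k, 2) * overlap   # Kingman normalization
        if bin_start <= t < bin_end:
            events += 1
        prev = t
        k -= 1        # each coalescence merges two lineages into one
    return events / exposure if exposure > 0 else float("nan")


def effective_size(rate, copies=2):
    """Ne as the reciprocal of the per-pair rate.

    copies=2 is the diploid convention (rate = 1 / (2 * Ne)), an assumption
    of this sketch; copies=1 gives the plain reciprocal.
    """
    return 1.0 / (copies * rate)


# Toy genealogy: 3 sampled lineages, mergers 10 and 30 generations ago.
rate = pairwise_coalescence_rate([10, 30], n_samples=3, bin_start=0, bin_end=40)
print(rate, effective_size(rate, copies=1))
```

In a real analysis the events from many loci would be pooled per time bin before taking the reciprocal; a single toy tree is only meant to show the bookkeeping.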
I think that Dr Venema is wrong in making this claim. Let me explain why. I think that he is making at least two mistakes here.
(1) In calculations that show that 75% of heterozygosity would be maintained after a bottleneck, the level of heterozygosity before the bottleneck is “known”. But coalescent models run backwards in time, and we can only “see” those lineages that survive the bottleneck. Thus we cannot directly know how many alleles were lost via sampling at the bottleneck. The loss of alleles via the sampling effect of the bottleneck will not show up as coalescence events in a coalescence model. These are two separate effects of a bottleneck.
First off, there is a large conceptual difference between coalescence and heterozygosity. A very high amount of coalescence can take place, even as 75% heterozygosity is maintained. These are just different things.
Moreover, the loss of alleles can show up as coalescence events. However, and this is the critical point, our ability to detect them is very tightly dependent on the number of lineages entering (in backward time) the bottleneck. If there is only one surviving lineage, there will be zero coalescents, and our ability to detect the bottleneck is zero. If there are 50 lineages, there will be a very high amount of coalescence (at least 46 lineages will coalesce), and we will almost certainly detect it. If there are 4 lineages, it seems that we would not detect it. To close the loop, the fact that there might be 75% heterozygosity after a bottleneck tells us nothing about how many lineages are coalescing at this point in time. These are different things, and are entirely separable.
Moreover, and this is a critical point, coalescence analysis CAN detect bottlenecks, but only if there are sufficient surviving lineages to cause a spike in coalescence at the bottleneck. So @DennisVenema appears to be in error, in that he did not understand or explain how coalescence depends on the number of lineages when he wrote this. He is correct in his claim that coalescence can detect bottlenecks, if we limit ourselves to very recent timepoints. In the distant past, however, not so much.
(2) Dennis is assuming that if only one allele is present in a population, then that allele has coalesced. This is a misunderstanding of coalescent theory. In coalescent theory, two gene lineages only coalesce when they reach a single copy in a single genome within a population. This means that if only one allele is present at a particular locus in a bottleneck of two, we know for sure that this allele has NOT coalesced, as it is present in four genomes (two in each person). It must therefore coalesce before the bottleneck. If the ancestral population is large, that coalescence will be a long time before the bottleneck.
I do not think this is his assumption. Coalescence does NOT make any statements about the number of alleles at a given time in history. Rather, it makes a statement about the number of lineages that survive to this day by direct descent. That is all that is modeled in coalescence theory. It does not presume that all alleles collapse to a single allele at coalescence, just that two alleles (potentially of many) collapse to one.
So once again, a lot of coalescence can take place, even when there is heterozygosity. Let’s look at your figures to make that clear.
So, here, we see that the conditions put forward are not met. There is more than one lineage going through the bottleneck. The heterozygosity is high.
What about coalescence? Well, in that figure, there are THREE coalescence events between g0 and g1, all in the blue lineages. I cannot tell for sure (it gives me a headache to look too closely), but that seems to be the maximum number of coalescences in any generation. If we had more lineages going in there, we would have seen more coalescence. In fact, we are guaranteed that there are at least L - 4 coalescents through a single-couple bottleneck if there are L lineages going into it (in reverse time).
In this case, there were SIX lineages entering, and THREE coalesced. Notice how this has nothing to do with heterozygosity. It is, rather, about a reduction in the number of alleles. It is also tightly influenced by the NUMBER of surviving alleles that are still in the population right after (in forward time) the bottleneck.
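The L - 4 guarantee is just the pigeonhole principle, and a quick simulation shows it. This is a rough sketch under a simplifying assumption of mine: each lineage entering the couple (in reverse time) lands on one of the 4 parental gene copies uniformly at random, ignoring which child it sits in. The function name is hypothetical.

```python
import random


def coalescences_through_couple(n_lineages, rng, n_copies=4):
    """Count mergers when n_lineages pass (in reverse time) through a couple.

    Assumption: each lineage independently picks one of the n_copies
    parental gene copies uniformly at random.
    """
    parents = [rng.randrange(n_copies) for _ in range(n_lineages)]
    survivors = len(set(parents))     # distinct lineages that emerge
    return n_lineages - survivors     # every lost lineage is one merger


rng = random.Random(0)
for n_lineages in (1, 4, 6, 50):
    merged = coalescences_through_couple(n_lineages, rng)
    # At most 4 lineages can emerge, so at least n_lineages - 4 must merge.
    assert merged >= max(0, n_lineages - 4)
    print(n_lineages, "lineages in,", merged, "coalescences")
```

With one lineage entering there is never a merger to see, and with 50 there are always at least 46, whatever the heterozygosity of the couple happens to be.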
It seems, however, @RichardBuggs, that you are confusing the several coalescences that appear with “THE coalescent,” which you identify in this figure:
There is not one coalescent at g0, but FOUR coalescents. This is a critical point. Really all merge points in a tree are coalescents, not just the top one.
We can calculate the chances, because normally we would expect there to be a 0.5 chance of a parent passing a particular gene copy to their children. So the chances of what we see in generation g1 above are: 0.5 x 0.5 x 0.5 x 0.5 = 0.5^4 = 0.0625. The chances of what we see in generation g2, given what we see in generation g1, are 0.5^16 = 0.0000153. The overall probability of this is 0.5^20 = 0.000000954.
As we have four starting lineages at the bottleneck, we need to multiply by four, to find the overall chance of having coalescence to one lineage at the bottleneck. This gives us 0.00000381. So all in all, we expect coalescence to a single lineage 0.000381% of the time. Not 25% of the time.
This turns out not to be quite correct. Coalescence analysis does not deal with this. Privately, I had shared some similar computations, but I came to understand they were in error (notice they are not public). These computations miss exactly what coalescence theory is doing. I missed this too the first time around. It is quite subtle.
Some subtle math errors notwithstanding, what is being computed here might be a reasonable estimate of the allele distribution we would expect after a bottleneck if we were to measure it. However, most of that diversity is going to die out (or be missed) before we get to our specific samples. So it is not really valid. What coalescence tries to do is, instead, reconstruct the history of all direct ancestral sequences of the data in our current day sample. There may be other alleles with the exact same DNA sequences alongside these direct ancestors, but coalescence only models the direct ancestors.
Keep in mind, the number of allelic lineages at different points in time does not tell us the number of alleles in the population at that point in time. For example, let's say we are at 3 mya, where the vast majority of the genome has coalesced to a single allele that survives till today. The population at that time, however, is not all homozygous for that allele. That allele might even be at low frequency. Rather, we are just saying none of those other alleles survive for 3 million years to the present day.
For that reason, demonstrating heterozygosity is not lost does not really make the case here. Moreover, remember, no more than 4 alleles can pass through the bottleneck. Heterozygosity, however, does not tell us how many alleles there are.
If my calculations are correct (and I stand ready to be corrected if they are not) then Dennis is quite wrong to think that 25% of genes would coalesce to one lineage at a bottleneck of two. Less than 1% would.
Neither your calculations nor @DennisVenema's are correct. It turns out that the amount of coalescence is entirely dependent on the number of extant lineages that enter (in backward time) the bottleneck. The more lineages, the more coalescence; the fewer lineages, the less coalescence. Also, his application of the Kingman coalescent to compute 25% is just incorrect.
One final point about a major conceptual error. Read this statement by @DennisVenema:
Coalescent-based methods are thus an excellent way to detect bottlenecks—even really brief ones, if they are severe enough. Even a brief, severe bottleneck will still greatly increase the chances of alleles being lost, and the telltale signature of numerous genes that coalesce within a short time frame.
This is false, for all the reasons we discussed, and for one additional reason. “Severity” of a bottleneck includes two things: (1) the size of the bottleneck population AND (2) the number of generations in the bottleneck. Severe bottlenecks last a LARGE number of generations, with a SMALL population size. However, a SINGLE generation of a very SMALL population size (e.g. a single couple) is not necessarily a severe bottleneck. That is, remember, because severity is defined along two dimensions. In one dimension it is severe, but in the other it is extremely mild.
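One way to see the two dimensions side by side is the textbook Wright-Fisher approximation for a single pair of lineages (my choice of illustration, not the analysis behind the results discussed here). With N diploid individuals held for t generations, the chance that a given pair coalesces inside the bottleneck is 1 - (1 - 1/(2N))^t. The function name is hypothetical.

```python
def pair_coalescence_prob(n_individuals, n_generations):
    """Chance that two given lineages merge somewhere inside a bottleneck.

    Assumes an idealized Wright-Fisher population of n_individuals diploids
    (2 * n_individuals gene copies) held constant for n_generations.
    """
    per_generation = 1.0 / (2 * n_individuals)
    return 1.0 - (1.0 - per_generation) ** n_generations


# Small population, but only one generation: most pairs pass straight through.
print(pair_coalescence_prob(2, 1))       # 0.25
# Same population size, held for 10 generations: most pairs merge.
print(pair_coalescence_prob(2, 10))
# One generation at a large population size: almost nothing happens.
print(pair_coalescence_prob(10_000, 1))
```

Under these assumptions a given pair of lineages escapes a one-generation couple three times out of four, which is the sense in which such a bottleneck is mild in duration even though it is extreme in size.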
Of note, I’ve had a chance to interact with some secular population geneticists about this. There is actually quite a bit in the literature that makes this point. It is common for papers here to note the limitations of this approach, that it cannot pick up brief bottlenecks.
That is, in fact, what makes this question so scientifically interesting. Essentially, we are asking whether a bottleneck that is extremely severe along one dimension, but extremely mild along another, is detectable. No one has tested that before (though I just did; data not shown), and we are finding out that the answer is “no, we cannot detect it much before 500 kya.” It’s no surprise, because this falls out nicely from the math, justifying the use of TMR4A here.
In summary, @RichardBuggs, I agree that @DennisVenema was in error; however, I’m not sure your argument is correct either. Can you clarify if I missed something here? I hope I did not misrepresent you. My critique here is based on my best understanding of what you wrote. However, please correct me if I missed something.