Here, we are discussing a bottleneck of two individuals for a single generation, followed by rapid expansion. In this case, we are not considering miracles, and we fully expect this couple to be heterozygous, not clones of one another. We are most curious about a “bottleneck” earlier than 300 kya and as far back as 2 mya.
Good point too, @Christy. I do not think we are going to find positive evidence for a bottleneck. However, there might be enough ambiguity in the evidence that we cannot rule one out in the deep enough past. This is really a question about what the evidence does and does not tell us, and how strongly it speaks.
I would also add that there is a categorical difference between this question and the usual arguments from Intelligent Design. Here, there is no indirect invocation of divine action (we’ve ruled out special creation), and @aguager (and others) are engaging the evidence in a manner largely consistent with what we see from mainstream scientists. At times the rhetoric goes places that I think ultimately undercut their case (e.g. when that article connects this effort to ID), but the actual inquiry is recognizably scientific. We are using the rules of mainstream science to ask a valid question of the data.
It is for this reason that I feel this question needs to be taken seriously.
You have been clear on this.
This is very unfortunate, @RichardBuggs. I’m very sorry to hear this. I request that you clarify, either publicly or in a private message to me, how you have been misrepresented. I do not want to accidentally ascribe a view to you that is not yours.
I should also emphasize that when I address the ID movement, that does not necessarily include you. Though they have taken some delight in your public effort, I’m not sure I’ve seen any public evidence that you are associated with them. My references to ID are not meant to connect you to them, unless you wish to be connected to them.
In truth, I’ve learned a lot too. This has been an interesting and informative direction.
ArgWeaver Does Not Assume a Large Population
It appears you are drawing upon an observation by Andrew Jones at the DI, who writes:
However, a little digging into how ARGweaver works reveals that it too assumes a constant population, and uses this assumption to assign probabilities to ancestry trees. Therefore, again, it is not clear if it is really appropriate for asking questions about Adam and Eve. The particular reason why it is a problem is a bit technical: coalescence (branching but backwards in time) happens much more slowly in a large population. In a large population, the last few coalescents could take thousands of generations. But what if you have a small number of generations, drawing to a smaller and smaller population and terminating in a single couple? All the lineages will coalesce (down to at most four as explained above) but at a faster rate.
This turns out, in my opinion, not to be the correct assessment. I’m going to do a more detailed post on this in the future, but can explain a little bit more now.
ArgWeaver uses a prior on trees that is parameterized by population size (N = 10,000). The language of “assumes a large population size” is just not correct. It is more accurate to say that it starts with a weak prior belief in a population size of 10,000. It is a weak prior belief because it is designed to be quickly overcome by data. Let me give you three reasons why it does not impact the results I’ve put out on TMR4A. These will be expanded in some later posts that I’ll link here when done:
As a prior, this is not an assumption, but a starting belief that is meant to be overridden by the data. The only way that the ArgWeaver program uses the population size is in computing this prior. Population size is neither simulated nor modeled in the program except for placing this weak prior on population size. Remember, priors are not assumptions or constraints.
The ArgWeaver output files report the relative strength of the prior versus the data, and the prior accounts for only about 5%. That means the model output is dominated 95% by the data, not by the prior (as it is designed to be).
The prior distribution for TMR4A is at about 200 kya (which I will show later), but we measured the TMR4A at about 420 kya. That means the data is pulling the estimate upwards from the prior, not downwards.
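To make the idea of a population-size prior concrete, here is a minimal sketch of the expected TMR4A under a simplified constant-size prior. It assumes a diploid Kingman coalescent with a fixed N (the N = 10,000 mentioned above), a single non-recombining locus, and times in generations. ArgWeaver's actual prior works along the genome and is more involved, and both function names are hypothetical. Going from k lineages to k - 1 takes 4N / (k(k - 1)) generations on average, so the expected time from n sampled lineages down to 4 is a sum that telescopes to 4N(1/4 - 1/n).

```python
def expected_tmr4a_generations(n_samples: int, pop_size: float = 10_000) -> float:
    """Expected generations for n lineages to coalesce down to 4 ancestral
    lineages under a constant-size diploid Kingman coalescent (a simplified
    stand-in for a population-size prior, not ArgWeaver's actual code)."""
    if n_samples <= 4:
        return 0.0  # already at (or below) four lineages
    total = 0.0
    for k in range(5, n_samples + 1):
        # With k lineages, k(k-1)/2 pairs each coalesce at rate 1/(2N),
        # so the mean wait for the next coalescence is 4N / (k(k-1)).
        total += 4.0 * pop_size / (k * (k - 1))
    return total


def closed_form(n_samples: int, pop_size: float = 10_000) -> float:
    """The same sum, telescoped: 4N * (1/4 - 1/n)."""
    return 4.0 * pop_size * (0.25 - 1.0 / n_samples)


print(expected_tmr4a_generations(100))
print(closed_form(100))
print(expected_tmr4a_generations(100, pop_size=1_000))
```

For n = 100 both functions give about 9,600 generations; with N = 1,000 the figure drops tenfold, which is the faster coalescence in small populations that Dr. Jones describes. Converting generations to years requires an assumed generation time, which is not part of this sketch.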
This last point should end any confusion. To draw an analogy, it’s like measuring the weight of widgets with a weak starting belief that their average weight is 200 lb. After weighing several of them, and taking the prior into account, we compute an average weight of 420 lb. The fact that we used a prior could be an argument that the real average is greater than 420 lb, but it is not a plausible argument that the true average is less than 420 lb. The prior, in our case, is biasing the results downwards, not upwards.
With that in mind, Dr. Jones was simply mistaken when he wrote:
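The widget analogy can be written as a precision-weighted average of prior and data, as in a normal-normal conjugate update. This sketch assumes the roughly 5% prior strength reported above can be read as the prior's weight in such an average; that is a simplification of what ArgWeaver actually reports, and the function names are hypothetical. Inverting the average shows what the data alone must have said.

```python
def posterior_mean(prior_mean: float, data_mean: float, prior_weight: float) -> float:
    """Weighted average of prior and data; prior_weight is the prior's share (0 to 1)."""
    return prior_weight * prior_mean + (1.0 - prior_weight) * data_mean


def implied_data_mean(prior_mean: float, posterior: float, prior_weight: float) -> float:
    """Invert posterior_mean: the estimate the data would give with no prior."""
    if not 0.0 <= prior_weight < 1.0:
        raise ValueError("prior_weight must be in [0, 1)")
    return (posterior - prior_weight * prior_mean) / (1.0 - prior_weight)


# Widget numbers from the analogy: prior of 200 lb, combined estimate of 420 lb.
data_alone = implied_data_mean(prior_mean=200.0, posterior=420.0, prior_weight=0.05)
print(round(data_alone, 1))  # 431.6
```

Because the prior sits below the data, removing it can only move the estimate up (here to about 431.6 lb), which is the point of the analogy.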
The tool used, ARGweaver, is fantastic in that it combines an enormous amount of real genetic information to model the past genetic history of humans. For this reason it gives the impression of being truly objective, and so when I first read it, I thought he had proved that there could be no bottleneck earlier than 300,000 years…However, a little digging into how ARGweaver works reveals that it too assumes a constant population, and uses this assumption to assign probabilities to ancestry trees.
Given what I have just explained, I would submit that this is not a reason to doubt the results I put forward. I do believe this data shows there could be no bottleneck more recent than 300,000 years ago without either miracles or our ancestors having vastly different mutation rates than we do. Both of those possibilities, however, are off the table right now.
There are three ways someone could prove me wrong here:
- Do an experiment with simulated data, showing that the prior in the ArgWeaver code is strong enough to prevent it from detecting a bottleneck more recent than 300 kya. (not likely)
- Modify ArgWeaver to no longer use the prior (which is fairly easy), run it on the same dataset, and demonstrate that the estimated TMR4A goes down, not up. (not likely)
- Find another way that ArgWeaver uses population size that I missed, and show that it has a stronger effect than I imagine. (not likely)
Of all these, the third is the most likely way to show me wrong. Until that happens, though, I think that 420 kya +/- 100 kya is a reasonable bound on when a couple bottleneck could have occurred. Do you agree, @RichardBuggs? I’m being fairly generous in how I set the confidence interval there, too.
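The reason a measured TMR4A bounds a couple bottleneck is that a single couple carries only four copies of each chromosome, so every lineage older than the couple must pass through at most four ancestors. Here is a minimal single-locus sketch of that logic, assuming a constant-size diploid Kingman coalescent, no recombination, and an instantaneous one-generation couple bottleneck. It is not ArgWeaver, PSMC, or the simulation code referred to in the next steps, and the function name is hypothetical.

```python
import random


def simulate_tmr4a(n_samples, pop_size, bottleneck_gen, rng):
    """Generations until n sampled lineages first number <= 4.

    Lineages coalesce under a constant-size diploid Kingman coalescent.
    If bottleneck_gen is reached first, every surviving lineage must trace
    to one of the couple's four genome copies, so at most four remain.
    """
    k, t = n_samples, 0.0
    while k > 4:
        # k lineages: next coalescence at rate k(k-1)/(4N) per generation.
        wait = rng.expovariate(k * (k - 1) / (4.0 * pop_size))
        if t + wait >= bottleneck_gen:
            return float(bottleneck_gen)  # collapse onto <= 4 copies here
        t += wait
        k -= 1
    return t


rng = random.Random(42)
with_couple = [simulate_tmr4a(50, 10_000, 5_000, rng) for _ in range(1_000)]
no_couple = [simulate_tmr4a(50, 10_000, float("inf"), rng) for _ in range(1_000)]
print(max(with_couple))                 # never exceeds the bottleneck time
print(sum(no_couple) / len(no_couple))  # near 4N(1/4 - 1/50) = 9,200
```

Read backwards, a couple bottleneck at any time more recent than the four-lineage point would have forced TMR4A to be younger, which is why a measured TMR4A works as a lower bound on the age of any such bottleneck.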
My Next Steps
My next steps, when I get around to it, are:
- To test the ability of PSMC, MSMC and/or ArgWeaver to detect bottlenecks on simulated data. I have the simulation code working, and it’s really a matter of running it. My instinct tells me this will increase the bound to about 500 kya, but I won’t know until I run it.
- To recompute TMR4A while weighting coalescents by segment length. Failing to do this before is, I think, the biggest source of error in the previous analysis. I think it might shift things around a small amount…
- To use the ArgWeaver data to estimate population size. If this works correctly, it should increase our confidence that ArgWeaver is a good proxy for understanding the successes and failures of PSMC and MSMC. Incidentally, MSMC uses a model very similar to ArgWeaver's (but a different representation).
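The second and third steps can each be sketched in a few lines. Both functions are hypothetical and assume the inputs have already been extracted from ArgWeaver output into plain Python lists; the numbers in the usage lines are made up for illustration, not results. The first weights each segment's TMR4A by the number of bases it covers. The second is a crude piecewise-constant estimate of N from pairwise coalescence times: in a diploid population of size N a pair of lineages coalesces at rate 1/(2N) per generation, so within a time window N can be estimated as the total pair-time at risk divided by twice the number of coalescences. Pairwise times along a genome are correlated, so this is only a rough check, not PSMC or MSMC.

```python
def length_weighted_tmr4a(segments):
    """segments: (start, end, tmr4a) tuples; weight each TMR4A by end - start."""
    total_len, weighted = 0, 0.0
    for start, end, tmr4a in segments:
        if end <= start:
            raise ValueError("segment end must exceed start")
        total_len += end - start
        weighted += (end - start) * tmr4a
    if total_len == 0:
        raise ValueError("no segments given")
    return weighted / total_len


def piecewise_pop_size(pair_tmrcas, windows):
    """Estimate N in each (lo, hi] time window from pairwise TMRCAs.

    Each pair is at risk of coalescing from lo until min(t, hi); pairs
    with t in (lo, hi] count as coalescence events. Rate = 1/(2N), so
    N_hat = exposure / (2 * events). Returns None where no events occur.
    """
    estimates = []
    for lo, hi in windows:
        exposure, events = 0.0, 0
        for t in pair_tmrcas:
            if t <= lo:
                continue  # this pair coalesced before the window
            exposure += min(t, hi) - lo
            if t <= hi:
                events += 1
        estimates.append(exposure / (2.0 * events) if events else None)
    return estimates


# Illustrative inputs only, not ArgWeaver results.
segs = [(0, 1_000, 300_000.0), (1_000, 1_100, 900_000.0)]  # TMR4A in years
print(length_weighted_tmr4a(segs))  # the short, deep segment counts for little
pairs = [1_000, 3_000, 5_000, 7_000]  # pairwise TMRCAs in generations
print(piecewise_pop_size(pairs, [(0, 4_000), (4_000, 8_000)]))  # [3000.0, 1000.0]
```

An unweighted mean of the two segments would be 600,000 years; weighting by length gives about 354,545, which is the kind of shift the second step is meant to correct.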
@DennisVenema and @glipsnort, correct me if I’m wrong, but it seems that the LD data is really not worth getting into in detail, as PSMC, MSMC and ArgWeaver are (essentially) modeling the LD data with much higher accuracy than other approaches. The key thing is to understand how these methods model the DNA, which by extension is the best way to understand all the LD data. Do you agree?
Do You Agree?
For the reasons outlined above, I’m not sure this is a valid critique. I do agree, though, that this has been highly informative for all of us, including me. I had no idea what the data would show until I did this analysis.
In ArgWeaver, the size of a tree is determined primarily by (1) the mutation rate and (2) allelic diversity, and (3) only to a small degree by the prior. There is no sensible way to “include the effect of population size decreasing.” I would endorse running the model again without a prior, but as I’ve shown, there is no good reason to think that will reduce the TMR4A time. I think that should settle this concern. Right?
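To see why the mutation rate and allelic diversity set the time scale of the trees, consider the simplest case. Under the infinite-sites assumption, two sequences whose common ancestor lived T generations ago differ at an expected fraction d = 2μT of sites, where μ is the per-site, per-generation mutation rate, so T = d / (2μ). This sketch uses illustrative values for d, μ, and generation time, not figures from this analysis, and the function names are hypothetical. Doubling μ halves every inferred time, which is why vastly different ancestral mutation rates are one of the few ways these dates could be pulled down.

```python
def divergence_time_generations(divergence_per_site: float, mu_per_site_gen: float) -> float:
    """Pairwise divergence time under infinite sites: d = 2 * mu * T."""
    if mu_per_site_gen <= 0:
        raise ValueError("mutation rate must be positive")
    return divergence_per_site / (2.0 * mu_per_site_gen)


def generations_to_years(generations: float, years_per_generation: float) -> float:
    """Convert generations to years under an assumed generation time."""
    return generations * years_per_generation


# Illustrative values only: 1 difference per 1,000 sites, mu = 1.25e-8.
t_gen = divergence_time_generations(0.001, 1.25e-8)
print(t_gen)                                        # about 40,000 generations
print(divergence_time_generations(0.001, 2.5e-8))   # doubling mu halves T
print(generations_to_years(t_gen, 25))              # assumed 25-year generations
```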
@RichardBuggs, you pushed @DennisVenema to concede your point on Zhao 2002. That ended up being valuable, as it clarified some key strengths and weaknesses of the evidence. Respectfully, would you reciprocate? Do you acknowledge that the ArgWeaver evidence seems to rule out a single-couple bottleneck more recent than 300 kya? Can you agree to that? If not, please clarify why. Of course, if you see a solid technical problem that I missed, that is all the more reason to clarify. Let’s get to the bottom of it.