Adam, Eve and Population Genetics: A Reply to Dr. Richard Buggs (Part 1)

Hi Joshua,

I am glad we have reached such a level of agreement.

Regarding ARGweaver:

I wonder if the code itself is pointing in a slightly different direction from the paper. The footnote under their Table 1 suggests that the code does allow for separate Ne estimates at each time interval, but for all the analyses in the paper itself they assumed that Ne did not vary among time intervals. Perhaps they did this to speed up the analysis, as they had such a large dataset? I still have more of this paper to read.

I am not sure how much difference this would make anyway since, as we both agree, any method they used to estimate Ne would likely not detect a bottleneck if one had in fact occurred.

That is brilliant. Do keep me updated!

Interestingly, I believe that this has been the position of @agauger all along.


Hi all,

I have been doing a bit more reading about the theoretical background of some of the methods we have been discussing here. I am not a mathematician, so much of this is outside my area of expertise. However, I have come across three papers that seem to suggest that site frequency spectra (as presented earlier in this discussion) have severe limitations as a source of evidence about past population sizes. The second of these papers specifically examines scenarios of a bottleneck followed by exponential population growth.

Myers, Simon, Charles Fefferman, and Nick Patterson. “Can one learn history from the allelic spectrum?” Theoretical Population Biology 73.3 (2008): 342-348.
https://www.sciencedirect.com/science/article/pii/S0040580908000038
Abstract: It is well known that the neutral allelic frequency spectrum of a population is affected by the history of population size. A number of authors have used this fact to infer history given observed allele frequency data. We ask whether perfect information concerning the spectrum allows precise recovery of the history, and with an explicit example show that the answer is in the negative. This implies some limitations on how informative allelic spectra can be.

Terhorst, Jonathan, and Yun S. Song. “Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum.” Proceedings of the National Academy of Sciences 112.25 (2015): 7677-7682.
http://www.pnas.org/content/112/25/7677.short
Abstract: The sample frequency spectrum (SFS) of DNA sequences from a collection of individuals is a summary statistic that is commonly used for parametric inference in population genetics. Despite the popularity of SFS-based inference methods, little is currently known about the information theoretic limit on the estimation accuracy as a function of sample size. Here, we show that using the SFS to estimate the size history of a population has a minimax error of at least O(1/log s), where s is the number of independent segregating sites used in the analysis. This rate is exponentially worse than known convergence rates for many classical estimation problems in statistics. Another surprising aspect of our theoretical bound is that it does not depend on the dimension of the SFS, which is related to the number of sampled individuals. This means that, for a fixed number s of segregating sites considered, using more individuals does not help to reduce the minimax error bound. Our result pertains to populations that have experienced a bottleneck, and we argue that it can be expected to apply to many populations in nature.

Baharian, Soheil, and Simon Gravel. “On the decidability of population size histories from finite allele frequency spectra.” Theoretical Population Biology (2018).
https://www.sciencedirect.com/science/article/pii/S004058091730148X
Abstract: Understanding the historical events that shaped current genomic diversity has applications in historical, biological, and medical research. However, the amount of historical information that can be inferred from genetic data is finite, which leads to an identifiability problem. For example, different historical processes can lead to identical distribution of allele frequencies. This identifiability issue casts a shadow of uncertainty over the results of any study which uses the frequency spectrum to infer past demography. It has been argued that imposing mild ‘reasonableness’ constraints on demographic histories can enable unique reconstruction, at least in an idealized setting where the length of the genome is nearly infinite. Here, we discuss this problem for finite sample size and genome length. Using the diffusion approximation, we obtain bounds on likelihood differences between similar demographic histories, and use them to construct pairs of very different reasonable histories that produce almost-identical frequency distributions. The finite-genome problem therefore remains poorly determined even among reasonable histories; where fits to few-parameter models produce narrow parameter confidence intervals, large uncertainties lurk hidden by model assumption.
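To get a feel for how slow the O(1/log s) rate in the Terhorst and Song abstract is, here is a minimal Python sketch comparing 1/log s with the 1/sqrt(s) rate typical of many classical estimation problems. It only compares the shapes of the two curves: the big-O hides unknown constants, so these are not actual error values for any real dataset, and the s values are purely illustrative.

```python
import math


def log_rate(s: float) -> float:
    """Shape of the Terhorst-Song lower bound, 1/log(s), ignoring constants."""
    return 1.0 / math.log(s)


def root_rate(s: float) -> float:
    """Shape of the 1/sqrt(s) rate typical of classical parametric estimation."""
    return 1.0 / math.sqrt(s)


# Illustrative numbers of independent segregating sites (hypothetical).
for s in (1e3, 1e6, 1e9):
    print(f"s = {s:.0e}:  1/log s = {log_rate(s):.4f}   1/sqrt(s) = {root_rate(s):.1e}")
```

Going from a thousand to a billion segregating sites shrinks 1/sqrt(s) a thousandfold, but shrinks 1/log s only threefold, since log(10^9)/log(10^3) = 3. That is what "exponentially worse" means in practice.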

So I think I should add these to the criticism I made earlier of this approach to @glipsnort here:

In addition, I came across this paper, which @DennisVenema may find interesting as he writes his blog about the PSMC method:

Kim, Junhyong, Elchanan Mossel, Miklós Z. Rácz, and Nathan Ross. “Can one hear the shape of a population history?” Theoretical Population Biology 100 (2015): 26-38.
http://www.sciencedirect.com/science/article/pii/S0040580914000987?via%3Dihub

I have also been reading up more on ARGweaver and intend to post again on this soon, @Swamidass.


Hello @TedDavis, I hope you are well, my friend. Things have come along substantially since you first posted on this thread about two months ago. I summarized the scientific highlights of this conversation here.

Surprisingly, at least to me, @RichardBuggs was on to something. Our certainty that there was no single-couple bottleneck in the distant past (e.g., before 500 kya) may not be as high as we imagined. As I write here…

And the implications for theology…

Now, @TedDavis, I agree with you that a recent genealogical Adam (A Genealogical Rapprochement on Adam?) is probably more significant in the long run than an ancient single-couple bottleneck. This, nonetheless, is a surprising finding, assuming, of course, that it pans out. We are still early in the game, and might find a mistake. In many ways, this reminds me of where we were almost exactly 12 months ago on the genealogical Adam work.

Nonetheless, this really could pan out, and some Christians might join @agauger in taking this view. At the very least, many of the scientific claims have been overstated if it takes this much effort to disprove an ancient bottleneck, and we have yet to do so.

I’m curious, therefore, about your thoughts on a few levels, as a historian many of us trust in this conversation:

  1. How do you think an ancient bottleneck couple will influence the conversation?
  2. How do you think a recent genealogical Adam will influence the conversation?
  3. If TEs/ECs have overstated the evidence or been overconfident about it, how should this correction rework our voice?
  4. Do you know any good historical analogies to these two corrections, if they end up being correct?
  5. I am planning for the ASA Workshop in June in Boston on “Reworking the Science of Adam.” What do you think are the key things for the ASA community to know about these exchanges?

Thanks for your thoughtfulness here. I’m wondering how your perspective could guide us here. Many of us are doing what we can to serve the Church, and the science of Adam appears to be a place where the ball was fumbled.

Allele frequency spectra (AFS) do not give a solid view of ancient bottlenecks, but they do give one of recent population structure. Ironically, very recent bottlenecks are not well ascertained by MSMC, PSMC, and LD blocks, but they are clear in the AFS. This is covered pretty well here:

So yes, in the ancient past you cannot really infer much from AFS, but that has never been @glipsnort’s claim. His claims are consistent with what I showed with ARGweaver.

  1. @glipsnort has not made any claims of heliocentric certainty.

  2. He would agree that beyond about 500 kya, we do not expect allele frequency spectra to detect a single-couple bottleneck. That is where he places a tentative cutoff. So his results are essentially the same as ARGweaver's, though the evidence from ARGweaver is much stronger.

  3. His original reason for delving into AFS was to respond to some young earth creationists who claimed the AFS was inconsistent with a large ancient population and required a single-couple origin just 6,000 years ago: (Can someone explain like I'm 5 yo, what's wrong with this refutation of Biologos?).

  4. His response to Ola Hossjer (colleague of @agauger) has been very well measured, and entirely correct. (Glipsnort responds to a critical article) Notice that he does not present a case against ancient bottlenecks, but only for the common ancestry of humans and great apes, and against a recent bottleneck. Both those claims are very well supported by the evidence, and he regularly produces analyses of his own.
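For readers less familiar with the term, here is a minimal Python sketch of what an allele (site) frequency spectrum is: for each segregating site, count how many of the n sampled sequences carry the derived allele, then tally how many sites fall into each count class. It assumes the ancestral/derived state is already known (in practice it is usually polarized against an outgroup such as chimpanzee), and the toy haplotype matrix is made up. The second function gives the textbook neutral expectation under a constant population size, E[ξ_i] = θ/i; departures from that shape are what AFS-based demographic methods try to interpret.

```python
from typing import List, Sequence


def unfolded_sfs(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Return [xi_1, ..., xi_{n-1}]: the number of sites where the derived
    allele (coded 1) appears in exactly i of the n sampled sequences.
    Monomorphic sites (derived count 0 or n) are dropped."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    counts = [0] * (n + 1)
    for site in range(n_sites):
        derived = sum(row[site] for row in haplotypes)
        counts[derived] += 1
    return counts[1:n]


def neutral_expected_sfs(theta: float, n: int) -> List[float]:
    """Expected SFS under the standard neutral model with constant size:
    E[xi_i] = theta / i for i = 1..n-1."""
    return [theta / i for i in range(1, n)]


# Toy data (made up): 4 sampled sequences x 4 sites, 1 = derived allele.
toy = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
]
print(unfolded_sfs(toy))  # [2, 1, 1]
print(neutral_expected_sfs(10.0, 4))
```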

I know you are not attacking @glipsnort personally, or even leveling an unfair scientific critique. I do think, however, it is important to clarify that he has been a measured and careful voice. In my opinion, he has not drawn incorrect conclusions from the AFS work, nor has he overstated his certainty of those results.


A couple technical updates:

ArgWeaver Does Not Assume Large Population. The computed TMR4A is biased downwards, not upwards, by the prior.

The Correct Mutation Rate. ARGweaver is using an experimentally confirmed mutation rate.

And, more importantly, this improvement of the estimate…

Correctly Weighting Coalescents. An improved estimate of TMR4A is about 500 kya.

I finally got around to correcting this part of the code and recomputing the TMR4A. We arrive at a TMR4A of 495 kya, nearly 500 kya, which is a better estimate.

https://discourse-cdn-sjc2.com/standard9/uploads/peacefulscience/original/1X/94c9420257f170b3e5f847aff3363ba3451568a2.png
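As a point of reference for what TMR4A measures (as I understand it, the time back to the point where a locus's genealogy has only four lineages left, the most a single couple could carry), here is a minimal Python sketch of its expected value under the textbook Kingman coalescent with a constant diploid population size N. This is not ARGweaver's method: ARGweaver samples genealogies from the sequence data under a time-discretized prior, and the weighting correction described above is not reproduced here. The population size below is a hypothetical round number.

```python
def mean_wait(k: int, n_diploid: float) -> float:
    """Mean time (generations) during which exactly k lineages remain,
    Kingman coalescent, diploid population size N: 4N / (k(k-1))."""
    return 4.0 * n_diploid / (k * (k - 1))


def expected_time_to_m_lineages(n_sampled: int, m: int, n_diploid: float) -> float:
    """Sum the mean waiting times as n_sampled lineages coalesce down to m."""
    return sum(mean_wait(k, n_diploid) for k in range(m + 1, n_sampled + 1))


def closed_form(n_sampled: int, m: int, n_diploid: float) -> float:
    """The same sum telescopes to 4N (1/m - 1/n)."""
    return 4.0 * n_diploid * (1.0 / m - 1.0 / n_sampled)


# 54 diploid genomes -> 108 sampled haplotypes; N = 10,000 is hypothetical.
n_haplotypes = 108
N = 10_000
t4 = expected_time_to_m_lineages(n_haplotypes, 4, N)
print(round(t4, 1), round(closed_form(n_haplotypes, 4, N), 1))
```

Because 4N(1/4 - 1/n) approaches N as n grows, the expected TMR4A under this simple model is roughly N generations. Converting to years needs an assumed generation time, and real data with a changing population size will depart from this idealization.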


An actual H. erectus (or heidelbergensis) named “Adam” might have been capable of naming “Eve” and the animals, but not much more. Of that much, we are certain …


Hi Joshua,

I’m just catching up with this dialogue on a train. I should be marking essays, but will just take a moment to respond quickly to a couple of points.

Thanks, I had not seen that exchange before between Ola Hossjer and @glipsnort. Very interesting. However, it does pre-date the current discussion, and I am keen to hear Steve’s own response to the papers I have referenced on the AFS method. I agree with you that he has been a measured and careful voice in this discussion and I have great respect for his expertise.

But would you agree that in the analyses reported in the paper they have assumed a constant effective population size? If not, how do you understand the footnote to the table that I referenced above?

My train has just arrived at King’s Cross Station - sorry to have to sign off. I greatly appreciate your work on this thread, and the honesty and open-mindedness that you have shown.


@Swamidass:

Once you go back beyond 6,000 years, and especially 10,000 years, what’s the point of trying to prove a bottleneck “older than 10,000 years, and hidden in a shadow”?

If it creates a motivation for YEC’s to preserve their position in an Old Earth Scenario… good… let them work for that.

Our job has been to show that the “Young Earth” part of any Christian’s world view is untenable. The more YEC’s work to legitimize an Old Earth Scenario, the better it will be for everyone!


I hope to get back to this thread within a few days.


Hi Joshua @Swamidass
I am taking a look at the ARGweaver paper more thoroughly. It is very clear that the ratio of mutation rate to recombination rate is critical to the accuracy of the method, as the authors comment in the paper, and as several of their supplementary figures (S4-S8) show. When the mutation rate is high relative to the recombination rate, they have much more power than when it is low. However, I am struggling to see what recombination rate they used or estimated when analysing the 54 human genome sequences. Do you know what recombination rate was used? I notice that on page 8 they comment that ARGweaver has “a slight tendency to underestimate the number of recombinations, particularly at low values of mu/rho” and also that they say that other sources give a low value of mu/rho for human populations. This suggests that in their analysis of the 54 human genomes they may well have estimated a lower rate of recombination than the correct rate. However, I can’t find the figure. Is this something that you have looked at, please? If they have underestimated the recombination rate, how do you think that would affect the TMR4A?
best wishes
Richard


Steve, that’s great news. I would also be really glad to hear your view on Joshua’s analyses of the ARGweaver data, if you have time.


@RichardBuggs please excuse the delay in responding to you. I’d normally put a high priority on it, but my father unexpectedly passed away this last Saturday. I will return with haste, but have more pressing matters at the moment. Peace.

@Swamidass,

My deepest sadness to hear this news. Prayers for you and your family! George Brooks


Joshua, I am so sorry to hear this. You and your family are in my thoughts and prayers.


Josh, so sorry to hear this. I will be praying for you and your family.


Just to come back to points raised by @GJDS and @Jon_Garvey that I did not get a chance to respond to earlier:

I am not sure if this is relevant to your question, and you probably are well aware of this already, but just in case it is useful to the discussion, here are some comments.

There is quite a large literature modelling the population genetic effects of severe bottlenecks on genetic diversity in populations, by, amongst others, Alan Templeton, Brian Charlesworth, Nick Barton and Masatoshi Nei. This was partly motivated by a debate about whether or not founder event bottlenecks can cause speciation (note, the debate was not about whether or not severe bottlenecks can happen - it was about whether they drive evolutionary change). This led to quite a lot of empirical studies on natural populations that were known to have passed through bottlenecks (evidenced by past human observation and records) and on experimental populations. For example, here is a recent paper that experimentally shows that populations do much better after a bottleneck if the founding couple are outbred rather than inbred previous to the bottleneck: Szűcs, M., Melbourne, B. A., Tuff, T., Weiss‐Lehman, C., & Hufbauer, R. A. (2017). Genetic and demographic founder effects have long‐term fitness consequences for colonising populations. Ecology letters, 20(4), 436-444.

I think it is fair to say that models of the effects of bottlenecks on genetic diversity are well developed and well tested. Of course, there are inherent limits to how well we can test the long term effects of bottlenecks in natural populations or experiments, as we are limited in the number of generations that we can study. I guess this is the major problem that you were both pointing out.
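As a rough illustration of the kind of model meant here, the classic neutral result is that a diploid population of effective size N loses a fraction 1/(2N) of its expected heterozygosity each generation: H_{t+1} = H_t (1 - 1/(2N_t)). A minimal Python sketch, assuming a Wright-Fisher population with no mutation, selection, or migration; the population-size trajectories below are hypothetical.

```python
from typing import Iterable


def retained_heterozygosity(sizes: Iterable[float]) -> float:
    """Fraction of the starting expected heterozygosity left after one
    generation at each diploid effective size in `sizes`, using
    H_{t+1} = H_t * (1 - 1/(2 N_t)). Neutral Wright-Fisher, no mutation."""
    fraction = 1.0
    for n in sizes:
        fraction *= 1.0 - 1.0 / (2.0 * n)
    return fraction


# A single couple (N = 2) for one generation.
one_generation_couple = retained_heterozygosity([2])
# A single couple, then the population doubles each generation (hypothetical).
couple_then_growth = retained_heterozygosity([2, 4, 8, 16, 32, 64, 128])
# A single couple persisting for ten generations (hypothetical).
couple_for_ten = retained_heterozygosity([2] * 10)
print(one_generation_couple, couple_then_growth, couple_for_ten)
```

One generation at N = 2 keeps 75% of the heterozygosity, and if numbers then grow quickly most of that is retained, which is why heterozygosity alone is a weak signal of a brief bottleneck. Allelic richness is hit harder: two diploid founders can carry at most four copies of any autosomal locus.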

Perhaps the best empirical study available to us on the effects of bottlenecks is the Lenski long-term evolution experiment. Though this has the disadvantage of being on an asexual organism, it has the advantage of having run for 60,000 generations. This experiment started with an extreme bottleneck, as each of the 12 parallel populations came from the same bacterial colony. Lenski et al. (1991) wrote: “over all the founding populations, there was essentially no genetic variation either within or between populations, excepting only the neutral marker.”

Recently a fantastic study was done by Lenski and his collaborators tracking the genetic changes that have occurred in each of the 12 populations that all originated at the same time with the same bottleneck.
https://www.nature.com/articles/nature24287
The results are quite startling, in that very different dynamics have occurred in each population. Here are the allele frequency trajectories for just three of the populations, from Figure 1 of the paper:


The authors found that the different dynamics were for several reasons, including: changes in mutation rates, periodic selection, and negative frequency dependent selection. The final paragraph of the paper reads:

“Together, our results demonstrate that long-term adaptation to a fixed environment can be characterized by a rich and dynamic set of population genetic processes, in stark contrast to the evolutionary desert expected near a fitness optimum. Rather than relying only on standard models of neutral mutation accumulation and mutation–selection balance in well-adapted populations, these more complex dynamical processes should also be considered and included more broadly when interpreting natural genetic variation.”

I think this perhaps supports the point you were making. It is a very different system from human populations, but in many ways it should be a simpler system, and therefore easier to model. It underlines the difficulty of going from models to real evolution.

If we were presented with the twelve different Lenski LTEE populations that exist today and asked to reconstruct their past, I very much doubt we would be able to detect the fact that they all went through the same bottleneck 60,000 generations ago.


@RichardBuggs Thanks for the reply, Richard.

That’s a truly astonishing graphic, given the tight constraints in the Lenski experiment.

@RichardBuggs,

Those are impressive numbers! And now we actually have a baseline for more fulsome future discussions when someone inevitably asks, “Have we tried to demonstrate evolution in a laboratory?”

But there are those amongst us who are interested in how this laboratory demonstration applies to a 6,000-year time frame.

So I thought I would take the scale of the three sample results, and “zoom in” as required.

Taking the first 5000 generations as my starting point (and to provide context), I then made an approximate division of the 5000 generations in two, indicating where 2,500 generations would end.

I then divided 2,500 in half, to show where 1,250 generations would end. This was followed by another division in half, showing the end of 625 generations.

If we use the aggressive number of 20 years to a generation, 6,000 years would translate into about 300 generations. So rather than insert yet another confusing red line, I placed a bold red dot “in the middle” of the Zero-to-625 generations area of each chart.
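The scaling arithmetic above is easy to check. A minimal Python sketch, where the 20-year generation time is the assumption stated in the post and 60,000 is the LTEE generation count mentioned earlier in the thread:

```python
def generations_for(years: float, years_per_generation: float = 20) -> float:
    """Convert a span of years into generations at a fixed generation time."""
    return years / years_per_generation


def halving_marks(start: float, steps: int) -> list:
    """Successive halvings used to 'zoom in' on the charts: start, start/2, ..."""
    marks = [start]
    for _ in range(steps):
        marks.append(marks[-1] / 2)
    return marks


marks = halving_marks(5000, 3)   # 5000, 2500, 1250, 625 generations
target = generations_for(6000)   # 6,000 years at 20 years/generation
midpoint = marks[-1] / 2         # the "red dot" in the 0-625 interval
share_of_ltee = target / 60_000  # fraction of the 60,000-generation run
print(marks, target, midpoint, share_of_ltee)
```

The midpoint of the 0-625 interval sits at about 312 generations, close to the 300-generation target, and 300 generations is half of one percent of the LTEE's run. With a 25-year generation time instead, generations_for(6000, 25) gives 240.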

I wonder if anyone would care to comment on what these three samples can tell us about a proxy for 6,000 years, or 300 generations, as the time scale of the genetic experiment?

Readers, be sure to click on the image to see it at its largest magnification!

Thanks Richard; you have provided a great deal of information and it will take me some time to digest it.

I will respond in a general way at this time (note I am not questioning any technical aspect, or making any criticism of the modelling approach(es)). My interest is in “imagining” how a population of species that appear to be dispersed in a large area would somehow come together to form a relatively stable population, and then from there undergo further modification to form a bottleneck that may indicate a shrinking number. (At least, that is how I envisage the modelling: a population that mixes, leading to genetic diversity, followed by a bottleneck that leads to new genetically relevant species.) I wish I could make the comment clearer, but I cannot.

Is the proposed bottleneck (whatever its size) a result of hunters forming communities of thousands, followed by some type of shrinking? Is a bottleneck a device required by models of one sort or another? Or am I asking the wrong questions?