Adam, Eve and Population Genetics: A Reply to Dr. Richard Buggs (Part 1)

RichardBuggs · January 20, 2018, 10:14pm

Hi all,

I have been doing a bit more reading about the theoretical background of some of the methods we have been discussing here. I am not a mathematician, so much of this is outside of my area of expertise. However, I have come across three papers that seem to suggest that site frequency spectra (as presented earlier in this discussion) have severe limitations as a source of evidence about past population sizes. The second of these papers specifically examines scenarios of a bottleneck followed by exponential population growth.

Simon Myers, Charles Fefferman, Nick Patterson. Can one learn history from the allelic spectrum? Theoretical Population Biology, Volume 73, Issue 3, 2008, pp. 342-348
https://www.sciencedirect.com/science/article/pii/S0040580908000038
Abstract: It is well known that the neutral allelic frequency spectrum of a population is affected by the history of population size. A number of authors have used this fact to infer history given observed allele frequency data. We ask whether perfect information concerning the spectrum allows precise recovery of the history, and with an explicit example show that the answer is in the negative. This implies some limitations on how informative allelic spectra can be.

Terhorst, Jonathan, and Yun S. Song. Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proceedings of the National Academy of Sciences 112.25 (2015): 7677-7682.
http://www.pnas.org/content/112/25/7677.short
Abstract: The sample frequency spectrum (SFS) of DNA sequences from a collection of individuals is a summary statistic that is commonly used for parametric inference in population genetics. Despite the popularity of SFS-based inference methods, little is currently known about the information theoretic limit on the estimation accuracy as a function of sample size. Here, we show that using the SFS to estimate the size history of a population has a minimax error of at least O(1/log s), where s is the number of independent segregating sites used in the analysis. This rate is exponentially worse than known convergence rates for many classical estimation problems in statistics. Another surprising aspect of our theoretical bound is that it does not depend on the dimension of the SFS, which is related to the number of sampled individuals. This means that, for a fixed number s of segregating sites considered, using more individuals does not help to reduce the minimax error bound. Our result pertains to populations that have experienced a bottleneck, and we argue that it can be expected to apply to many populations in nature.

Baharian, Soheil, and Simon Gravel. “On the decidability of population size histories from finite allele frequency spectra.” Theoretical population biology (2018).
https://www.sciencedirect.com/science/article/pii/S004058091730148X
Abstract: Understanding the historical events that shaped current genomic diversity has applications in historical, biological, and medical research. However, the amount of historical information that can be inferred from genetic data is finite, which leads to an identifiability problem. For example, different historical processes can lead to identical distribution of allele frequencies. This identifiability issue casts a shadow of uncertainty over the results of any study which uses the frequency spectrum to infer past demography. It has been argued that imposing mild ‘reasonableness’ constraints on demographic histories can enable unique reconstruction, at least in an idealized setting where the length of the genome is nearly infinite. Here, we discuss this problem for finite sample size and genome length. Using the diffusion approximation, we obtain bounds on likelihood differences between similar demographic histories, and use them to construct pairs of very different reasonable histories that produce almost-identical frequency distributions. The finite-genome problem therefore remains poorly determined even among reasonable histories. Where fits to few-parameter models produce narrow parameter confidence intervals, large uncertainties lurk hidden by model assumption.
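
To give a feel for the rate quoted in the Terhorst and Song abstract above, here is a minimal Python sketch comparing 1/log s with 1/sqrt(s), the usual rate in many classical estimation problems. The constants hidden in both rates are set to 1, which is an assumption made purely for illustration (the bound in the paper carries constants that are not reproduced here), and the function names are hypothetical.

```python
import math


def inv_log_rate(s):
    """1 / ln(s): shape of the SFS minimax lower bound (constant assumed to be 1)."""
    return 1.0 / math.log(s)


def inv_sqrt_rate(s):
    """1 / sqrt(s): shape of the classical parametric rate (constant assumed to be 1)."""
    return 1.0 / math.sqrt(s)


def rate_table(site_counts):
    """Return (s, 1/ln s, 1/sqrt s) for each number of segregating sites s."""
    return [(s, inv_log_rate(s), inv_sqrt_rate(s)) for s in site_counts]


if __name__ == "__main__":
    # Hypothetical numbers of independent segregating sites.
    for s, slow, fast in rate_table([10**2, 10**4, 10**6, 10**8]):
        print(f"s = {s:>11,d}   1/ln(s) = {slow:.4f}   1/sqrt(s) = {fast:.6f}")
```

Under these assumptions, halving 1/ln(s) requires squaring s (100 sites would have to become 10,000), whereas halving 1/sqrt(s) only requires four times as many sites.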

So I think I should add these to the criticism I made earlier of this approach to @glipsnort here:

RichardBuggs:

However, I have to admit that although I think that your arguments from allele frequency spectra could potentially make a good test of the Adam and Eve bottleneck hypothesis, I would need to see this worked through in considerably more detail before I was fully persuaded that it was an adequate test. I have been reading a bit more widely about site frequency spectra and the factors that can affect them in a few spare hours. In particular I found these recent papers helpful:

Harpak, A., Bhaskar, A., & Pritchard, J. K. (2016). Mutation Rate Variation is a Primary Determinant of the Distribution of Allele Frequencies in Humans. PLoS genetics, 12(12), e1006489.

Ferretti, L., Ledda, A., Wiehe, T., Achaz, G., & Ramos-Onsins, S. E. (2017). Decomposing the site frequency spectrum: the impact of tree topology on neutrality tests. Genetics, 207(1), 229-240.

Koch, E., & Novembre, J. (2017). A Temporal Perspective on the Interplay of Demography and Selection on Deleterious Variation in Humans. G3: Genes, Genomes, Genetics, 7(3), 1027-1037.

Gao, F., & Keinan, A. (2016). Inference of super-exponential human population growth via efficient computation of the site frequency spectrum for generalized models. Genetics, 202(1), 235-245.

These papers have strengthened my view that a wide range of complex demographic, phylogenetic, selective and mutational processes, together with sampling strategies, can influence site frequency spectra, and that I therefore cannot conclude from the models that you have run that a bottleneck of two in the history of the human lineage is not possible. To be convinced I would need to see more complex models run that try to incorporate these factors.
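
For anyone following along, here is a minimal Python sketch of what a site frequency spectrum is: a count of how many variable sites carry the derived allele in exactly i of the n sampled chromosomes. The toy haplotype matrix and the function names are hypothetical. The theta/i expectation is the standard textbook result for a neutral, constant-size, randomly mating population, and is included only as a baseline, not as a model taken from the papers above.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from a 0/1 matrix (rows = chromosomes, columns = sites).

    Entry i-1 of the result counts the sites at which exactly i chromosomes
    carry the derived allele (coded 1). Monomorphic sites are skipped.
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):  # iterate over columns (sites)
        derived = sum(site)
        if 0 < derived < n:
            sfs[derived - 1] += 1
    return sfs


def neutral_expectation(n, theta):
    """Expected SFS under the standard neutral constant-size model: theta / i."""
    return [theta / i for i in range(1, n)]


# Toy data (hypothetical): 4 sampled chromosomes, 5 sites.
toy_haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]

if __name__ == "__main__":
    print("observed SFS:", site_frequency_spectrum(toy_haplotypes))
    print("neutral expectation (theta = 6):", neutral_expectation(4, 6.0))
```

Any inference of this kind compares an observed spectrum with the spectrum a model predicts, so every factor listed above that changes the spectrum is something the model would need to account for.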

In addition, I came across this paper, which @DennisVenema may find interesting as he writes his blog about the PSMC method:

Kim, J., Mossel, E., Rácz, M. Z., & Ross, N. (2015). Can one hear the shape of a population history?. Theoretical population biology, 100, 26-38.
http://www.sciencedirect.com/science/article/pii/S0040580914000987?via%3Dihub

I have also been reading up more on ARGweaver and intend to post again on this soon, @Swamidass.