Question for Dennis about population genetics

nashenvi · July 30, 2017, 6:24pm

I’m still curious about this question, if people have any more thoughts. @glipsnort, is there an equation which relates the variant allele frequency to population size and number of generations? And perhaps one that includes not just relative variant frequency but absolute number of variants as well?

glipsnort · July 31, 2017, 1:42am

It is well modeled by diffusion. That’s how Kimura derived his results. Joe Felsenstein sketches the math in his online book, if you want to take a look. You generally end up with the same results by coalescent methods, which is the approach I’m more familiar with.

The mean time to the most recent common ancestor for an idealized diploid population is 4N generations, where N is the population size. The standard deviation is also 4N, so you do have to wait quite a while. For my simulation of a constant-sized population, I burned in for 20N generations as a reasonable approximation. [quote=“Swamidass, post:18, topic:36348”]
@glipsnort can you give us your graphs without scaling to total population? That might help.
[/quote]
For the simulations I used a mutation rate of 0.1 mutation/genome/generation, so there’s no direct comparison with genomic data; you have to scale one way or another.[quote=“Swamidass, post:18, topic:36348”]
The bigger you are, the bigger steps your random walk takes each generation, and that is what gives rise to the power law distribution in this case. Your intuition should be built around that basic logic. It doesn’t have to be a “large” fluctuation, but a “larger” fluctuation.
[/quote]
That’s not my intuition. Fluctuations are just from binomial sampling, whose variance is symmetric around (and peaks at) f = 0.5. Unfortunately, I don’t have a good intuition to replace it with. I do note that allele frequencies diffuse toward higher values until each frequency class contributes equally to ultimate fixation, but it’s not obvious to me why that has to be the stationary condition.

The number of segregating SNPs plateaus. Not for modern humans, of course, since we have an immense population and we won’t plateau until after the sun has consumed the earth, but for our ancestral population, the number of variant sites was more or less constant. I believe that’s what’s meant by steady state here.

Well, here’s where it was being derived originally (eq. 7 is the relevant one, I believe).[quote=“nashenvi, post:21, topic:36348, full:true”]
I’m still curious about this question, if people have any more thoughts. @glipsnort, is there an equation which relates the variant allele frequency to population size and number of generations? And perhaps one that includes not just relative variant frequency but absolute number of variants as well?
[/quote]
For a constant-sized population, the shape of the distribution is independent of the population size. There may be closed-form solutions for other simple demographic histories (well, for exponential expansion, anyway), but mostly one just simulates what the frequency spectrum would look like under some scenario. There are approaches for inferring demographic history from allele frequency spectra, e.g. dadi and PSMC.
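
To make the size-independence concrete, here is a minimal sketch. It assumes the standard neutral result for a constant-sized population, namely that the expected number of sites carrying i derived copies in a sample of n chromosomes is θ/i, with θ = 4Nµ. The function name, the sample size of 10, and the two population sizes are illustrative choices, not anything taken from the simulations discussed above; the mutation rate of 0.1 per genome per generation is the one quoted earlier.

```python
from fractions import Fraction

def expected_sfs_shape(theta, n):
    """Normalized expected site frequency spectrum for n sampled chromosomes.

    Assumes a constant-sized neutral population, where the expected number
    of sites with i derived copies is theta / i (theta = 4 * N * mu).
    """
    counts = [Fraction(theta) / i for i in range(1, n)]
    total = sum(counts)
    return [c / total for c in counts]

mu = Fraction(1, 10)  # mutations per genome per generation
small = expected_sfs_shape(theta=4 * 1_000 * mu, n=10)
large = expected_sfs_shape(theta=4 * 100_000 * mu, n=10)

# theta cancels in the normalization, so the two shapes are identical
print(small == large)
```

Changing N rescales how many variant sites there are in total, but the fraction falling in each frequency class stays the same.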

Since the spectrum is independent of population size, as you have noted, these approaches can only determine changes to population size; overall size and thus overall dating has to be calibrated from some other input. This can be from recombination, as @Swamidass has pointed out, or from the total number of variants – or more realistically, from the diversity seen in pairwise comparisons, so that you don’t have to sample the entire population. The mean number of differences between pairs of genomes is just equal to 4Nµ, where µ is the mutation rate. This will give you an answer with some uncertainty – maybe as much as a factor of two – but easily good enough to rule out recent tiny population sizes. My opinion is that diversity gives a better constraint than recombination, since the latter varies a lot across the genome and between genetically different individuals.
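
As a rough sketch of that calibration, the relation can simply be inverted to solve for N. The inputs below are the round figures quoted later in this thread (about 3 million differences between two genome copies, about 50 new mutations per copy per generation); the function name is hypothetical and the factor-of-two bracket is only a literal reading of the uncertainty mentioned above.

```python
def population_size_from_diversity(pairwise_differences, mutations_per_copy_per_generation):
    """Invert pi = 4 * N * mu to get N (idealized, constant-sized population)."""
    return pairwise_differences / (4 * mutations_per_copy_per_generation)

n_estimate = population_size_from_diversity(3_000_000, 50)

# bracket the estimate by a factor of two in each direction
n_low, n_high = n_estimate / 2, n_estimate * 2
print(n_estimate, n_low, n_high)
```

Even the low end of that bracket is in the thousands, which is the sense in which the estimate is good enough to rule out recent tiny population sizes.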

nashenvi · July 31, 2017, 1:58am

I think @Swamidass is actually right on this. Think about the diffusion analogy. In a random walk, the speed of diffusion is determined by the variance of the step from a given position, since the average step in an unbiased walk is always 0. So let’s consider two cases. First, imagine that a particular population contains exactly 1 copy of a particular allele. In the next generation, the number of copies will be 0, 1, or 2 with probabilities [1,2,1]/4. Second, imagine that a particular population contains exactly 2 copies of a particular allele. In the next generation, the number of copies will be 0, 1, 2, 3, or 4 with probabilities [1,4,6,4,1]/16. Now in both cases, the expected result is no change in allele frequencies. But the variance in the second case is larger than in the first case. And since the variance is what determines the speed of diffusion, the larger the frequencies (at least for f < .5), the faster the diffusion.
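
Here is a quick check of those two cases, as a sketch of the toy model only: the next-generation copy number is treated as Binomial(2 × copies, 1/2), which reproduces the [1,2,1]/4 and [1,4,6,4,1]/16 probabilities. The function names are made up for illustration.

```python
from math import comb

def toy_copy_distribution(copies):
    """Next-generation copy-number probabilities in the toy model.

    Each current copy is passed on 0, 1 or 2 times with probabilities
    [1, 2, 1] / 4, so the total is Binomial(2 * copies, 1/2).
    """
    trials = 2 * copies
    return [comb(trials, k) / 2**trials for k in range(trials + 1)]

def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(k * p for k, p in enumerate(pmf))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, variance

for copies in (1, 2):
    pmf = toy_copy_distribution(copies)
    print(copies, pmf, mean_and_variance(pmf))
```

The mean stays at the starting copy number in both cases, while the variance doubles from 0.5 to 1.0 when going from one copy to two.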

Essentially, I think this is reminiscent of diffusion in a liquid with linearly decreasing viscosity.

"> overall size and thus overall dating has to be calibrated from some other input. "

Ok, so perhaps I’m confused, but aren’t you then saying that the frequency spectrum alone can’t determine whether we came from a single ancestral human pair? And since you agreed that we can get roughly the right number of SNPs (with > 1% frequency), isn’t this data also insufficient to rule out a single ancestral pair? So how exactly does the SNP data show that we couldn’t have come from a single ancestral pair, since that’s the claim Venema makes in his book?

glipsnort · July 31, 2017, 2:21am

nashenvi:

I think @Swamidass is actually right on this. Think about the diffusion analogy. In a random walk, the speed of diffusion is determined by the variance of the step from a given position, since the average step in an unbiased walk is always 0. So let’s consider two cases. First, imagine that a particular population contains exactly 1 copy of a particular allele. In the next generation, the number of copies will be 0, 1, or 2 with probabilities [1,2,1]/4. Second, imagine that a particular population contains exactly 2 copies of a particular allele. In the next generation, the number of copies will be 0, 1, 2, 3, or 4 with probabilities [1,4,6,4,1]/16. Now in both cases, the expected result is no change in allele frequencies. But the variance in the second case is larger than in the first case. And since the variance is what determines the speed of diffusion, the larger the frequencies (at least for f < .5), the faster the diffusion.

Sure. But now consider the same case, when the allele has increased in frequency to the point that there are only two copies of the other allele present, and then 1 copy. The variance decreases just as it initially increased – binomial variance is symmetric around f = 0.5.
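
The symmetry is easy to see numerically. This is a sketch assuming plain Wright-Fisher binomial sampling of 2N gene copies, which is a slightly different setup from the toy model quoted above; N = 10,000 is just an illustrative value.

```python
def drift_variance(f, n_diploid):
    """Variance of next-generation allele frequency under binomial
    sampling of 2N gene copies: f * (1 - f) / (2N)."""
    return f * (1 - f) / (2 * n_diploid)

N = 10_000  # illustrative diploid population size
for f in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f, drift_variance(f, N))
```

The variance rises from f = 0.01 up to f = 0.5 and then falls again, with f and 1 - f giving the same value.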

Correct.

Well, it was well outside the uncertainty in the mutation rate, but sure.

By using both kinds of information at the same time. In order to get a reasonable genetic diversity, you need to have a large population for a fairly long time (note that your scenario was well outside what a YEC would find acceptable). In order to get the right frequency spectrum, you need to have had a constant-sized population for a long time, as measured in units of population size. But you’ve already fixed the population size to be large by the diversity. So you’re stuck with a long time of constant size in years.

nashenvi · July 31, 2017, 2:35am

Ah, that’s a good point. Because of the assumptions I made, I only considered f < .5.

“note that your scenario was well outside what a YEC would find acceptable”

But I’m not a young-earth creationist! As I said to @Swamidass, it seems to me that the idea of a single ancestral pair is entirely compatible with naturalistic, Darwinian evolution. So there needn’t be any religious motivation behind the question of whether the genetic evidence rules out an ancestral pair.

"By using both kinds of information at the same time. "

Ok, so let me see if I can restate the argument. There are two relevant pieces of data from modern genetics surrounding SNPs and an original ancestral pair: 1) an observed steady-state allele frequency distribution and 2) > 10M SNPs. Populations need around 20*N generations to reach a steady-state variant frequency distribution, which we see today. So the creationist has a dilemma. If he makes N really small, then there will be enough time to achieve steady state, but he will not have nearly enough SNPs. On the other hand, if he makes the population big enough to get 10M SNPs, then it will take far too long to reach steady state.

Is that the basic argument?
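
A rough numerical sketch of that trade-off, under loudly stated assumptions: the 20N-generation figure is the burn-in mentioned earlier, pairwise differences (4Nµ) stand in for the total SNP count as the measure of diversity, µ is taken as 50 mutations per genome copy per generation, and 25 years per generation is assumed purely for illustration. The function and constant names are hypothetical.

```python
MUTATIONS_PER_COPY_PER_GENERATION = 50  # round figure, assumed
YEARS_PER_GENERATION = 25               # assumed for illustration

def scenario(n):
    """Time to steady state and expected diversity for a constant population of size n."""
    generations_to_steady_state = 20 * n
    return {
        "N": n,
        "years_to_steady_state": generations_to_steady_state * YEARS_PER_GENERATION,
        "expected_pairwise_differences": 4 * n * MUTATIONS_PER_COPY_PER_GENERATION,
    }

for n in (10, 1_000, 15_000):
    print(scenario(n))
```

With these inputs a tiny N reaches steady state within thousands of years but yields only thousands of pairwise differences, while an N large enough to give millions of differences needs millions of years to settle.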

glipsnort · July 31, 2017, 3:29am

But the main targets of these arguments are.

Yup.

Another way of thinking about the same data may be more intuitive. If you compare any two copies of the human genome (say the two copies you carry), you will find about 3 million single-base differences. These are all mutations that have accumulated since the two copies were produced as offspring from a single parent genome(*). At 50 mutations per generation in each copy, that represents 30,000 generations of mutation since they last shared an ancestor, or about 750,000 years of history preserved in your DNA. And that’s the average: some parts are younger than that and some parts are older.

(*) Actually different parent genomes for different bits of the genome – what I’m talking about is averaged across the genome.
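
The arithmetic above can be spelled out in a few lines. The only number not stated explicitly is the generation time; 25 years is what the 30,000 generations and 750,000 years figures imply, so it is treated here as an inferred assumption.

```python
pairwise_differences = 3_000_000
mutations_per_copy_per_generation = 50
years_per_generation = 25  # inferred from 750,000 years / 30,000 generations

# Both copies accumulate mutations independently, so differences
# build up at twice the per-copy rate.
generations = pairwise_differences / (2 * mutations_per_copy_per_generation)
years = generations * years_per_generation
print(generations, years)
```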

Swamidass · July 31, 2017, 8:41am

A single ancestral pair within a larger population of interbreeding individuals is possible. This is very important to emphasize. WITHIN A LARGER POPULATION.

I do not affirm “Darwinian” evolution, which is often understood as a synonym for atheism. I am a Christian. I affirm God created us through evolution.

And yes, interest in scientifically assessing a model does not imply religious motivation. I entirely agree.

Swamidass · July 31, 2017, 8:56am

Yes, I am familiar with this and know that the term “diffusion” is used, because it is “diffusion-like”. This may be confusing our friend, however, because he is trying to build an intuition on the answer here, which is different than the actual diffusion.

The source of the difference is…

This is exactly my point. As you know, 0.5 frequency is the max, at which point the minor/major allele flips. That means 0 < frequency < 0.5. In that range, variance increases as frequency increases. That is exactly my point, because this is mathematically how Kimura’s diffusion is different than, say, molecular diffusion which does not have any changes in variance. Moreover, this is a well known mechanism for how power laws are generated.

Now, I would be curious if this can be modeled by…

Does that really work as an alternate derivation? My intuition says “no”, because this takes a discrete problem and makes it continuous, without a correction. I do not think this gives the same result.

Well then this may be the equilibrium distribution. I do think that you will see this arise before equilibrium is hit.

nashenvi · July 31, 2017, 12:12pm

This is exactly the kind of argument I was looking for! But wouldn’t this be compatible with a single ancestral pair in far more recent history? For example, let’s say that around 200,000 years ago, one male-female pair became geographically isolated and their descendants eventually evolved into the human race. Any two of these two individuals’ genomes would differ by around 1.8M bp, for the reasons you stated. The remaining 200,000 years to the present would fill in the remaining mutations. Wouldn’t this scenario yield exactly the same number of mutations as is currently observed?

Now my guess is that you might be able to detect this scenario via some other means (like the rate of crossover?) But in terms of the SNP data it seems indistinguishable from the N=10,000 population scenario. Is that correct?

gbrooks9 · July 31, 2017, 1:19pm

@nashenvi

It is a function of:
1.) Size of founding population (2, 2000, 20,000, etc) being analyzed

2.) Average rate of allele configuration change

Diversity score for successor population in a given year vs. Years elapsed since.

nashenvi · July 31, 2017, 1:40pm

Why do you think this? For example, why is it not possible that modern Galapagos tortoises or modern Tasmanian devils or modern humans all evolved from a single ancestral pair which became geographically isolated from the larger interbreeding population until speciation occurred?

“My intuition says “no”, because this takes a discrete problem and makes it continuous, without a correction.”

That’s not an issue. We routinely go from discrete problems to continuous problems in diffusion. The correction is usually O(1/N) where N is the number of discrete steps. So if N is large, it can be ignored.

gbrooks9 · July 31, 2017, 2:31pm

@nashenvi

I’m sure @Swamidass will be answering you soon. But in the meantime, let me explain that he is trying to coordinate the elements of the conventional YEC view of human history with the elements of human history that evolutionary science presents.

So:

Adam & Eve were a special couple, selected out of a larger hominid population.
The antiquity and diversity of this larger hominid population is the source of humanity’s current genetic diversity, since a founding pair of two - - 6000 years ago - - would not be able to produce such diversity.
Once Adam & Eve were released back into the general population, their descendants dispersed widely enough that they (along with the descendants of a few hundred [or a few thousand?] other progenitors), are part of humanity’s Common Universal Ancestors!

Swamidass · July 31, 2017, 3:26pm

There is not yet a mathematical model for a single ancestral pair (not within a larger population) that we know will make sense with the full range of genetic observations in humans (SNPs, LD, interspecies variation, etc.), without requiring ongoing miracles. Maybe one exists, but this has not been demonstrated. I eagerly await the results of some people working on this, but the data is just not there yet, and such a model may not be possible.

Okay that makes sense.

Can you show the math or link to it?

T_aquaticus · July 31, 2017, 5:44pm

Darwinian evolution is not a synonym for atheism any more than Newtonian gravity is a synonym for atheism. Darwinian evolution is predominantly viewed as random mutations with respect to fitness (as determined by statistical tests) followed by natural selection as well as diversification through speciation. In the end, attaching labels to the theory of evolution (e.g. Modern Synthesis, Neo-Darwinian) usually leads to more misunderstanding than it’s worth. Attaching names to theories is a bad habit carried over from Victorian times, but I don’t see how it has any atheistic implications.

DennisVenema · July 31, 2017, 6:03pm

I’m late to the party here - taking some time off this summer, and not spending much time online - but I’ll offer my thanks to @glipsnort and @Swamidass for ably handling these questions - better than I could do myself. Thanks gents.

Swamidass · August 1, 2017, 5:44am

[Edited]

Often I have heard “Darwinian” equivocated with “Darwinism”, which is why I avoid using the term unqualified like that. I would be more okay with modern evolutionary science or Darwinian and non-Darwinian evolution.

This is just the Darwinian mechanism, but this is not enough to explain what we see in biology.

Which makes it really interesting that this theory of evolution was definitively falsified as the dominant mechanism of change in genetic data in the 1960s. In secular graduate science programs, for several decades, we teach Darwinian evolution, in the sense you have defined it, as a failed theory of evolution.

Of course, evolution is even more clearly established by both Darwinian and non-Darwinian mechanisms together.

gbrooks9 · August 1, 2017, 12:36pm

@Swamidass

What would be a list of the leading non-Darwinian mechanisms?

Swamidass · August 1, 2017, 3:55pm

Well, sorry about that. I was just trying to be lighthearted about this detail that we have covered for about the 100th time.

I can see the confusion in how it came out.[quote=“benkirk, post:38, topic:36348”]

Good news. Evolution is even more clearly established by non-Darwinian mechanisms.

Reality: evolution proceeds by both non-Darwinian and Darwinian mechanisms, just as physics has Newtonian and non-Newtonian facets.
[/quote]
And of course I agree.

This is a common rhetorical move that is made by, for example, Dawkins. This is why I resist using the term Darwinian evolution. This type of evolution is not enough to explain what we know about biology; we need non-Darwinian mechanisms too.

Regardless, I edited the post. Sorry about how I put it.

T_aquaticus · August 1, 2017, 5:08pm

In my experience, “Darwinism” is a term used mostly by anti-evolutionists to create the false appearance that evolution is a religious belief.[quote=“Swamidass, post:36, topic:36348”]
Which makes it really interesting that this theory of evolution was definitively falsified as the dominant mechanism of change in genetic data in the 1960s. In secular graduate science programs, for several decades, we teach Darwinian evolution, in the sense you have defined it, as a failed theory of evolution.
[/quote]

I have always viewed this as a false dichotomy. Neutral mutations still pass through the filter of natural selection. Even Darwin spoke of neutral changes to organs, resulting in rudimentary or “vestigial” organs. Neutral drift is still well within Darwinian mechanisms.[quote=“Swamidass, post:36, topic:36348”]
Of course, evolution is even more clearly established by both Darwinian and non-Darwinian mechanisms together.
[/quote]

What do you consider “non-Darwinian mechanisms”?

Swamidass · August 1, 2017, 7:09pm

Yes it is. But it is also used by atheists trying to make evolution seem incompatible with religious belief. It is a very unhelpful term. It serves no one but those wanting conflict.

Darwin did talk about neutral changes. However Darwinian evolution is not commonly thought to include non-Darwinian mechanisms. Non-Darwinian mechanisms have included anything other than positive selection. A good review is here:

https://www.nature.com/news/does-evolutionary-theory-need-a-rethink-1.16080

To be clear, I am solidly in the anti-“Extended Evolutionary Synthesis” camp. As the opponents of EES say…

We invite [EES] to join us in a more expansive extension, rather than imagining divisions that do not exist. We appreciate their ideas as an important part of what evolutionary theory might become in the future. We, too, want an extended evolutionary synthesis, but for us, these words are lowercase because this is how our field has always advanced.

Notice that they do not identify as Darwinians or emphasize positive selection in their response. Instead they acknowledge all the non-Darwinian mechanisms raised by the EES as already part of “evolutionary theory.”

“Darwinian” evolution is an unhelpful term that I cannot endorse as a meaningful description of the modern understanding of evolution. As brilliant as Darwin was, he only had subtle hints of what we know now. Quantitatively speaking, for example, positive selection accounts for less than 0.1% of the mutations in DNA; the rest are explained by neutral processes, the first widely established non-Darwinian process.