Adam, Eve and Population Genetics: A Reply to Dr. Richard Buggs (Part 1)

Swamidass · December 23, 2017, 4:21pm

That is certainly helpful. Thanks! I did make that mistake. I’d use a generation time of about 15 (at minimum) to 25 years. So that brings the TMR4A from that table up past 2 mya. Now I see why it helps your case.

However, they are MAX values of samples from a distribution with a very high variance. This is an extreme-value distribution. We need to see the whole distribution. Know any papers with that? Perhaps they can send us their values if we ask them? Perhaps send them an email.

Because of how these numbers are selected, we cannot draw a strong inference from them yet. It is not sound to cherry-pick regions with low TMRCAs, but this study, if it is used this way, is just cherry-picking the ones with high TMRCAs.

The claim was that Homo sapiens do not dip down to a single couple, and we know this with certainty approaching heliocentrism.

At the time, everyone thought Homo sapiens arose 200 kya, but now there is strong enough evidence that this is no longer the consensus. Some think Homo sapiens arose 300 kya or even as early as 350 kya. If that new finding unsettles Dennis’ claim, then that claim should never have been presented as heliocentrism-level certainty. There is no similar sort of evidence we can imagine that would unsettle our view of heliocentrism. The fact that Dennis did not take into account uncertainty in determining the origin date of Homo sapiens is part of what is at question here.

To be clear, he certainly is not responsible for excluding evidence published after his book was published. However, that evidence does call into question his heliocentrism-level certainty, if in fact population bottlenecks between 300 kya and 200 kya are plausible (which we have not yet determined). If that is the case, then part of his certainty rested on false confidence in when humans arose. Of course, if we cannot see plausibility for a bottleneck until, say, before 1 mya, that is not really relevant anyway.

Setting that issue aside, none of the studies I have seen correct for interbreeding. The scientific consensus is that our ancestors never dip to a single couple, not that Homo sapiens never dip to a single couple. It would be really interesting to see the studies that raise Dennis’ confidence so high on this one. He has read the literature more, so he might have seen something I missed.

Just as he corrected me on that TMRCA table, I’d love to have him correct me here too. However, this really does seem to be a novel claim he is making. I am not even sure I can envision the study that could demonstrate this claim.

tallen_1 · December 23, 2017, 4:22pm

That’s what I’d thought ;). Thanks! Is it your sense Richard is disputing an exclusion of a bottleneck within 200KYA then? Or just your upper ranges with less than such a high level of confidence?

DennisVenema · December 23, 2017, 4:24pm

The paper is a whole-genome study, and these are the largest TMRCA values that they found. Why is this “cherry picking”? Shouldn’t we be interested in the range of TMRCA values in the genome if we’re interested in the range of estimated TMR4A values?

Put another way: finding more recent TMRCA values is not an issue for Richard. The issue is how far back the range of TMRCA values we see in the genome goes.

I was actually surprised to see some regions with TMRCA values higher than the MHC complex. I expected that to be near the top (and it is) but there are several regions with similar TMRCA values.

DennisVenema · December 23, 2017, 4:25pm

I have no idea. Richard has not clarified what timeframe he is interested in.

Swamidass · December 23, 2017, 4:30pm

We need to see the whole distribution. Just looking at the tail of the distribution does not tell you about the mean or the mode. Using these TMRCA values is like using an estimate well outside the 95% confidence interval (on the high side). We could just as well have used the minimum TMRCA values as valid estimates. Neither approach is valid, for similar (though not identical) reasons.

As you know, the sampling distribution for TMRCAs has very high variance. As I understand it, the signal-to-noise ratio actually increases as you go farther back. For that reason, looking at extremal values (maxes and mins) is always going to be flat-out wrong. We need to see the full distribution, to see if it is unimodal or bimodal, and what the means and modes are. That’s just basic statistics, right?

The good news is that these authors actually have the data we need. I’m going to send them a note asking for data.

tallen_1 · December 23, 2017, 4:31pm

I think that’s my frustration as well. It’d be very helpful for him to do so. He’s been asked by yourself and others. Hopefully when he returns to this thread he will be more forthcoming with his answers to these questions, especially since you’ve held up your end of the bargain in meeting all of his.

In the meantime, what do you think of Swamidass’ point that humans may have speciated at 300 or even 350KYA? Do you find this relevant? For me, given how far that distances us from behaviorally as well as anatomically modern humans, it doesn’t mean much. But I’m curious as to your thoughts and whether Swamidass’ point means a revision of your claim is warranted.

DennisVenema · December 23, 2017, 4:36pm

I’m not quite following you here (maybe I need another cup of coffee). I agree if we’re interested in the TMRCA for the genome as a whole then the whole distribution is important. In this case, we’re interested in the TMRCA value of specific genome regions. Why would the oldest measures be intrinsically less accurate?

Also, we’re looking at TMRCA values in excess of 10 million years in some cases. Are we saying that this isn’t good evidence for a TMR4A > 400KYA?

Swamidass · December 23, 2017, 4:39pm

I’m not making any revisions.

There is immense debate about what “human” and Homo sapiens mean. If we are trying to communicate the scientific consensus to the public, and to make claims of heliocentrism-level certainty, we need to take that lack of consensus into account.

The real problem, however, is not with the date of 300 kya vs 200 kya, but in making a claim about Homo sapiens, when population genetics seems only to be making claims about “our ancestors”, our total “lineage”, which includes non-Homo sapiens.

You can try this yourself with a Gaussian distribution and Python code. If you want, I can even write up a piece of code. Sample numbers from a distribution ten thousand times. Take the maximum of those samples. How close is that to the mean? Not very close. Same problem here.
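For anyone who wants to try it, here is a minimal sketch of that experiment. It assumes a standard normal distribution (mean 0, standard deviation 1) and a fixed random seed; the function name `max_vs_mean` and its parameters are only illustrative, and nothing here is drawn from the TMRCA paper itself.

```python
import random
import statistics


def max_vs_mean(n_samples=10_000, mu=0.0, sigma=1.0, seed=0):
    """Draw n_samples Gaussian values and return (sample mean, sample max)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    return statistics.fmean(samples), max(samples)


mean, largest = max_vs_mean()
# The mean lands close to mu, while the maximum sits several sigma above it.
print(f"sample mean = {mean:.3f}, sample max = {largest:.3f}")
```

The maximum says a lot about the tail and the sample size, and very little about where the bulk of the distribution sits.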

We cannot really estimate the average height of people by just looking at the heights of people in the NBA. It’s just not statistically sound. Same thing here.

DennisVenema · December 23, 2017, 4:39pm

One of the things I try to communicate in the book is that delineating “species” is an attempt to draw a line on a gradient. As we learn more and more about our ancestors, it’s going to get harder and harder to draw a line - a point I make in the book. I think we see with the remains at 300KYA exactly that issue - some say they are sapiens, others aren’t so sure. It’s exactly what we would expect.

DennisVenema · December 23, 2017, 4:41pm

We’re not interested in the genome average, though. We’re interested in the range.

tallen_1 · December 23, 2017, 4:50pm

Since the only real reason we’re examining the claim as to whether a bottleneck down to two humans could ever be plausible is driven by theological concerns, it would be helpful to examine what sort of human then would be relevant to those concerns. For me, a non-behaviorally modern human, or any other hominid for that matter, that can accomplish perhaps merely the construction of rudimentary stone tools does not map well onto the sort of Adam & Eve referenced in scripture. Curious as to your thoughts though.

Swamidass · December 23, 2017, 5:05pm

We are interested in the distribution. The distribution includes the range, the mean, mode, min, max, and much much more information.

DennisVenema · December 23, 2017, 5:17pm

Don’t forget that in several cases here we’re talking about polymorphisms shared between humans and chimpanzees. That places the TMRCA for those regions prior to the human-chimp divergence, which is over 3.5MYA (using a very conservative value). Thus TMR4A would be over 875,000 years ago.
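The arithmetic behind that last figure can be sketched as follows. The divisor of four is inferred from the numbers in this post (3.5 million years giving 875,000 years) and is an assumption about the rule of thumb being used; the function name `tmr4a_from_tmrca` is hypothetical.

```python
def tmr4a_from_tmrca(tmrca_years, divisor=4):
    """Rough TMR4A estimate from a TMRCA value.

    Assumption: TMR4A is about TMRCA / 4, as implied by the figures
    quoted in the post; this is not a formula from the cited papers.
    """
    return tmrca_years / divisor


human_chimp_divergence_years = 3_500_000  # the conservative value used above
print(tmr4a_from_tmrca(human_chimp_divergence_years))  # 875000.0
```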

DennisVenema · December 23, 2017, 5:20pm

I also agree that if we were talking about one or two genome regions I might be more skeptical - but we’re talking about several independent regions with very high TMRCA values. Not sure how looking at a distribution is going to change the conclusions of the authors of that paper.

tallen_1 · December 23, 2017, 5:30pm

Dennis, since the conversation is happening on this thread rather than the 2nd part to this series, what of your argument presented there that a bottleneck down to two would cause a discernible spike in the TMRCA at the time of such an event? Would that be pertinent to the analysis of these papers?

DennisVenema · December 23, 2017, 5:38pm

For that we need the distribution of the TMRCA values across the genome. This is actually what PSMC modelling does - it’s really a distribution of TMRCA values that is then used to infer Ne at the various times. So, it would show up as a dip in a PSMC plot. Richard thinks it would not be detected, but I disagree.

tallen_1 · December 23, 2017, 5:39pm

Got you, thanks!

DennisVenema · December 23, 2017, 5:41pm

The trick is PSMC papers only use one genome at a time. These two papers look at many individuals. So, we can find deeper TMRCA values than a typical PSMC study might find.

tallen_1 · December 23, 2017, 5:48pm

Understood. Is there any way to visualize a distribution of whole-genome TMRCA data outside a PSMC analysis then? Or is noticing such a dip (or its lack) our only option?

DennisVenema · December 23, 2017, 6:01pm

I think Josh has asked the authors of the paper for their dataset, so that would be another way. In published papers? Not that I’m aware of, though they might be out there. Of course the MSMC papers (Durbin group) are also a representation of the distribution, but that’s a modified PSMC on several genomes at once.