Continuing the discussion from Adam, Eve and Population Genetics: A Reply to Dr. Richard Buggs (Part 1):
I don’t have time for a full response to your points right now, but just a brief note to say that the articles I wrote 10 years ago on human-chimp similarity are still online here:
In the latest human-chimp reciprocal best alignment on the UCSC browser, hg38vsPanTro6, I make it that 85.5% of the human genome has one-to-one orthology to the chimpanzee genome, with 1.3% difference within that alignment due to SNVs. So I freely admit that my prediction of 10 years ago was not borne out: I placed too much reliance on the data that was available at the time, overestimating how complete it was.
However, my overall point (that it is wrong to say that our genomes are 99% identical to chimpanzees) still stands.
If you can find anything in those articles that you or @glipsnort or anyone else can show warrants a retraction, I would gladly do so.
I respect that retraction. I did not know you had done so. That is the right response.
I have not looked at those closely in a while, but do vaguely remember some important errors there. I will look at those again sometime soon, and let you know if that is the case (or if my memory fails me).
There was nothing in the data available 10 years ago that would lead to your stated conclusions.
‘When we do this alignment, we discover that only 2400 million of the human genome’s 3164.7 million “letters” align with the chimpanzee genome - that is, 76% of the human genome. Some scientists have argued that the 24% of the human genome that does not line up with the chimpanzee genome is useless “junk DNA”’
As I noted above, there was never 24% of the human genome that does not line up with the chimpanzee genome; most of that 24% represents DNA we never assembled in the chimp genome, so there was no comparison to be made. Who were the scientists who argued that this was junk DNA?
No one made that claim regarding the non-aligned portions.
Humans and Chimps vs Mice and Rats…
I’ve been covering this in detail on my forum with @agauger.
Hi @glipsnort and @Swamidass, thank you for your critiques of the articles I wrote back in 2008 about human and chimpanzee genome comparisons as they relate to the 2005 chimpanzee genome paper. I have to admit that, until Dennis mentioned them on this forum, it had been years since I had read what I wrote back in 2008. The issue itself has been on my mind on and off over the past 10 years, though, especially the most important novel point that the articles made: that unaligned regions should be acknowledged when giving single-percentage estimates of human-chimp similarity to the layperson.
I would have liked to have followed up my articles with a study published in the peer reviewed literature, but for various reasons have not had the resources to do so. I wrote a grant proposal for the Templeton Foundation in 2011 which would have involved a rigorous test of what I wrote in 2008, with targeted re-sequencing of the unaligned regions, but it was unfortunately not funded.
One thing I would emphasise is that in 2008, although I received quite a lot of skeptical emails about my articles, no one pointed out scientific errors. No one emailed me and said “Hi Richard, I know a little bit more about this than you, and you are wrong about the unaligned sequences because of this and this and this”, providing scientific critiques that held water. If they had done that I would have published a correction or retraction. But no one did, and indeed, no one has in the last 10 years. Meanwhile, I discussed it with a lot of scientists, and those to whom I explained it thought I had a valid point about the unaligned sequences.
That is not to say that today I think that my articles were perfect. I now know a lot more about genomics than I did then, and have also matured as a person. I will therefore be very happy to write a blog about my current perspective on my 2008 articles. To determine whether this needs to take the form of an update, or a revision, or a retraction, I would appreciate it if you could help me to answer three questions: (1) what does the data say today in 2018, and how can it be described to the public in an adequate manner? (2) what did the data say in 2008 when correctly interpreted? (3) Was I unreasonable in my interpretation of the 2005 Chimp paper in 2008?
As a preliminary, below, I have copied my articles in full. You will notice that the quote that has already been shared is from the first article I wrote, and that two months later, I published a second article to clarify misunderstandings that had arisen from the first article. I am not suggesting for a moment that my second article fully deals with the concerns that you have just raised, or is fault-free, but I want to make sure that you know exactly what I wrote, and its context. As I read these 10 years on, I am aware that if I wrote them today I would write with a more cautious and measured tone, but that is not the main issue that we have before us – the main issue is factual correctness. I would appreciate it if you could take the trouble to read the articles in their entirety, so you are fully aware of their content and context, and then I will come to my three questions.
Published 11 October 2008
From 1964 to 2004, it was believed that humans are almost identical to apes at the genetic level. Ten years ago, we thought that the information coded in our DNA is 98.5% the same as that coded in chimpanzee DNA. This led some scientists to claim that humans are simply another species of chimpanzee. They argued that humans did not have a special place in the world, and that chimpanzees should have the same ‘rights’ as humans.
Other scientists took a different view. They said that it is obvious that we are very different from chimpanzees in our appearance and way of life: if we are almost the same as chimpanzees in our DNA sequence, this simply means that DNA sequence is the wrong place to look in trying to understand what makes humans different. By this view, the 98.5% figure does not undermine the special place of humans. Instead it undermines the importance of genetics in thinking about what it means to be a human.
Fortunately (for both the status of human beings and the status of genetics) we now know that the 98.5% figure is very misleading. In 2005 scientists published a draft reading of the complete DNA sequence (genome) of a chimpanzee. When this is compared with the genome of a human, we find major differences.
To compare the two genomes, the first thing we must do is to line up the parts of each genome that are similar. When we do this alignment, we discover that only 2400 million of the human genome’s 3164.7 million “letters” align with the chimpanzee genome - that is, 76% of the human genome. Some scientists have argued that the 24% of the human genome that does not line up with the chimpanzee genome is useless “junk DNA”. However, it now seems that this DNA could contain over 600 protein-coding genes, and also code for functional RNA molecules.
Looking closely at the chimpanzee-like 76% of the human genome, we find that to make an exact alignment, we often have to introduce artificial gaps in either the human or the chimp genome. These gaps give another 3% difference. So now we have a 73% similarity between the two genomes.
In the neatly aligned sequences we now find another form of difference, where a single “letter” is different between the human and chimp genomes. These provide another 1.23% difference between the two genomes. Thus, the percentage similarity is now at around 72%.
We also find places where two pieces of human genome align with only one piece of chimp genome, or two pieces of chimp genome align with one piece of human genome. This “copy number variation” causes another 2.7% difference between the two species. Therefore the total similarity of the genomes could be below 70%.
This figure does not include differences in the organization of the two genomes. At present we cannot fully assess the difference in structure of the two genomes, because the human genome was used as a template (or “scaffold”) when the chimpanzee draft genome was assembled.
Our new knowledge of the human and chimpanzee genomes contradicts the idea that humans are 98% chimpanzee, and undermines the implications that have been drawn from this figure. It suggests that there is a huge amount of exciting research still to be done in human genetics.
Second article, Published 6 December 2008
I recently wrote here on the genetic difference between humans and chimpanzees. This proved to be more controversial than I expected. Several people have emailed me to ask questions, or tell me I am wrong. Today I will revisit this topic to clarify a few things.
My major point was that we now know that the genetic difference between humans and chimps is much bigger than we once thought. Our genomes are not 99% the same as chimpanzee genomes. Just a few months ago, a top science journal said this in a news report entitled “Relative differences: The myth of 1%” (Science 316:1836).
This report highlighted a recent study showing that 6.4% of all genes in the human genome do not have closely similar counterparts in the chimpanzee genome (Demuth et al, PLoS ONE 1: e85). It also cited the chimpanzee draft genome paper that I mentioned in my previous article (Nature 437:69-87), and stated how the authors of this study had aligned 2.4 billion bases of the human genome with the chimpanzee genome, and found a 1.23% difference in single nucleotide polymorphisms (SNPs) and a 3% difference in insertion/deletions (indels).
Given these statistics, it is factually incorrect to say that humans are 99% the same as chimpanzees. Yet, just last month, the Natural History Museum in London and the University of Chicago Press in the USA published a book entitled “99% Ape: How evolution adds up”. This misleading title was doubtless chosen by a marketing guru rather than the editor, who is a reputable and distinguished scientist in plant evolutionary ecology (the field in which I did my doctoral research). Such promotion of the “myth of 1%” to the public as evidence for evolution is probably why some non-scientists have suggested on the internet that my earlier article, dispelling this myth, is somehow a death-blow to evolution - it is not. My article on chimpanzees went one simple step further than the Science report. I took the amount of the chimp genome which has been aligned with the human genome (2.4 billion bases) and divided this by the size of the human genome (3.16 billion bases), to work out that only 76% of the human genome shows the 1.23% SNP and 3% indel differences (see above). Using these figures, and citing 2.7% copy number variation between the two species (Nature 437:88-93), I argued for a total similarity of around 70%.
This is a conservative estimate of what we can be quite sure is similar. Like all estimates, it makes assumptions. The key one here is that the parts of the chimpanzee genome that did not align to the human genome are different to the human genome. In general this is obviously true - only similar sequences can be lined up - but it is possible that the complex procedure by which the scientists aligned the two genomes may have caused some similar sequences not to be included. In addition, the 4% of the chimpanzee genome that has not yet been sequenced, or portions that have not been sequenced accurately, may also prove to have some similarity to the human genome. These could raise the overall similarity by a few percent, but I predict that when we have a reliable, complete chimpanzee genome, the overall similarity of the human genome will prove to be close to 70% (and very far from 99%).
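The additive arithmetic behind the 2008 figure can be reproduced in a few lines. This Python sketch simply subtracts the quoted percentages in the order the first article uses; the constant and function names are illustrative, and it mirrors the articles' method rather than offering a corrected estimate. Note that the 1.23% and 3% figures were measured within the aligned 2.4 billion bases, so subtracting them directly from a whole-genome percentage is a simplification.

```python
# Reproduces the additive arithmetic of the 2008 articles, using the
# percentages they quote. Constant names are illustrative only.
ALIGNED_MILLION_BASES = 2400.0        # human bases aligned to chimp
HUMAN_GENOME_MILLION_BASES = 3164.7   # human genome size used in 2008
INDEL_PCT = 3.0    # indel difference, measured within the alignment
SNV_PCT = 1.23     # single-base difference, measured within the alignment
CNV_PCT = 2.7      # copy number variation figure cited in 2008


def similarity_2008(include_cnv: bool = True) -> float:
    """Subtract each difference, in percentage points, from the aligned fraction."""
    aligned_pct = 100.0 * ALIGNED_MILLION_BASES / HUMAN_GENOME_MILLION_BASES
    result = aligned_pct - INDEL_PCT - SNV_PCT
    if include_cnv:
        result -= CNV_PCT
    return result


if __name__ == "__main__":
    print(f"with CNV term:    {similarity_2008():.1f}%")
    print(f"without CNV term: {similarity_2008(include_cnv=False):.1f}%")
```

With the copy number term included this gives about 68.9% (the article rounded the aligned fraction to 76% and reported "below 70%"); without that term it gives about 71.6%.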
OK, so that’s what I wrote in 2008. I will come back to what I now think needs to be corrected or updated, but to facilitate this, I think it is helpful to first ask:
(1) what does the data say today in 2018, and how can it be described to the public in an adequate manner?
When the man on the street asks me “How much of the human genome is the same as the chimpanzee genome?” I understand him to be asking: “how much of the entire human genome is exactly the same as the chimpanzee genome?” Or, in evolutionary terms, “how much of the human genome has passed unchanged to both chimps and humans”. He is therefore asking “what is the total one-to-one orthology between the human and chimpanzee genomes”. To answer his question with certainty, I would look to the latest reciprocal best whole genome alignment between humans and chimps. This is available at the UCSC website. If my stats of this alignment are correct (I am happy to share my Perl script if anyone wants to check it), it is 2,761,498,322 bases long, including 17,622,179 bases of indels and 35,820,144 variant bases. The human genome sequence used in the alignment was 3,209,286,105 bases long. Thus the total percentage of the human genome that I can know for sure has one-to-one orthology with the chimp genome is 84.4%. Therefore I would say to the man on the street: we know for sure that the human genome is 84.4% the same as the chimpanzee genome [note added 23/4/18 to disambiguate: i.e. this is a lower bound]. I would explain to him how I derived the figure. Regarding the part of the genome that is not in one-to-one orthology, I would explain that about a third of it (a little under 5% of the whole genome) appears to be duplicated, with slight differences, in the human but not the chimp; some of it is centromeric; some of it lies in regions that have not yet been completely sequenced in the human genome; some of it is genic; and for some of it I have no idea what it is.
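The percentages quoted for the UCSC hg38vsPanTro6 reciprocal best alignment follow directly from the four counts given above. This Python sketch is not the Perl script mentioned; it assumes, as the wording implies, that the alignment length includes both the indel bases and the variant bases, and the variable names are illustrative.

```python
# Counts quoted for the UCSC hg38vsPanTro6 reciprocal best alignment.
# Assumption: the alignment length includes the indel and variant bases.
ALIGNMENT_LENGTH = 2_761_498_322
INDEL_BASES = 17_622_179
VARIANT_BASES = 35_820_144
HUMAN_GENOME_LENGTH = 3_209_286_105


def pct(numerator: int, denominator: int) -> float:
    """Return numerator as a percentage of denominator."""
    return 100.0 * numerator / denominator


# Aligned bases that are compared base-for-base (indel bases removed).
compared = ALIGNMENT_LENGTH - INDEL_BASES
# Compared bases that are also identical (variant bases removed).
identical = compared - VARIANT_BASES

orthologous_pct = pct(compared, HUMAN_GENOME_LENGTH)  # share of human genome
snv_pct = pct(VARIANT_BASES, compared)                # difference within alignment
identical_pct = pct(identical, HUMAN_GENOME_LENGTH)   # lower bound for the public

print(f"one-to-one aligned, excluding indels: {orthologous_pct:.1f}%")
print(f"SNV difference within that alignment:  {snv_pct:.1f}%")
print(f"identical share of the human genome:   {identical_pct:.1f}%")
```

This prints 85.5%, 1.3% and 84.4%: the 84.4% figure excludes both indel and variant bases, while 85.5% excludes only the indels.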
OK, maybe I should stop there for now, as this is becoming rather a long post. I would welcome your feedback on my answer to question one. At the weekend, I will try to come on to my other two questions: (2) what did the data say in 2008, when correctly interpreted? (3) Was I unreasonable in my interpretation of the 2005 Chimp paper in 2008?
Just another side note on what the data says in 2018 that was unknown in 2008. As @DennisVenema has mentioned above in the context of incomplete lineage sorting, there are regions of the human genome that are more similar to the gorilla genome than the chimpanzee genome. The gorilla genome paper of 2011 states that this is about 30% of the genome.
I have not looked in detail at a human-chimp-gorilla alignment, but assuming the 30% figure, then it would actually be entirely defensible to say that the human genome is approximately 70% chimpanzee (likely less if we take into account regions in the human genome without alignment to either gorilla or chimp). However, I will leave this point aside for now, because it was not the argument I was making in 2008.
That would presume that the 30% that groups first with gorilla has no match to chimpanzee - which is absolutely not the case. We’re talking about dispersed SNPs in three species where there is massive identity between all three.
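A toy example can make this concrete. In this Python sketch the three sequences are invented (100 bases each, with a handful of substitutions placed by hand, far more than real genomes would show over such a short stretch); only the pattern matters. At the one site where human and gorilla share a change, human groups with gorilla, yet human and chimpanzee remain highly identical by state across the whole segment.

```python
# Toy identity-by-state comparison of three invented, pre-aligned sequences.
SHIFT = {"A": "C", "C": "G", "G": "T", "T": "A"}  # always yields a different base


def substitute(seq: str, positions: list[int]) -> str:
    """Return seq with a substitution at each listed position."""
    chars = list(seq)
    for p in positions:
        chars[p] = SHIFT[chars[p]]
    return "".join(chars)


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions with the same base (identity by state)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


ancestor = "ACGT" * 25                     # 100 invented bases
human = substitute(ancestor, [10, 50])     # site 50 is shared with gorilla
chimp = substitute(ancestor, [20])
gorilla = substitute(ancestor, [30, 50])

for name, other in [("chimp", chimp), ("gorilla", gorilla)]:
    print(f"human vs {name}: {percent_identity(human, other):.0f}%")
print(f"chimp vs gorilla: {percent_identity(chimp, gorilla):.0f}%")
```

Here human vs gorilla comes out at 98% and human vs chimp at 97%: grouping with gorilla at a site does not remove the match to chimpanzee elsewhere.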
OK, now to come to the second and third questions: (2) what did the data say in 2008, when correctly interpreted? (3) Was I unreasonable in my interpretation of the 2005 Chimp paper in 2008?
In 2008, here is what I said I did (from my second article that clarified my first article):
I took the amount of the chimp genome which has been aligned with the human genome (2.4 billion bases) and divided this by the size of the human genome (3.16 billion bases), to work out that only 76% of the human genome shows the 1.23% SNP and 3% indel differences (see above). Using these figures, and citing 2.7% copy number variation between the two species (Nature 437:88-93), I argued for a total similarity of around 70%.
The deduction of the 2.7% copy number variation was a mistake, as the alignment that the other figures were derived from was a reciprocal best alignment, so did not include those copy number variants in the first place. I have only realised this since I reviewed my old articles last week. I freely admit an error there, and I retract that.
Now, regarding my inclusion of regions of the human genome that were not aligned to the chimpanzee genome in the alignment presented in the 2005 genome paper: I think I was correct to flag this up as an issue that needs to be addressed if we want a whole-genome figure for similarity between human and chimpanzee. It is simply stating that if we give a percentage we also need to give a sample size. That is basic good practice in reporting statistics.
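The same set of matches can yield very different percentages depending on the denominator, which is why the denominator needs reporting alongside the figure. This Python sketch classifies the columns of a short, invented human/chimpanzee alignment ("-" marks a gap) into matches, single-base mismatches and gap columns, then divides the match count by three possible denominators. The sequences, the 30-base segment length and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlignmentStats:
    columns: int      # total alignment columns
    gap_columns: int  # columns with a gap in either sequence
    mismatches: int   # columns with two different bases
    matches: int      # columns with the same base


def alignment_stats(human_aln: str, chimp_aln: str) -> AlignmentStats:
    """Classify the columns of a pairwise alignment ('-' marks a gap)."""
    if len(human_aln) != len(chimp_aln):
        raise ValueError("aligned sequences must have equal length")
    gaps = mismatches = matches = 0
    for h, c in zip(human_aln, chimp_aln):
        if h == "-" or c == "-":
            gaps += 1
        elif h != c:
            mismatches += 1
        else:
            matches += 1
    return AlignmentStats(len(human_aln), gaps, mismatches, matches)


# Hypothetical example: an alignment covering part of a 30-base human segment.
human_aln = "ACGTACGT--ACGTACGTAC"
chimp_aln = "ACGTACCTGGACGTAC-TAC"
HUMAN_SEGMENT_LENGTH = 30  # 18 human bases are aligned, 12 are left unaligned

s = alignment_stats(human_aln, chimp_aln)
denominators = {
    "compared columns (no gaps)": s.matches + s.mismatches,
    "all alignment columns": s.columns,
    "whole human segment": HUMAN_SEGMENT_LENGTH,
}
for label, d in denominators.items():
    print(f"{s.matches}/{d} identical over {label}: {100.0 * s.matches / d:.1f}%")
```

The same 16 matching bases read as 94.1%, 80.0% or 53.3% depending on which denominator is chosen.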
I was also correct to say that “This is a conservative estimate of what we can be quite sure is similar.” It absolutely was. However, when I went on to state factors that could change this figure upwards, although I correctly identified some of them, I did not mention: (1) limitations in the assembly of the chimpanzee genome, where reads had been obtained but had not been assembled, or had not been assembled correctly; (2) areas of the human genome that had not been assembled into the human reference assembly at the time. I should have mentioned these as well. The first of these I was not aware of at the time, as I had never assembled a genome myself at that point. The second of these I could have been aware of if I had thought about it and read around a bit more. So I confess some sins of omission on those points.
Because I underestimated the factors that could increase the length of the alignment, I ended up making a prediction that the length of the alignment would stay fairly stable into the future, and wrote: “I predict that when we have a reliable, complete chimpanzee genome, the overall similarity of the human genome will prove to be close to 70% (and very far from 99%).” This prediction has not been upheld by the data, as I have already said above. However, although a failed prediction is embarrassing, it is not something that one can really retract; one just has to admit it was a wrong prediction.
Hi Dennis, it depends on whether we are talking about identity by state (IBS) or identity by descent (IBD). If the latter, then my point stands. If the former, then yours does.
I will come to @glipsnort’s points later - some of them have not been copied across from their previous location yet…
If you tell the average layperson that humans and chimpanzees have 70% identity, there is not a layperson on the planet that would assume IBD. Without qualification they would assume IBS, and you would likely have to spend some time explaining the difference. It’s pretty clear from your original articles that you’re discussing IBS.
I have been quite clear that I was not making this point in my original articles, and I was not offering this as a defence of my “70% chimp” headline.
A layperson will not know the term IBD, but he may assume that “the same as X” means “inherited unchanged from X”.
No, it’s ~15% of the genome in which human is more similar to gorilla than to chimp; in another 15%, chimp and gorilla are the most similar and humans equidistant from both. This was known well before 2008. The presence of incomplete lineage sorting in the human/chimp/gorilla lineage was shown in this paper in 2001, and this paper provided an estimate of the amount in 2006; their estimate was 18-29%, somewhat lower than the 30% found in the gorilla genome paper.
I’ll reply further as I have time.
For most of this conversation (and the one that birthed it), I don’t have anything to contribute, because you are both at a much higher level than I am able to add anything of value to. I appreciate being a fly on the wall.
But if you’re looking for a layperson’s naive assumptions, well, that I can weigh in on!
True to your predictions, this layperson did not know about IBD or IBS before reading this exchange. But I assumed that “X% of the human genome is the same as the chimp genome” to mean that the two had A’s, C’s, T’s and G’s at the same positions. My hastily-read-Wikipedia-based understanding of the IBS/IBD distinction suggests that I naively assumed IBS. I would never have thought it meant IBD. And I certainly would not have assumed it meant “inherited unchanged from the chimp genome,” because even laypeople (well, at least laypeople in the EC camp) know that we’re descended from a common ancestor, not chimps.
That said, I did not read your paper back then, and haven’t read it now, so I don’t know if, in context, I might have read it as you intended.
I hope that this has been at least a marginally helpful contribution to the discussion. If not, carry on…
Thanks @glipsnort for the correction about human/gorilla/chimp similarity. I made my short post airing that thought a bit hastily, which I regret as it seems to have become a bit of a distraction from the main issue.
Hi @AMWolfe, thanks for pitching in! Unfortunately, I fear many laypeople are less clued up than you. What about the statement I made previously:
How do you respond to that? Do you think this is what a layperson might mean?
If I might inject another man-on-the-street comment here: thanks to CSI and paternity tests, everyone, or almost everyone, has the idea that each person’s DNA is unique, but that there must be parts that are the same. When the question is how much of the human genome is the same as the chimpanzee genome, that would just mean that the bits of the genome that are important are included in the comparison.
Note: I first learned about the research that went into that paper when David Reich sat down next to me at a conference and said, “Humans had sex with chimpanzees!” He seemed quite pleased (although he didn’t mean it exactly literally).
Right. I naively have always assumed that if you take any random strand of DNA, you’ll be able to find minor changes that have come along that differentiate, say, me from @Bill_II’s DNA, and a greater number of those changes that differentiate the two of us from some guy shaking a spear at an approaching boat in the Andaman Islands, and a greater number still of those changes that differentiate all of us from a chimpanzee, and so on to gorillas, orangutans, gibbons, and cuttlefish (for example), in their turn. I would have ranked these percentages accordingly from highest similarity to lowest.
So … I think this responds to the verbiage “exactly the same,” but I don’t know if that’s what was meant by that.