Hi @glipsnort and @Swamidass, thank you for your critiques of the articles I wrote back in 2008 about human and chimpanzee genome comparisons as they relate to the 2005 chimpanzee genome paper. I have to admit that, until Dennis mentioned them on this forum, it had been years since I last read them. The issue itself has been on my mind on and off over the past 10 years though, especially the main novel point that the articles made: that unaligned regions should be acknowledged when giving single-percentage estimates of human-chimp similarity to the layperson.
I would have liked to follow up my articles with a study published in the peer-reviewed literature, but for various reasons I have not had the resources to do so. In 2011 I wrote a grant proposal for the Templeton Foundation which would have involved a rigorous test of what I wrote in 2008, with targeted re-sequencing of the unaligned regions, but it was unfortunately not funded.
One thing I would emphasise is that in 2008, although I received quite a lot of skeptical emails about my articles, no one pointed out scientific errors. No one emailed me and said “Hi Richard, I know a little bit more about this than you, and you are wrong about the unaligned sequences because of this and this and this”, providing scientific critiques that held water. If they had done that, I would have published a correction or retraction. But no one did, and indeed, no one has in the last 10 years. Meanwhile, I discussed it with a lot of scientists, and those to whom I explained it thought I had a valid point about the unaligned sequences.
That is not to say that today I think that my articles were perfect. I now know a lot more about genomics than I did then, and have also matured as a person. I will therefore be very happy to write a blog about my current perspective on my 2008 articles. To determine whether this needs to take the form of an update, a revision, or a retraction, I would appreciate it if you could help me to answer three questions: (1) what does the data say today in 2018, and how can it be described to the public in an adequate manner? (2) what did the data say in 2008 when correctly interpreted? (3) Was I unreasonable in my interpretation of the 2005 Chimp paper in 2008?
As a preliminary, below, I have copied my articles in full. You will notice that the quote that has already been shared is from the first article I wrote, and that two months later, I published a second article to clarify misunderstandings that had arisen from the first article. I am not suggesting for a moment that my second article fully deals with the concerns that you have just raised, or is fault-free, but I want to make sure that you know exactly what I wrote, and its context. As I read these 10 years on, I am aware that if I wrote them today I would write with a more cautious and measured tone, but that is not the main issue that we have before us – the main issue is factual correctness. I would appreciate it if you could take the trouble to read the articles in their entirety, so you are fully aware of their content and context, and then I will come to my three questions.
First article, published 11 October 2008
From 1964 to 2004, it was believed that humans are almost identical to apes at the genetic level. Ten years ago, we thought that the information coded in our DNA is 98.5% the same as that coded in chimpanzee DNA. This led some scientists to claim that humans are simply another species of chimpanzee. They argued that humans did not have a special place in the world, and that chimpanzees should have the same ‘rights’ as humans.
Other scientists took a different view. They said that it is obvious that we are very different from chimpanzees in our appearance and way of life: if we are almost the same as chimpanzees in our DNA sequence, this simply means that DNA sequence is the wrong place to look in trying to understand what makes humans different. By this view, the 98.5% figure does not undermine the special place of humans. Instead it undermines the importance of genetics in thinking about what it means to be a human.
Fortunately (for both the status of human beings and the status of genetics) we now know that the 98.5% figure is very misleading. In 2005 scientists published a draft reading of the complete DNA sequence (genome) of a chimpanzee. When this is compared with the genome of a human, we find major differences.
To compare the two genomes, the first thing we must do is to line up the parts of each genome that are similar. When we do this alignment, we discover that only 2400 million of the human genome’s 3164.7 million ‘letters’ align with the chimpanzee genome - that is, 76% of the human genome. Some scientists have argued that the 24% of the human genome that does not line up with the chimpanzee genome is useless “junk DNA”. However, it now seems that this DNA could contain over 600 protein-coding genes, and also code for functional RNA molecules.
Looking closely at the chimpanzee-like 76% of the human genome, we find that to make an exact alignment, we often have to introduce artificial gaps in either the human or the chimp genome. These gaps give another 3% difference. So now we have a 73% similarity between the two genomes.
In the neatly aligned sequences we now find another form of difference, where a single ‘letter’ is different between the human and chimp genomes. These provide another 1.23% difference between the two genomes. Thus, the percentage difference is now at around 72%.
We also find places where two pieces of human genome align with only one piece of chimp genome, or two pieces of chimp genome align with one piece of human genome. This “copy number variation” causes another 2.7% difference between the two species. Therefore the total similarity of the genomes could be below 70%.
This figure does not include differences in the organization of the two genomes. At present we cannot fully assess the difference in structure of the two genomes, because the human genome was used as a template (or “scaffold”) when the chimpanzee draft genome was assembled.
Our new knowledge of the human and chimpanzee genomes contradicts the idea that humans are 98% chimpanzee, and undermines the implications that have been drawn from this figure. It suggests that there is a huge amount of exciting research still to be done in human genetics.
Second article, published 6 December 2008
I recently wrote here on the genetic difference between humans and chimpanzees. This proved to be more controversial than I expected. Several people have emailed me to ask questions, or tell me I am wrong. Today I will revisit this topic to clarify a few things.
My major point was that we now know that the genetic difference between humans and chimps is much bigger than we once thought. Our genomes are not 99% the same as chimpanzee genomes. Just a few months ago, a top science journal said this in a news report entitled “Relative differences: The myth of 1%” (Science 316:1836).
This report highlighted a recent study showing that 6.4% of all genes in the human genome do not have closely similar counterparts in the chimpanzee genome (Demuth et al, PLoS ONE 1: e85). It also cited the chimpanzee draft genome paper that I mentioned in my previous article (Nature 437:69-87), and stated how the authors of this study had aligned 2.4 billion bases of the human genome with the chimpanzee genome, and found a 1.23% difference in single nucleotide polymorphisms (SNPs) and a 3% difference in insertion/deletions (indels).
Given these statistics, it is factually incorrect to say that humans are 99% the same as chimpanzees. Yet, just last month, the Natural History Museum in London and the University of Chicago Press in the USA published a book entitled “99% Ape: How evolution adds up”. This misleading title was doubtless chosen by a marketing guru rather than the editor, who is a reputable and distinguished scientist in plant evolutionary ecology (the field in which I did my doctoral research). Such promotion of the “myth of 1%” to the public as evidence for evolution is probably why some non-scientists have suggested on the internet that my earlier article, dispelling this myth, is somehow a death-blow to evolution - it is not. My article on chimpanzees went one simple step further than the Science report. I took the amount of the chimp genome which has been aligned with the human genome (2.4 billion bases) and divided this by the size of the human genome (3.16 billion bases), to work out that only 76% of the human genome shows the 1.23% SNP and 3% indel differences (see above). Using these figures, and citing 2.7% copy number variation between the two species (Nature 437:88-93), I argued for a total similarity of around 70%.
This is a conservative estimate of what we can be quite sure is similar. Like all estimates, it makes assumptions. The key one here is that the parts of the chimpanzee genome that did not align to the human genome are different to the human genome. In general this is obviously true - only similar sequences can be lined up - but it is possible that the complex procedure by which the scientists aligned the two genomes may have caused some similar sequences not to be included. In addition, the 4% of the chimpanzee genome that has not yet been sequenced, or portions that have not been sequenced accurately, may also prove to have some similarity to the human genome. These could raise the overall similarity by a few percent, but I predict that when we have a reliable, complete chimpanzee genome, the overall similarity of the human genome will prove to be close to 70% (and very far from 99%).
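For readers who want to retrace the arithmetic in the two 2008 articles above, here is a minimal Python sketch. It uses only the figures quoted in the articles and mirrors the first article's step-by-step subtraction of percentage points; the function and variable names are made up for illustration, and it makes no claim about whether that way of combining the figures is the right one.

```python
# Figures as quoted in the 2008 articles (Nature 437:69-87 and 437:88-93).
HUMAN_GENOME_BASES = 3164.7e6   # size of the human genome used in 2008
ALIGNED_BASES = 2400e6          # human bases aligned to the chimp draft genome
INDEL_PCT = 3.0                 # indel difference quoted for aligned sequence
SNP_PCT = 1.23                  # single-letter difference quoted for aligned sequence
CNV_PCT = 2.7                   # copy number variation quoted in the articles


def stepwise_similarity_2008():
    """Return the running similarity (in percent) after each step.

    Assumption: each percentage is subtracted directly as percentage
    points, which is how the first article presents the calculation.
    """
    aligned_pct = 100 * ALIGNED_BASES / HUMAN_GENOME_BASES
    after_indels = aligned_pct - INDEL_PCT
    after_snps = after_indels - SNP_PCT
    after_cnv = after_snps - CNV_PCT
    return aligned_pct, after_indels, after_snps, after_cnv


if __name__ == "__main__":
    labels = ("aligned", "minus indels", "minus SNPs", "minus CNV")
    for label, value in zip(labels, stepwise_similarity_2008()):
        print(f"{label:>13}: {value:.2f}%")
```

The unrounded values come out slightly below the rounded 76%, 73% and 72% given in the article, and the final value falls a little under 70%, consistent with "could be below 70%".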
OK, so that’s what I wrote in 2008. I will come back to what I now think needs to be corrected or updated, but to facilitate this, I think it is helpful to first ask:
(1) what does the data say today in 2018, and how can it be described to the public in an adequate manner?
When the man on the street asks me “How much of the human genome is the same as the chimpanzee genome?” I understand him to be asking: “how much of the entire human genome is exactly the same as the chimpanzee genome?” Or, in evolutionary terms, “how much of the human genome has passed unchanged to both chimps and humans?” He is therefore asking “what is the total one-to-one orthology between the human and chimpanzee genomes?” To answer his question with certainty, I would look to the latest reciprocal best whole genome alignment between humans and chimps. This is available at the UCSC website. If my stats of this alignment are correct (I am happy to share my Perl script if anyone wants to check it), it is 2,761,498,322 bases long, including 17,622,179 bases of indels and 35,820,144 variant bases. The human genome sequence used in the alignment was 3,209,286,105 bases long. Thus the total percentage of the human genome that I can know for sure has one-to-one orthology with the chimp genome is 84.4%. Therefore I would say to the man on the street: we know for sure that the human genome is 84.4% the same as the chimpanzee genome [note added 23/4/18 to disambiguate: i.e. this is our minimum lower bound]. I would explain to him how I derived the figure. Regarding the regions that are not one-to-one orthologs, I would explain that about a third of them (a little under 5% of the whole genome) appear to be duplicated, with slight differences, in the human but not the chimp; some are centromeric; some are regions that have not yet been completely sequenced in the human genome; some are genic; and for some I have no idea what they are.
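For anyone who wants to check the 84.4% figure without the Perl script, the arithmetic can be sketched in a few lines of Python. This is not the script mentioned above (which computes the statistics from the alignment itself); it only takes the four totals quoted in this post as inputs. It assumes, as my reading of the paragraph above, that the figure is obtained by subtracting both the indel bases and the variant bases from the alignment length and dividing by the human genome length. The function name is hypothetical.

```python
# Totals quoted in the post for the UCSC reciprocal-best human-chimp alignment.
ALIGNMENT_LENGTH = 2_761_498_322     # total length of the alignment
INDEL_BASES = 17_622_179             # bases of indels within the alignment
VARIANT_BASES = 35_820_144           # variant (mismatching) bases
HUMAN_GENOME_LENGTH = 3_209_286_105  # human genome sequence used in the alignment


def identical_percentage(alignment_length, indel_bases, variant_bases,
                         genome_length):
    """Percentage of the genome that is aligned and identical.

    Assumption: indel and variant bases are both counted inside
    alignment_length, so both are subtracted from it.
    """
    identical_bases = alignment_length - indel_bases - variant_bases
    return 100 * identical_bases / genome_length


if __name__ == "__main__":
    pct = identical_percentage(ALIGNMENT_LENGTH, INDEL_BASES,
                               VARIANT_BASES, HUMAN_GENOME_LENGTH)
    print(f"Aligned and identical: {pct:.1f}% of the human genome")
```

Under that assumption the result rounds to 84.4%, matching the figure above; dividing the alignment length alone by the genome length gives roughly 86%, so the lower figure only comes out if the indels and variants are excluded.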
OK, maybe I should stop there for now, as this is becoming rather a long post. I would welcome your feedback on my answer to question one. At the weekend, I will try to come on to my other two questions: (2) what did the data say in 2008, when correctly interpreted? (3) Was I unreasonable in my interpretation of the 2005 Chimp paper in 2008?