Continuing the discussion from Do Evolutionary Theory And Scripture Contradict Each Other?:
While we won’t know what the chimp genome really looks like until more accurate research is done, I recently did a study of the chimp reads that have lower levels of human DNA contamination, and in this newer study the chimp DNA is only 85% similar to human at best, not 98%.
There are many ways to compute the similarity between two genomes, and some are better than others. The exact number isn’t important. The key thing is to lay the number alongside controls https://en.wikipedia.org/wiki/Scientific_control. It is the relationship between the computed similarity and the controls that allows us to interpret it.
Tomkins’s proposal is reasonable. He suggests measuring similarity between chimp “reads” (short fragments of raw sequencing data) and the assembled human genome, to eliminate any bias introduced by using the human genome to scaffold the chimp genome. That is a good idea, clever really. Let’s use it.
So Tomkins takes chimp reads and computes the average similarity between these reads and the human reference genome. He gets about 85% (chimp read -> human genome). I agree that is what he got; I get the same number. But how much of the gap is due to error in the reads? There are no controls, so we do not really know.
85% = (chimp read -> human genome)
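To make the read-to-genome measurement concrete, here is a minimal sketch of averaging percent identity over reads. It assumes each read has already been placed on the reference and that alignments are ungapped; `percent_identity`, `average_identity`, and the toy sequences are hypothetical, and real read aligners also have to handle insertions, deletions, and reads that do not map at all.

```python
def percent_identity(read: str, reference: str, start: int) -> float:
    """Percent of read bases matching the reference at an assumed ungapped position."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read runs off the end of the reference")
    matches = sum(1 for a, b in zip(read, window) if a == b)
    return 100.0 * matches / len(read)


def average_identity(placed_reads, reference: str) -> float:
    """Average identity over hypothetical (read, start) pairs."""
    scores = [percent_identity(read, reference, start) for read, start in placed_reads]
    return sum(scores) / len(scores)


# Toy data: one perfect read and one read with a single mismatch.
reference = "ACGTACGTACGTACGTACGT"
placed_reads = [("ACGTACGTAC", 0), ("ACGAACGTAC", 4)]
print(average_identity(placed_reads, reference))  # 95.0
```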
How do we solve this? We add controls. Let’s try a few. First, we can measure the same chimp reads against the chimp genome too. Here is approximately what we get…
87% = (chimp read -> chimp genome)
Hmm. That isn’t right. No chimp is that different from its own species’ genome; we expect something closer to 100%. We can try another control: add human reads and see what we get there.
89% = (human read -> human genome)
87% = (human read -> chimp genome)
Hmm, so we see the same problem here. Any two humans differ by less than 0.5%, so something is clearly wrong. What is going on?
It turns out this exposes a big problem, one that would be obvious to anyone who sequences genomes. There is a lot of random error in reads (they are raw data, after all). The error in the final assembled genome is much lower, because the errors in individual reads cancel out when many overlapping reads are combined. The error in the reads, however, artificially lowers the similarity computed against the genome.
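The effect of read error can be seen in a toy simulation. This is a sketch under strong simplifying assumptions: the hypothetical “reads” are full-length noisy copies of one random sequence with a 10% per-base substitution rate, and the “assembly” is a simple per-position majority vote, which is not how real assemblers work. Every read comes from the same sequence, yet the reads score only about 90% against it on average, while the consensus scores close to 100%.

```python
import random
from collections import Counter

BASES = "ACGT"


def add_errors(seq: str, error_rate: float, rng: random.Random) -> str:
    """Copy seq, replacing each base with a different random base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def identity(a: str, b: str) -> float:
    """Percent of positions where two equal-length sequences agree."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def majority_consensus(reads) -> str:
    """Call each position by majority vote across equal-length, aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


rng = random.Random(42)
true_genome = "".join(rng.choice(BASES) for _ in range(2000))
# Hypothetical: 15 noisy copies of the same sequence, 10% per-base error.
reads = [add_errors(true_genome, 0.10, rng) for _ in range(15)]

mean_read_identity = sum(identity(r, true_genome) for r in reads) / len(reads)
consensus_identity = identity(majority_consensus(reads), true_genome)
print(f"reads vs truth:     {mean_read_identity:.1f}%")
print(f"consensus vs truth: {consensus_identity:.1f}%")
```

The gap between the two printed numbers is entirely sequencing error; nothing about the underlying sequence differs.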
Is there a way to fix this? Yes there is! We can subtract out the error in the reads by calibrating against each species’ own genome, so that chimp and human reads come out close to 100% when measured against their own genomes.
There are two ways to compute the similarity between humans and chimps this way…
98% = 100 - (human read -> human genome) + (human read -> chimp genome)
98% = 100 - (chimp read -> chimp genome) + (chimp read -> human genome)
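The correction can be written as a small function. This sketch assumes, as the two formulas do, that the shortfall from 100% measured against a species’ own genome is sequencing error, and that the same shortfall is present when those reads are measured against the other genome, so it can simply be added back. `corrected_similarity` is a hypothetical name, and the inputs are the approximate figures given earlier.

```python
def corrected_similarity(self_match: float, cross_match: float) -> float:
    """Read-based similarity corrected by a same-species control.

    self_match:  % similarity of reads to their own species' genome (the control)
    cross_match: % similarity of the same reads to the other species' genome
    """
    read_error = 100.0 - self_match  # shortfall attributed to sequencing error
    return cross_match + read_error  # same as 100 - self_match + cross_match


# Approximate figures from the text.
from_human_reads = corrected_similarity(self_match=89, cross_match=87)
from_chimp_reads = corrected_similarity(self_match=87, cross_match=85)
print(from_human_reads, from_chimp_reads)  # 98.0 98.0
```

Both directions land on the same value, which is the point of computing it two ways: each acts as a check on the other.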
We can do this over and over again, for every individual (human or chimp) for which we have data. We have a lot of data like this, and once the controls are used to subtract out the sequencing error, Tomkins’s method gives a difference of about 2%, or 98% the same.
But what does 2% or 98% mean anyway? How do we interpret that? Controls to the rescue. Let’s take mice and rats, animals most YECs think are of the same kind. “Microevolution” (to borrow their term) can account for the differences here. We can measure this the same way we measured the difference between humans and chimps. It is critical to measure it the same way, so we can compare the numbers. We get approximately…
82% (mice - rats)
And that is compared with…
98% (human - chimp)
In evolutionary theory, there is mathematical theory that explains this result. We can predict that there will be about 10x more differences between mouse and rat than between human and chimp (18% vs 2%), just as we see in the data. In the YEC world, this is clear evidence that human and chimp genomes look like they belong to the same kind. Maybe God made us separately, but disproving evolution was not one of his design goals.
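The “about 10x” comparison is the ratio of percent differences, using the rounded similarity figures above. A quick check, with a hypothetical helper name:

```python
def percent_difference(similarity: float) -> float:
    """Convert a percent similarity into a percent difference."""
    return 100.0 - similarity


mouse_rat = percent_difference(82)    # 18% different
human_chimp = percent_difference(98)  # 2% different
ratio = mouse_rat / human_chimp
print(f"{mouse_rat:.0f}% vs {human_chimp:.0f}% -> {ratio:.0f}x")  # 18% vs 2% -> 9x
```

With these rounded numbers the ratio is 9, which the text rounds to about 10x.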
Of course, if you do not like my correction to Tomkins’s figures, you could always just compare the uncorrected mouse read to rat genome numbers. The uncorrected number is about…
70% = (mouse read -> rat genome)
85% = (chimp read -> human genome)
70% is clearly below 85%, leaving us with the same interpretation: humans and chimps are more similar than mice and rats. This is explained by the mathematical formulas of evolution, but is strange in YEC. At the very least, it tells us that God is not nearly as concerned about disproving evolution as we are.
NOTE: The numbers here are approximate, rounded off for clarity of text.