Hi Mervin,
I had intended to respond to your comments, but certain persons objected to my defence of Dr Jeffrey Tomkins on the issue of chimp-human similarity, with the consequence that my account was ‘suspended’.
Therefore, you’ll understand that I was unable to post a response to you.
In the meantime I contacted Dr Jeffrey Tomkins and apprised him of the situation.
In a very recent email to me he graciously explained where those who disagree with his finding of only 85% similarity have erred.
He sent me the following lengthy article, his latest research, in which he drills down into the details and provides evidence that the chimp-human similarity is significantly lower than the figure reported by scientists who are theistic evolutionists with an a priori commitment to Darwinian evolutionary processes.
Summary
Among the data sets that included run date information, two different sets of chimpanzee DNA sequences exist within the Sanger-style data sets used to construct the chimpanzee genome. The sequences produced early in the chimpanzee genome project, which contributed to the initial five-fold coverage of the chimpanzee draft genome (Mikkelsen et al. 2005), are significantly more similar to human than those produced later in the project, by a difference of about 5% in overall data set sequence identity. Contributing to this difference is the additional fact that a 5.6% difference also exists in the amount of sequences that hit onto the human genome.
When not considering run date, but instead including all sequences, two bins of data were constructed: data sets with overall identities below 90% and those above 90%. In doing this, the difference in sequence identity between the two data sets widened to 7%. This is largely due to the fact that the sequences lacking run date information were the most highly similar to human out of all the data sets. Because these data sets all contained filename numbers between 13 and 48, it is safe to assume that they contributed to the initial rough draft of the chimpanzee genome in 2005, inflating its human-like characteristics accordingly.
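For readers who want to see what this kind of binning amounts to, here is a minimal sketch in Python. It is not taken from the article: the function names and the sample figures are made up purely for illustration, and I have assumed that a data set sitting at exactly 90% goes into the upper bin, since the summary does not say.

```python
from statistics import mean


def bin_by_identity(identities, threshold=90.0):
    """Split {data set name: overall percent identity} into two bins.

    Assumption: a value exactly equal to the threshold goes in the upper bin.
    """
    below = {name: pct for name, pct in identities.items() if pct < threshold}
    above = {name: pct for name, pct in identities.items() if pct >= threshold}
    return below, above


def bin_gap(identities, threshold=90.0):
    """Difference between the mean identity of the upper and lower bins."""
    below, above = bin_by_identity(identities, threshold)
    if not below or not above:
        raise ValueError("both bins must contain at least one data set")
    return mean(above.values()) - mean(below.values())


# Hypothetical values, NOT the figures from the article.
example = {"set_a": 84.0, "set_b": 86.0, "set_c": 92.0, "set_d": 94.0}
gap = bin_gap(example)
```

With real per-data-set identities in place of the made-up ones, `gap` would be the figure to compare against the 7% quoted above.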
It may be that greater precautions against human DNA contamination were taken later in the project, resulting in less contamination. If only the data from these seemingly less contaminated sets are considered, the chimpanzee genome is no more than about 85% similar to human. If all the data sets are considered together, despite the apparent human DNA contamination, then the chimpanzee genome is no more than about 90% similar to human.
It is very probable that the current chimpanzee genome assembly suffers from two major problems that make it more human-like than it should be. First, chimpanzee DNA sequences from both Sanger-style sequencing and next generation sequencing technologies have been assembled using the human genome as a reference framework (Mikkelsen et al. 2005; Prado-Martinez et al. 2013). In other words, the chimpanzee genome does not stand on its own merits using its own framework-based genomic resources (e.g. an accurate integrated physical-genetic map for chimpanzee), as I described in an earlier publication (Tomkins 2011). Second, given the fact that significant levels of human DNA exist in non-primate databases due to laboratory and worker contamination (Longo et al. 2011), the presence of human DNA in the pre-assembled chimpanzee sequencing reads is highly probable and could be tested for by simply comparing the chimpanzee-human BLASTN analyses of the different data sets with one another. The main questions would be: are there significant differences between data sets, and are there any obvious patterns in these differences? The answer to both questions is a resounding yes.
To determine this, 101 publicly available Sanger-style trace read data sets, providing the longest possible trace read data source, were downloaded, end-trimmed for low-quality bases, and purged of contaminating plasmid cloning vector sequence. Then, 25,000 sequences were selected at random from each data set and queried against the human genome using BLASTN v2.2.31 with liberal gap extension. Results from the BLASTN analysis indicated that two different groups of chimpanzee DNA sequences existed. Those produced early in the chimpanzee genome project, which contributed to the initial chimpanzee genome publication, were considerably more similar to human than those produced later in the project, by a difference of about 5%. It may be that greater measures to reduce human DNA contamination were taken as the project progressed. Data from the seemingly less contaminated sets indicate that the chimpanzee genome is no more than about 85% identical to human.
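The random selection step described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the script used in the article: the function name `sample_reads`, the optional `seed`, and the choice to keep every read when a data set holds fewer than the requested number are all hypothetical.

```python
import random


def sample_reads(reads, n=25_000, seed=None):
    """Pick n reads at random, without replacement, from one data set.

    Assumptions (not from the article): `reads` is a list of sequence
    records, and a data set with n or fewer reads is returned whole.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    if len(reads) <= n:
        return list(reads)
    return rng.sample(reads, n)


# Toy stand-in for a trace read data set: 100 placeholder read names.
toy_reads = [f"read_{i}" for i in range(100)]
subset = sample_reads(toy_reads, n=10, seed=1)
```

Each sampled subset would then be written out and queried against the human genome with BLASTN, a step this sketch does not attempt to reproduce.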
Furthermore, when chimpanzee sequences that did not hit onto the human genome were blasted against the chimpanzee assembly, the average alignment identity was only 85%, when 99.9% to 100% identity should have been the result if the chimpanzee genome were accurately assembled.