Signal vs. Noise, Part 2: Hunter Opens the Klassen Study Again

OK, here is your quote from Klassen:

You are continuing to twist the science.

Yes, and why is that?

Klassen isn’t the one who “got it wrong.” You are twisting the science, and then you want me to explain how the paper “got it wrong.”

You didn’t like the word “bizarre.” What word should I use to indicate that, in order to support your theory, you have pulled something out of thin air and attributed it to a paper that says nothing of the sort? I’m trying to be polite, but you guys are not making sense.

The paper simply says nothing of the sort. You cannot infer phylogenetic structure. And, I’m sorry, but the paper isn’t the one that “got it wrong.” You’re criticizing me all along, while making, well, bizarre (there, I said it) claims. The paper states that data points beyond the random region indicate “phylogenetic information.” And what is “phylogenetic information”? It simply means you are outside that random region; there is some kind of non-random signal. You guys are the ones who twisted this, and imported meaning that is not in the science, by attributing a nested hierarchy structure. For instance, you wrote this:

You guys are making things up out of whole cloth. Indeed, the paper is careful to guard explicitly against such unwarranted, unempirical, nonscientific claims:

So here are the scientific facts, regardless of what evolutionists say. The predominant trend, not just one or two data points, is that the data are closer to random than to CD. The data are not very probable on CD. We have always known there is a seemingly endless series of examples that violate CD. This paper is helpful because it provides a systematic view of this. The data, on the whole, violate CD. They do not fit CD, regardless of the evolutionary spin. As Josh Swamidass has admitted, you have to introduce additional mechanisms, namely homoplasies. Designs appear independently, over, and over, and over, and over …

The fact that evolutionists have no problem with this is not a tribute to CD. This isn’t good news for CD. Along with homoplasies, there is ILS, gene conversion, duplication, drift, deletion, etc., etc. There is no empirical content here. CD can explain anything. Anyone not wedded to the theory can see the overfitting here. This is very obvious. It is unfalsifiable. The fact that evolutionists resist the science, and that this paper can be, and has been, spun as confirming, beyond a reasonable doubt, evolution and CD, is extremely damning.

So back to your point:

So why do you think that is? Any wild guesses?

How about that? Based on your vigorous statements, I thought Klassen might have said:

“rejecting the null hypothesis of random permutation necessarily contradicts phylogenetic conclusions.”

…or even:

“CI values < 0.5 necessarily contradict phylogenetic conclusions.”

Whew! I was worried there for a while.

Like @Swamidass, I have also admitted that homoplasies must be introduced into the phylogenetic tree. Why not give me some love?

You do realize, btw, that @Swamidass liked the OP in this thread?

One hypothesis is that pretty much every other biologist on the planet, including Winston Ewert, is wrong about nested hierarchy, and you are right.

The other hypothesis is that pretty much every other biologist on the planet, including Winston Ewert, is right, and you are wrong.

I will let readers choose.

Best wishes,
Chris

While we’re asking each other for opinions about quotes, what are your thoughts about that quote from Ewert’s Bio-Complexity paper?

Life exhibits an approximate nested hierarchy pattern rather than forming an exact nested hierarchy. Even if the resemblance is weak or the approximation very loose, it is still undeniably present and thus must be explained. [emphasis in original]

Ah, make that three emojis. And a bottle of aspirin.

Well, think of it this way. If you look at the sky you’ll see that everything goes around the Earth. It is an undeniable pattern that must be explained. You could say the cosmos exhibits an approximate geocentric pattern. Copernicus published his heliocentrism, which had many similarities to geocentrism. You move the Sun to (roughly) the center, but otherwise everything still travels in circles. The so-called “Copernican Revolution,” as it has been constructed, is largely a myth. The supposed cultural shift of moving Earth from the center is highly exaggerated. And Copernicus’ model wasn’t very good either. But it was an important moment in model improvement.

So regarding Ewert’s paper and that quote, it certainly is true that at a glance the species can give the impression of nested hierarchy. That was the idea from Aristotle to Linnaeus. So there is something there. The nested hierarchy model is simpler, and easier to conceptualize than the DG model. And the latter can be fitted into the former. That is, if you simulated a DG process, and constructed a biological world of genes and species, and then you fed the data into an evolutionary tree phylogenetic algorithm, it would return a reasonable tree. There would be the need for many additional mechanisms (homoplasies, divergences, etc.), but it would work. Simply put, a DG can be shoehorned into a CD model. This is what the paper was getting at. We can say that the species exhibit an approximate nested hierarchy, with the understanding that “approximate” here can mean a pretty lousy model, just like geocentrism is pretty lousy.

That is strikingly bold rhetoric, Dr. Hunter, considering that your “heliocentrism” can make almost no predictions about the orbits of planets, moons, asteroids, etc.

Here’s another metaphor that is apt for our discussion:

In the land of the blind, the one-eyed man is king.

The theory of evolution may have somewhat blurry vision, but it’s the only theory in the hypothesis space that is making any predictions with respect to the vast body of evidence.

Grace and peace,
Chris

The DG model has not been fitted to sequence data, as even Ewert explained. There is currently no DG explanation for the phylogenetic signal in sequence data. As noted by @glipsnort, the DG model can’t even predict the pattern of sequence differences between species with respect to transitions, transversions, and CpG mutations. The DG model also can’t explain orthologous ERVs, genetic equidistance, and the divergence of introns and exons. The common descent model can explain all of these pieces of sequence data; the DG model cannot.

Klassen’s statement is not elegantly phrased. Notwithstanding that, I submit with all due respect that you have misread Klassen here.

He does not say that you cannot use the 49 data sets to reject the null hypothesis.

Instead, what he is saying is that the rejection of the null hypothesis (on the one hand) does not necessarily imply being able to identify with confidence the single cladogram, within the permutation space of all possible cladograms, that best fits a particular data set (on the other hand). Single best fit cladograms, one per data set, are the “phylogenetic conclusions” to which Klassen is referring.

This is a common problem in conducting a search over an NP-hard problem domain. You can use some sort of hill-climbing to reach a local maximum, but once you get there, can you be 100% confident it is the global maximum? This is the problem Klassen is dealing with. He is not disputing the ability to identify the existence of a mountain range (a phylogeny) as opposed to the prairie land of randomness. Nor is he disputing the ability to find the highest peak within the portion of the range that has been searched. He is simply pointing out that the methods that existed in 1991, and perhaps even the methods that exist today, are not able to claim with 100% confidence that the best-fitting phylogeny that emerges from a necessarily constrained analysis is the best-fitting phylogeny for a domain under study.
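The local-versus-global maximum worry can be made concrete with a toy hill climb. This is a minimal sketch, not anything from Klassen: the one-dimensional landscape function and step size are invented for illustration, whereas a real phylogenetic search moves over discrete tree topologies.

```python
# Toy illustration: greedy hill-climbing can halt at a local maximum.
# The landscape f is invented: a local peak near x=2 and a taller
# global peak near x=8.

def f(x):
    return -(x - 2) ** 2 + 4 if x < 5 else -(x - 8) ** 2 + 9

def hill_climb(x, step=0.1):
    """Move to the best neighbor until no neighbor improves."""
    while True:
        best = max((x - step, x, x + step), key=f)
        if best == x:
            return x  # no neighbor is better: a (possibly local) maximum
        x = best

peak_left = hill_climb(0.0)    # starts in the basin of the local peak
peak_right = hill_climb(10.0)  # starts in the basin of the global peak
print(round(peak_left), round(peak_right))  # → 2 8
```

Starting from the left, the search stops at the lower peak and cannot certify that nothing taller exists elsewhere, which is exactly the caveat Klassen attaches to best-fit cladograms.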

Klassen refers to two other publications to support his conclusion:

  • FAITH, D. P., AND P. S. CRANSTON. 1991. Could a cladogram this short have arisen by chance? On permutation tests for cladistic structure. Cladistics 7:1-28.
  • FARRIS, J. S. 1972. Estimating phylogenetic trees from distance matrices. Am. Nat. 106:645-668.

Due to paywall constraints I have only been able to read the abstracts of these publications. The abstracts very clearly deal with the frustrations of finding a best fit cladogram, rather than with the question of whether the null hypothesis of randomness is overcome. The ability to reject the null hypothesis is not questioned in the least by the publications that Klassen refers to, at least as far as I can tell by reading the abstracts.

Thanks, and have a great southern California day.

Chris

You can’t use DNA (or protein) sequences. That’s the problem. By using sequence data, you are prefiltering to use only sequences that are present in all the species in the data set. Hence it isn’t accurate. It is a self-fulfilling prophecy. If you prefilter to only have data that support your theory, then of course you’ll get very nicely behaved data. But if you are interested in realism, then you will look at the preponderance of the evidence. (I’m repeating myself from another thread, but you brought up this subject.)

Hello Dr. Hunter,

I don’t understand this assertion. My understanding is that research studies use homologous sequences so that they can measure Levenshtein distances. Shorter distances are correlated with closer relationships, longer distances with more distant relationships. If you don’t use homologous sequences, however, then you have no way to measure distance.
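For concreteness, the Levenshtein distance mentioned above is computed with a standard dynamic-programming recurrence. Here is a minimal sketch; the DNA fragments in the usage example are invented.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if match)
            ))
        prev = curr
    return prev[-1]

# Invented toy fragments: fewer edits suggests a closer relationship.
print(levenshtein("GATTACA", "GATTGCA"))  # one substitution → 1
print(levenshtein("GATTACA", "GCAT"))
```

Note that both arguments must be homologous stretches for the distance to mean anything, which is the point at issue: with no alignable counterpart in the other species, there is simply no distance to compute.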

As long as a sufficiently long sequence, or multiple sequences, is used, however, the null hypothesis is that the trees that emerge from the analysis of different segments should be no more similar than would occur by chance. If there is no historical basis of common descent, a tree m that emerges only by happenstance from one segment is highly unlikely to be similar to a tree n from another segment.

If all of the segments yield the same or very similar trees, however, then the concordance can become statistically significant evidence of a history of common descent.

For that matter, even if a single study yields a false positive due to a tiny sample size, the existence of other studies of different sequences for the same taxa would be quite unlikely to yield a similar tree if there were no historical basis in common descent.
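The improbability of chance agreement can be put into numbers: the count of distinct unrooted binary tree topologies for n labeled taxa is the double factorial (2n−5)!! = 3 × 5 × … × (2n−5), so exact agreement between two independently drawn random trees collapses rapidly as taxa are added. A quick sketch of the arithmetic (the formula is standard; the printout is mine):

```python
def num_unrooted_topologies(n):
    """Number of distinct unrooted binary trees on n labeled taxa:
    the double factorial (2n-5)!! = 3 * 5 * ... * (2n-5), for n >= 3."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

for n in (4, 10, 20):
    t = num_unrooted_topologies(n)
    # Chance that two independent random topologies agree exactly: 1/t
    print(f"{n} taxa: {t} topologies, chance agreement 1/{t}")
```

Already at 10 taxa there are over two million topologies, so repeated concordance across independent segments is very hard to attribute to luck.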

Now if you could provide evidence that all of the phylogenetic studies are based on single, very short sequences, then I could understand your concern. Such an approach could indeed produce false phylogenetic positives. However, I don’t think that you will be able to provide such evidence; all of the phylogenetic papers I have read from the past few years (admittedly, not that many) use multiple, long sequences in their analysis.

In addition to any citations you could provide, Dr. Hunter, I would also welcome input from others who are well-read in the literature, such as @DennisVenema, @sfmatheson, @T_aquaticus, and @glipsnort.

Thanks,
Chris Falter

While you are cogitating on this issue, Dr. Hunter, perhaps you could help us by describing the predictions a design model would make with regard to patterns (if any) in the Levenshtein distance of homologous DNA sequences in multiple taxa.

Thanks,
Chris Falter

Evolutionists use homologous characters so they can measure distance.

Yes, precisely. That’s what I mean by “prefiltering.” An enormous wealth of data are filtered out. The methods themselves are theory-laden.

Let’s try again, Dr. Hunter.

If the design model cannot make any predictions regarding patterns (or lack thereof) in these studies of homologous sequences, then how are we supposed to do model selection in a scientific way?

Yes, every scientific research project is informed by theory. So I am not exactly sure what you are aiming at here.

I am going to go out on a limb and guess that it is your belief that heterologous sequences prove that evolution is inferior to some other unspecified theory which makes more accurate predictions about the simultaneous presence in a taxonomic analysis of both:

  • nested hierarchies in homologous sequences and
  • no hierarchy in heterologous sequences.

The theory of evolution accomplishes this through stochastic modeling that incorporates factors like incomplete lineage sorting, convergent evolution, copy-and-modification, etc. The theory of evolution actually has a model for the noise. But you think a model is already available which makes more robust, quantitatively accurate, and parsimonious predictions than evolution.

However, according to Winston Ewert, this unspecified theory cannot possibly be dependency graph analysis, because DG makes no predictions as of today, and likely for some time into the future, with regard to sequence data.

If I have guessed incorrectly, kindly provide any corrections you think are suitable.

If I have guessed correctly, kindly name the competing theory and tell us what predictions it makes with regard to any patterns in sequence data, and how those predictions are derived.

Thanks,
Chris Falter

Since this thread started with the Klassen 1991 metastudy, it’s worth pointing out that Klassen did not “prefilter” any of the traits or taxa from the Consistency Index analysis. Nevertheless, Klassen demonstrated an extremely strong signal for nested hierarchy. (“Strong” refers to the statistical significance of the signal.)

So no, support for nested hierarchy does not disappear in the absence of “prefiltering.”
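The sense in which such a signal is “statistically significant” can be sketched with a generic permutation test: shuffle the character data to destroy any inherited association, recompute the statistic, and ask where the observed value falls in that null distribution. This is a minimal illustration with invented presence/absence characters and a simple agreement statistic of my own choosing, not Klassen’s actual CI machinery:

```python
import random

random.seed(0)

# Invented presence/absence characters for 12 taxa. Under a shared
# history, characters tend to agree across taxa; permutation destroys
# any such association.
char_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
char_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

def agreement(x, y):
    """Number of taxa at which two characters take the same state."""
    return sum(a == b for a, b in zip(x, y))

observed = agreement(char_a, char_b)  # 11 of 12 taxa agree

# Build the null distribution by shuffling one character.
null = [agreement(char_a, random.sample(char_b, len(char_b)))
        for _ in range(10_000)]

# One-sided permutation p-value: fraction of shuffles at least as extreme.
p = sum(s >= observed for s in null) / len(null)
print(f"observed agreement = {observed}, permutation p = {p:.4f}")
```

The observed agreement sits far out in the tail of the shuffled distribution, which is what “the data lie outside the random region” means in this kind of analysis.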

Best,
Chris Falter

Why is that a problem? It’s kind of hard to sequence DNA that has been deleted in a lineage.

The theory of evolution predicts that phylogenies of sequenced orthologous DNA will recapitulate the phylogenies based on morphology. That is the prediction that is being tested. If the DG model does not make a prediction with respect to the differences and similarities in the sequence of orthologous DNA, then it is an inferior model to the theory of evolution.

HUH???

Why would the simple fact of sharing a DNA sequence guarantee that those shared sequences would recapitulate the phylogenies based on morphology? You need to explain this.

We already know from our own design programs that this isn’t true. We have inserted an exact copy of a jellyfish gene into mice, which produces a DNA phylogeny that is completely different from the morphological phylogenies. Simply sharing DNA does not force a fit between phylogenies based on DNA and morphology.

That’s because the theory of evolution makes predictions of what you will see in orthologous sequence. Therefore, you can use orthologous sequences to test the theory of evolution.

You just answered your own question.

Please see:

If so, then evolution is false by modus tollens.

When a theory generates false predictions, it is not a very good theory.

That study uses orthologous genes, so I’m not sure what you are getting at. How are you supposed to compare genes if a gene is not found in one of the species?

Since you can’t come up with a reason, other than common ancestry and vertical inheritance, why phylogenies based on DNA sequence would recapitulate the phylogenies based on morphology, then I don’t see what objection you can have to using orthologous genes.

There is a statistically significant phylogenetic signal, so the predictions have been supported.

Dr. Hunter,

Hope your southern California weekend is going well.

You chose to analogize the theory of evolution to geocentrism. I am going to choose a different analogy that I believe to be more accurate and informative: X-ray crystallography and DNA structure.

Just as evolution predicts a statistically significant nested hierarchy structure in a taxonomy, biochemistry predicts that DNA can take on structural forms known as A-DNA and B-DNA. The test of the hypothesis is the similarity of the predicted X-ray crystallography images to the actual. And here I introduce some predicted and actual images from an article on quora.com, “How does one physically interpret the different diffraction patterns between A-DNA and B-DNA?”:

Now it would be possible to build a consistency index for the predicted vs. actual similarity. The CI could answer the question: for each pixel that is dark in the actual image, is the corresponding predicted pixel dark? Sum the number of pixels for which the correspondence holds and divide by the number of dark pixels in the actual image. This approach would be very similar to the CI approach adopted by Klassen et al. in 1991, except that they were analyzing characters instead of pixels.
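That pixel-matching index reduces to a simple count over two binary masks. Here is a minimal sketch with invented 1-bit “images”; real diffraction photographs would be thresholded grayscale arrays.

```python
# Invented 5x5 binary "images": 1 = dark pixel, 0 = light.
predicted = [
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
]
actual = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1],
]

def pixel_ci(predicted, actual):
    """Of the pixels dark in the actual image, what fraction are
    also dark in the predicted image?"""
    dark = matched = 0
    for prow, arow in zip(predicted, actual):
        for p, a in zip(prow, arow):
            if a == 1:
                dark += 1
                matched += p
    return matched / dark

print(f"CI = {pixel_ci(predicted, actual):.2f}")
```

As in the diffraction case, the prediction misses many dark pixels, yet the question of whether the match beats random pixel placement is a separate and much easier test to pass.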

Without access to the original data, I cannot provide an exact CI for the A-DNA and B-DNA images. However, there are clearly a lot more dark pixels in the actual image than in the predicted. I would guess the CI is roughly 0.5 for A-DNA and roughly 0.25 for B-DNA, which has enormous black blobs at the top and bottom where a thin segment of dots is predicted.

The question is: should the actual data be interpreted as evidence for the predicted structures of A-DNA and B-DNA?

Answer #1 is:

No, the actual pixels are poor evidence for the theory. Certainly there is some similarity between predicted and actual. But just as you have to introduce epicycles into geocentrism to account for planetary orbits, you have to introduce extraneous factors to account for the CI values, which are far below 1.0. To the extent that you consider the theory of DNA structure to be good, it’s only because you are prefiltering the badly predicted pixels. Consequently, we should consider the theory of DNA structure to be not a very good theory.

Answer #2 is:

Yes, the actual pixels are powerful evidence for the theory. The probability of the null hypothesis for the actual images (null hypothesis = random placement of pixels due to no structure) is infinitesimal, something like 0.0000000005. Therefore the alternative hypothesis, A-DNA and B-DNA, should be accepted.

The actual images do contain significant noise, but we have known mechanisms to account for the noise.

It is also possible that some other, as-yet unidentified hypothesis might be even more consistent with the actual images than the A-DNA and B-DNA hypotheses. If that as-yet unidentified hypothesis survives peer review, then we can adopt it. But until that as-yet unidentified hypothesis shows up, we accept the A-DNA and B-DNA theory with a high degree of confidence.

The biochemistry community has adopted Answer #2, not Answer #1. We should do likewise for the theory of evolution and the Klassen data, as well as for the more recent, genomic-based phylogenetic studies.

Best regards,
Chris Falter

Let me try again. This traces back to the question about the Ewert paper not using sequence data, but rather presence/absence data. I explained that the problem with sequence data is that, in order to align and compare sequences, the gene must be present in both species. So by definition, you are filtering out cases where one species has the gene but the other species lacks it. This is a case where you have a big difference between two species, but it is not being counted; it is being filtered out.

Your response was to say that, well, we need to have the gene present in both species in order to perform a sequence comparison. Yes, agreed, that is true. I am not disagreeing with your point; I am pointing out that you are simply reinforcing the problem I identified. The data are “theory-laden.” This prefiltering removes data comparisons which are highly improbable on the theory. IOW, they do harm to the theory. They lower the probability that your theory is true. (Speaking in Bayesian terms here, of course.)