Signal vs. Noise, Part 2: Hunter Opens the Klassen Study Again

Uh, Oh. Make that two head banging emojis.

That’s a bare assertion. You need to back up this claim with some statistics.

Therefore, 0.5 is well above random at 0.15. Looks like a statistically significant phylogenetic signal to me.

Good afternoon, Dr. Hunter,

Here’s what you say:

Here’s my best effort to understand what you say(^):

To me, these seem like logically equivalent statements.

I notice that the scientists in this thread besides yourself have either liked my post(s) or explicitly defended my interpretation of the data.

Moreover, I have been doing my best to make this a discussion about ideas.

I have not called anyone’s ideas “utterly bizarre.”

I have not labeled anyone else’s posts “lies, #### lies, and statistics.”

I have not cried out for “head banging emojis.”

I have done my best to avoid such snide rhetoric.

This assertion baffles me. According to Klassen, it means more than that. A CI above the CIrandom threshold indicates enough consistency with a phylogenetic structure to overcome the presumption of the null hypothesis, which is randomness. I quoted Klassen at length.
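For readers who want to see what "above the random threshold" means operationally, here is a minimal sketch of a permutation test on an ensemble consistency index. It is not Klassen's procedure: the tree, the taxa, and the six binary characters are all made up for illustration, and the tree is held fixed for brevity, whereas a full analysis would also search for the best tree for each shuffled matrix. It only shows the logic of comparing an observed CI against the CI values that shuffled (random) data produce.

```python
import random

# Hypothetical rooted tree over 8 taxa, written as nested pairs.
TREE = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
TAXA = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Hypothetical binary characters: the listed taxa have state 1, the rest 0.
# The first five fit the tree perfectly; the last one is homoplastic.
DERIVED = [{"A", "B"}, {"C", "D"}, {"A", "B", "C", "D"},
           {"E", "F"}, {"G", "H"}, {"A", "E"}]
MATRIX = [{t: int(t in group) for t in TAXA} for group in DERIVED]


def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a fixed tree."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):           # leaf: its observed state
            return {states[node]}
        left, right = visit(node[0]), visit(node[1])
        if left & right:
            return left & right
        steps += 1                           # no shared state: one change needed
        return left | right

    visit(tree)
    return steps


def ensemble_ci(tree, matrix):
    """Ensemble CI = (sum of minimum possible steps) / (sum of observed steps)."""
    minimum = sum(len(set(ch.values())) - 1 for ch in matrix)
    observed = sum(fitch_steps(tree, ch) for ch in matrix)
    return minimum / observed if observed else 1.0


def permutation_p_value(tree, matrix, n_perm=1000, seed=0):
    """Share of shuffled matrices whose CI is at least the observed CI."""
    rng = random.Random(seed)
    observed = ensemble_ci(tree, matrix)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for ch in matrix:
            values = list(ch.values())
            rng.shuffle(values)              # break any taxon/state association
            shuffled.append(dict(zip(ch.keys(), values)))
        if ensemble_ci(tree, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


observed_ci = ensemble_ci(TREE, MATRIX)
p_value = permutation_p_value(TREE, MATRIX)
print(f"observed CI = {observed_ci:.3f}, permutation p = {p_value:.4f}")
```

In this toy matrix the one homoplastic character pulls the CI below 1, yet the shuffled matrices still almost never reach the observed value. That is the sense in which a noisy data set can reject the null hypothesis of randomness.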

Perhaps you should quote the Klassen text I quoted, and tell us how Klassen got it wrong.

The data could even fit a model in which the moon is made of green cheese. However, the green cheese lunar model makes no predictions about the CI for a cladogram, so we do not use the data to argue that data fit the green cheese lunar model better than they fit the evolution model.

I put green cheese lunar model in italics to indicate that it stands for any model that makes no predictions about the CI value for cladograms with certain numbers of taxa. For green cheese lunar model, you could substitute:

  • general relativity
  • covalent bonds
  • protein folding
  • dependency graph
  • design model

As far as I can tell, the only model that predicts “phylogenetic information” in taxonomies is the theory of evolution. (The dependency graph model might be consistent with phylogeny or it might not; it depends on the topology of the DG. And DG is inherently a more complex model than evolution, as Ewert acknowledges.) This ability to predict phylogenetic information supports the idea that evolution is the best model science has to offer, until some other model comes along that makes superior predictions. If and when that day comes, the field of biology will be revolutionized once again.

I do model selection for a living, and this is not accurate. A model R^2 of 0.5 is terrible in the presence of a competitor with a R^2 of 0.95, but terrific when other models make no predictions at all and can’t even be included in the model selection process. Or when other models have a significantly lower R^2.
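To make the point concrete, here is a small sketch with made-up numbers. The model names and data are hypothetical; the only thing it illustrates is that a model with no predictions cannot be scored at all, so an R^2 of 0.5 is judged against whatever competitors can actually be put on the scoreboard.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


def rank_models(observed, models):
    """Score every model that makes predictions; skip those that make none."""
    scored = {name: r_squared(observed, pred)
              for name, pred in models.items() if pred is not None}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical observations and hypothetical model predictions.
observed = [2, 4, 6, 8, 10]
models = {
    "noisy_model": [0, 6, 4, 10, 8],   # R^2 works out to 0.5
    "sharp_model": [1, 5, 6, 8, 10],   # R^2 works out to 0.95
    "silent_model": None,              # makes no predictions at all
}

# With a strong competitor present, the noisy model loses.
print(rank_models(observed, models))

# Remove the strong competitor and the noisy model wins by default,
# because the silent model never enters the comparison.
without_sharp = {k: v for k, v in models.items() if k != "sharp_model"}
print(rank_models(observed, without_sharp))
```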

As I have mentioned several times, conventional biologists consider homoplasies to be noise–noise that is inevitable in a stochastic process. Likewise, the rate of change is noisy (“extremely rapid evolution or stasis”) because the earth’s habitats sometimes undergo rapid change, and sometimes remain in a stasis for eons. Unbelievable to you, but completely expected for those who model stochastic processes.

Winston Ewert disagrees with you. I quote from page 1 of his Bio-Complexity paper:

Life exhibits an approximate nested hierarchy pattern rather than forming an exact nested hierarchy. Even if the resemblance is weak or the approximation very loose, it is still undeniably present and thus must be explained. [emphasis in original]

Ewert states that the evidence is in fact reasonably consistent with evolution. Of course, he uses this as a point of embarkation. He does not question whether evolution is a reasonable fit, but instead introduces the possibility that another model might be an even better fit for the wide array of biological data.

To Ewert’s credit, he actually introduced a different model (dependency graph) and performed a Bayesian model selection between it, a random model, and a phylogenetic tree model on a small but interesting slice of data. Unfortunately, his initial effort suffered from significant problems that prevent the Bayesian analysis from hitting the mark. These problems are discussed elsewhere, both on this forum and on Peaceful Science. In theory, Ewert or others could perhaps solve these significant problems and introduce a good three-model Bayesian analysis into the peer-reviewed literature.

Until that day arrives, though, arguing that biologists should be ashamed of themselves for accepting noisy predictions is probably not going to succeed. Take away evolution and replace it with–what? Ewert agrees: criticizing evolution is not going to succeed. You need to provide a better model than evolution if you want to win the day.

Grace and peace,
Chris

^This statement is semantically identical to its pre-edit version, but lightly edited for clarity.

OK, here is your quote from Klassen:

You are continuing to twist the science.

Yes, and why is that?

Klassen isn’t the one who “got it wrong.” You are twisting the science, and then you want me to explain how the paper “got it wrong.”

You didn’t like the word “bizarre.” What word should I use to indicate that, in order to support your theory, you have pulled something out of thin air and attributed it to a paper that says nothing of the sort? I’m trying to be polite, but you guys are not making sense.

The paper simply says nothing of the sort. You cannot infer phylogenetic structure. And, I’m sorry, but the paper isn’t the one that “got it wrong.” You’re criticizing me all along, while making, well, bizarre (there, I said it) claims. The paper states that data points beyond the random region indicate “phylogenetic information.” And what is “phylogenetic information”? It simply means you are outside that random region–there is some kind of non random signal. You guys are the ones who twisted this, and imported meaning that is not in the science, by attributing a nested hierarchy structure. For instance, you wrote this:

You guys are making things up whole cloth. Indeed, the paper is careful to guard explicitly against such unwarranted, unempirical, non scientific, claims:

So here are the scientific facts, regardless of what evolutionists say. The predominant trend, not just one or two data points, is that the data are closer to random than to CD. The data are not very probable on CD. We have always known there is a seemingly endless series of examples that violate CD. This paper is helpful because it provides a systematic view of this. The data, on the whole, violate CD. They do not fit CD, regardless of the evolutionary spin. As Josh Swamidass has admitted, you have to introduce additional mechanisms–namely, homoplasies. Designs appear independently, over, and over, and over, and over …

The fact that evolutionists have no problem with this is not a tribute to CD. This isn’t good news for CD. Along with homoplasies, there is ILS, gene conversion, duplication, drift, deletion, etc., etc. There is no empirical content here. CD can explain anything. Anyone not wedded to the theory can see the overfitting here. This is very obvious. It is unfalsifiable. The fact that evolutionists resist the science, and that this paper can, and has been, spun as confirming, beyond a reasonable doubt, evolution and CD, is extremely damning.

So back to your point:

So why do you think that is? Any wild guesses?

How about that? Based on your vigorous statements, I thought Klassen might have said:

“rejecting the null hypothesis of random permutation necessarily contradicts phylogenetic conclusions.”

…or even:

“CI values < 0.5 necessarily contradict phylogenetic conclusions.”

Whew! I was worried there for a while.

Like @Swamidass, I have also admitted that homoplasies must be introduced into the phylogenetic tree. Why not give me some love?

You do realize, btw, that @Swamidass liked the OP in this thread?

One hypothesis is that pretty much every other biologist on the planet, including Winston Ewert, is wrong about nested hierarchy, and you are right.

The other hypothesis is that pretty much every other biologist on the planet, including Winston Ewert, is right, and you are wrong.

I will let readers choose.

Best wishes,
Chris

While we’re asking each other for opinions about quotes, what are your thoughts about that quote from Ewert’s Bio-Complexity paper?

Life exhibits an approximate nested hierarchy pattern rather than forming an exact nested hierarchy. Even if the resemblance is weak or the approximation very loose, it is still undeniably present and thus must be explained. [emphasis in original]

Ah, make that three emojis. And a bottle of aspirin.

Well, think of it this way. If you look at the sky you’ll see that everything goes around the Earth. It is an undeniable pattern that must be explained. You could say the cosmos exhibits an approximate geocentric pattern. Copernicus published his heliocentrism, which had many similarities to geocentrism. You move the Sun to (roughly) the center, but otherwise everything still travels in circles. The so-called “Copernican Revolution,” as it has been constructed, is largely a myth. The supposed cultural shift by moving Earth from the center is highly exaggerated. And Copernicus’ model wasn’t very good either. But it was an important moment in model improvement.

So regarding Ewert’s paper and that quote, it certainly is true that at a glance the species can give the impression of nested hierarchy. That was the idea from Aristotle to Linnaeus. So there is something there. The nested hierarchy model is simpler, and easier to conceptualize than the DG model. And the latter can be fitted into the former. That is, if you simulated a DG process, and constructed a biological world of genes and species, and then you fed the data into an evolutionary tree phylogenetic algorithm, it would return a reasonable tree. There would be the need for many additional mechanisms (homoplasies, divergences, etc.), but it would work. Simply put, a DG can be shoehorned into a CD model. This is what the paper was getting at. We can say that the species exhibit an approximate nested hierarchy, with the understanding that “approximate” here can mean a pretty lousy model, just like geocentrism is pretty lousy.

That is strikingly bold rhetoric, Dr. Hunter, considering that your “heliocentrism” can make almost no predictions about the orbits of planets, moons, asteroids, etc.

Here’s another metaphor that is apt for our discussion:

In the land of the blind, the one-eyed man is king.

The theory of evolution may have somewhat blurry vision, but it’s the only theory in the hypothesis space that is making any predictions with respect to the vast body of evidence.

Grace and peace,
Chris

The DG model has not been fitted to sequence data, as even Ewert explained. There is currently no DG explanation for the phylogenetic signal in sequence data. As noted by @glipsnort, the DG model can’t even predict the pattern of sequence differences between species with respect to transitions, transversions, and CpG mutations. The DG model also can’t explain orthologous ERVs, genetic equidistance, and the divergence of introns and exons. The common descent model can explain all of these pieces of sequence data, but the DG model cannot.

Klassen’s statement is not elegantly phrased. Notwithstanding that, I submit with all due respect that you have misread Klassen here.

He does not say that you cannot use the 49 data sets to reject the null hypothesis.

Instead, what he is saying is that the rejection of the null hypothesis (on the one hand) does not necessarily imply being able to identify with confidence the single cladogram, within the permutation space of all possible cladograms, that best fits a particular data set (on the other hand). Single best fit cladograms, one per data set, are the “phylogenetic conclusions” to which Klassen is referring.

This is a common problem in conducting a search over an NP-hard problem domain. You can use some sort of hill-climbing to reach a local maximum, but once you get there, can you be 100% confident it is the global maximum? This is the problem Klassen is dealing with. He is not disputing the ability to identify the existence of a mountain range (a phylogeny) as opposed to the prairie land of randomness. Nor is he disputing the ability to find the highest peak within the portion of the range that has been searched. He is simply pointing out that the methods that existed in 1991, and perhaps even the methods that exist today, are not able to claim with 100% confidence that the best-fitting phylogeny that emerges from a necessarily constrained analysis is the best-fitting phylogeny for a domain under study.
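Here is a toy illustration of that local-versus-global problem. The one-dimensional "landscape" below is entirely made up and stands in for the far larger space of possible cladograms; it is a sketch of greedy hill-climbing in general, not of any method Klassen used.

```python
def hill_climb(heights, start):
    """Greedy ascent: keep stepping to the higher neighbor until none is higher."""
    pos = start
    while True:
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(heights)]
        best = max(neighbors, key=lambda i: heights[i])
        if heights[best] <= heights[pos]:
            return pos                    # no uphill move left: a (local) peak
        pos = best


# Hypothetical fitness landscape with two peaks (index 2 and index 6).
landscape = [1, 3, 5, 4, 2, 6, 9, 7]

peak_from_left = hill_climb(landscape, start=0)
peak_from_middle = hill_climb(landscape, start=4)
global_peak = max(range(len(landscape)), key=lambda i: landscape[i])

print(peak_from_left, peak_from_middle, global_peak)
```

Both searches find a real peak, so neither could be confused with flat prairie. Only one of them finds the highest peak, and the search that stopped on the lower one has no way of knowing that from where it stands.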

Klassen refers to two other publications to support his conclusion:

  • FAITH, D. P., AND P. S. CRANSTON. 1991. Could a cladogram this short have arisen by chance? On permutation tests for cladistic structure. Cladistics 7:1-28.
  • FARRIS, J. S. 1972. Estimating phylogenetic trees from distance matrices. Am. Nat. 106:645-668.

Due to paywall constraints I have only been able to read the abstracts of these publications. The abstracts very clearly deal with the frustrations of finding a best fit cladogram, rather than with the question of whether the null hypothesis of randomness is overcome. The ability to reject the null hypothesis is not questioned in the least by the publications that Klassen refers to, at least as far as I can tell by reading the abstracts.

Thanks, and have a great southern California day.

Chris

You can’t use DNA (or protein) sequences. That’s the problem. By using sequence data, you are prefiltering to using only sequences that are present in all the species in the data set. Hence it isn’t accurate. It is a self-fulfilling prophecy. If you prefilter to only have data that support your theory, then of course, you’ll get very nicely behaved data. But if you are interested in realism, then you will look at the preponderance of the evidence. (I’m repeating myself from another thread, but you brought up this subject).

Hello Dr. Hunter,

I don’t understand this assertion. My understanding is that research studies use homologous sequences so that they can measure Levenshtein distances. Shorter distances are correlated with closer relationships, longer distances with more distant relationships. If you don’t use homologous sequences, however, then you have no way to measure distance.
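For anyone unfamiliar with the term, here is a minimal sketch of Levenshtein (edit) distance. The three sequences and the taxon names are made up, and as far as I know real phylogenetic pipelines work from alignments and substitution models rather than raw edit distance, so treat this only as an illustration of what "distance between sequences" means.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))    # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                      # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


# Hypothetical homologous sequences from three made-up taxa.
taxon_a = "ACGTACGT"
taxon_b = "ACGTACGA"
taxon_c = "ACCTTCGA"

for name, seq in (("taxon_b", taxon_b), ("taxon_c", taxon_c)):
    print(f"taxon_a vs {name}: {levenshtein(taxon_a, seq)}")
```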

As long as the analysis uses a sufficiently long sequence, or multiple sequences, however, the null hypothesis is that the trees that emerge from different segments should be no more similar to one another than would occur by chance. If there is no historical basis of common descent, a tree m that is predicted only by happenstance from one segment is highly unlikely to be similar to tree n from another segment.

If all of the segments yield the same or very similar trees, however, then the concordance can become statistically significant evidence of a history of common descent.
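A rough back-of-the-envelope sketch shows why concordance is so unlikely by chance. It counts the rooted, fully resolved tree shapes for n labeled taxa, which is the double factorial (2n-3)!!. The assumptions are mine: a null in which every topology is equally likely, and exact identity of trees as the match criterion, which is stricter than "very similar."

```python
def num_rooted_binary_trees(n_taxa):
    """Number of rooted, fully resolved trees on n labeled taxa: (2n - 3)!!"""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):   # 3 * 5 * ... * (2n - 3)
        count *= k
    return count


def chance_of_identical_tree(n_taxa):
    """Probability that two independent, uniformly random topologies coincide."""
    return 1 / num_rooted_binary_trees(n_taxa)


for n in (4, 10, 20):
    print(n, num_rooted_binary_trees(n), chance_of_identical_tree(n))
```

Under these assumptions, ten taxa already allow more than 34 million topologies, so two unrelated segments landing on the same tree by happenstance would be a very long shot.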

For that matter, even if a single study yields a false positive due to a tiny sample size, the existence of other studies of different sequences for the same taxa would be quite unlikely to yield a similar tree if there were no historical basis in common descent.

Now if you could provide evidence that all of the phylogenetic studies are based on single, very short sequences, then I could understand your concern. Such an approach could indeed produce false phylogenetic positives. However, I don’t think that you will be able to provide such evidence; all of the phylogenetic papers I have read from the past few years (admittedly, not that many) use multiple, long sequences in their analysis.

In addition to any citations you could provide, Dr. Hunter, I would also welcome input from others who are well-read in the literature such as @DennisVenema, @sfmatheson, @T_aquaticus, and @glipsnort.

Thanks,
Chris Falter

While you are cogitating on this issue, Dr. Hunter, perhaps you could help us by describing the predictions a design model would make with regard to patterns (if any) in the Levenshtein distance of homologous DNA sequences in multiple taxa.

Thanks,
Chris Falter

Evolutionists use homologous characters so they can measure distance.

Yes, precisely. That’s what I mean by “prefiltering.” An enormous wealth of data are filtered out. The methods themselves are theory-laden.

Let’s try again, Dr. Hunter.

If the design model cannot make any predictions regarding patterns (or lack thereof) in these studies of homologous sequences, then how are we supposed to do model selection in a scientific way?

Yes, every scientific research project is informed by theory. So I am not exactly sure what you are aiming at here.

I am going to go out on a limb and guess that it is your belief that heterologous sequences prove that evolution is inferior to some other unspecified theory which makes more accurate predictions about the simultaneous presence in a taxonomic analysis of both:

  • nested hierarchies in homologous sequences and
  • no hierarchy in heterologous sequences.

The theory of evolution accomplishes this through stochastic modeling that incorporates factors like incomplete lineage sorting, convergent evolution, copy-and-modification, etc. The theory of evolution actually has a model for the noise. But you think a model is already available which makes more robust, quantitatively accurate, and parsimonious predictions than evolution.

However, according to Winston Ewert, this unspecified theory cannot possibly be dependency graph analysis, because DG makes no predictions with regard to sequence data as of today, and likely will not for some time.

If I have guessed incorrectly, kindly provide any corrections you think are suitable.

If I have guessed correctly, kindly name the competing theory and tell us what predictions it makes with regard to any patterns in sequence data, and how those predictions are derived.

Thanks,
Chris Falter

Since this thread started with the Klassen 1991 metastudy, it’s worth pointing out that Klassen did not “prefilter” any of the traits or taxa from its Consistency Index analysis. Nevertheless, Klassen demonstrated an extremely strong signal for nested hierarchy. (Strong refers to the statistical significance of the signal.)

So no, support for nested hierarchy does not disappear in the absence of “prefiltering.”

Best,
Chris Falter

Why is that a problem? It’s kind of hard to sequence DNA that has been deleted in a lineage.

The theory of evolution predicts that phylogenies of sequenced orthologous DNA will recapitulate the phylogenies based on morphology. That is the prediction that is being tested. If the DG model does not make a prediction with respect to the differences and similarities in the sequence of orthologous DNA, then it is an inferior model to the theory of evolution.

HUH???

Why would the simple fact of sharing a DNA sequence guarantee that those shared sequences would recapitulate the phylogenies based on morphology? You need to explain this.

We already know from our own design programs that this isn’t true. We have inserted an exact copy of a jellyfish gene into mice which produces a DNA phylogeny that is completely different from the morphological phylogenies. Simply sharing DNA does not force a fit between phylogenies based on DNA and morphology.

That’s because the theory of evolution makes predictions of what you will see in orthologous sequences. Therefore, you can use orthologous sequences to test the theory of evolution.

You just answered your own question.

Please see:

If so, then evolution is false by modus tollens.

When a theory generates false predictions, it is not a very good theory.