New Paper Demonstrates Superiority of Design Model

Complaints? Or observations and push-back? I might have missed a few things above, and I don’t doubt that IDers do put up with all sorts of ad hominem attacks (which seem notoriously difficult for all parties to agree upon – one party’s “observations of facts” is another party’s “ad hominem”.) So I don’t think that noting that a certain model or proposal hasn’t been published in any peer-reviewed journal (other than a specialized ID journal) is inherently ad hominem, for example. But if you are being attacked personally or unfairly, then we moderators aren’t doing our jobs – which is quite possible, and we do need help. Feel free to point out objectionable content, and be specific about the exact sentences if you do. You are indisputably in “hostile territory” for your views, and so to be commended for your courage to stand up here for what you see to be true.

This conversation looks interesting to me as a non-specialist, and I hope to follow more exchanges to the best extent my limited knowledge allows. It also looks to me as if @cwhenderson finally gets what he had been waiting for: a proposed model from you (the “dependency graph model”), though it sounds like you still have some selling to do on that before others knowledgeable in the relevant fields would find it convincing enough to begin to engage in details. Carry on!

I suggest you all head over to the peaceful science forum. Winston is actually discussing the paper with swamidass and the conversation is a lot more pleasant than this.

1 Like

No, I wasn’t referring to this forum.

Any hints on how to find it? Do you have a URL?

I have only skimmed the paper, but have already seen that the author does not claim, and in fact disclaims, that the paper undermines common descent. He admits that the paper ignores multiple known facts of evolutionary genetics and natural history, but insists that this should not hinder consideration of his model. To some extent, I agree. Here is the text I refer to, from page 18 of the paper:

An obvious objection is that we have not included any of the mechanisms thought to account for nonhierarchical data such as incomplete lineage sorting, gene flow, convergent evolution, or horizontal gene transfer. As such, it might be argued that any of the features of the data interpreted as evidence for the dependency graph may also be explained by these mechanisms. The focus of this paper has not been to critique common descent, but to the test the predictions of the dependency graph hypothesis. The challenge to common descent lies not in the comparison of the tree and dependency graph models but in explaining the successful predictions of the dependency graph hypothesis.

What I found a lot less clear was the reasoning or data behind some of the specific scientific claims. Here is the one I am most interested in, from the beginning of the section “Small examples” on page 12:

We will now consider a few small examples of cases which our model-fitting method inferred to be explained by modules. A striking example can be found in Nematostella vectensis (starlet sea anemone) and Branchiostoma floridae (Florida lancelet). These are distantly related organisms with the anemone being in the phylum Cnidaria and the lancelet being in the phylum Chordata. Nevertheless, they contain between 25 and 564 (depending on the database consulted) gene families found in both species but in no other metazoan species in the database. In all datasets where both species are present, a module is inferred to exist to explain the genes found in both species.

I strongly suspect that this claim is an uninteresting artifact of how the author is interpreting a database, and specifically how he is building conclusions from the fact that something is “not found” somewhere, but if there is any basis to it at all, it would be very interesting. Does anyone know, or can you tell, where he even got this? I’ll look harder too.


My understanding is that the theory of evolution is mathematically modeled as a stochastic process. Consequently, its predictions are probabilistic.

From a practical perspective, what does this mean? It might be helpful to look at another well-known, stochastic domain for insight. The domain I choose for this example is sports predictions (not that I wager money on anything personally).

Sports: Another Domain of the Survival of the Fittest

Let’s take a look at the 2018 World Cup. At the onset a month ago, some of the teams (Saudi Arabia, Morocco, Peru, Russia) seemed very weak and unlikely to advance even to the knock-out round. Others looked very strong and likely to go far into the knock-outs (France, Germany, Brazil, Belgium). How did these predictions fare?

They were mostly right, but there were some notable exceptions. Tiny Croatia advanced all the way to the final, but mighty Germany did not even reach the knock-out stage. Huge exceptions!

Do these exceptions mean that Germany has a weaker team than Croatia? Almost certainly not; if they played a 20-game series over a 4-month period, I would expect (with near 100% certainty) Germany to prevail. Because the competition is not structured that way, exceptions are expected; outcomes can be predicted only probabilistically in a noisy, stochastic process such as a soccer tournament. (For our international readers, that’s football, not soccer.)
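The intuition that a long series filters out the noise is easy to check with a quick simulation (a toy sketch; the 60% per-game edge is a number I made up for illustration, not a real rating):

```python
import random

random.seed(1)

def series_win_prob(p_game, n_games, trials=100_000):
    """Estimate the chance that the stronger team (per-game win
    probability p_game) takes a majority of an n_games series."""
    wins = 0
    for _ in range(trials):
        games_won = sum(random.random() < p_game for _ in range(n_games))
        if games_won > n_games / 2:
            wins += 1
    return wins / trials

# A modest 60% per-game edge is often lost in a single game,
# but rarely over a long series (odd length avoids ties).
print(series_win_prob(0.60, 1))   # roughly 0.60
print(series_win_prob(0.60, 21))  # roughly 0.83
```

The single game is nearly a coin flip; the series recovers the signal, which is the point about Germany and Croatia.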

Do these exceptions mean that our basic premise–that stronger teams can be identified in advance and they can be expected to perform better–was wrong? Overall, no. Collectively, the stronger teams (France, Germany, Belgium, Brazil) performed far better than the weaker teams (Saudi Arabia, Morocco, Peru, Russia), in spite of the occasional contradictory outcome. The noise did not erase the signal.

Evaluating Models of Biological Origins

(1) Evolution

Evolution, as I mentioned, proposes a stochastic model. From the practical standpoint, this implies, quite curiously, that evolution predicts confounding observations–noise–just as sports prognosticators expect that occasionally Germany will fall early and Croatia will advance to the final. Biologists have even identified some of the confounding factors in evolution: for example, convergent evolution in phylogenetic trees, and incomplete lineage sorting in trees built from genomic data.

At the same time, evolution predicts that the forces of drift, mutation, gene flow, recombination, and natural selection will result in

  • transitional fossils;
  • a greater number of homologous endogenous retroviruses shared between populations that are situated more closely on a nested hierarchy;
  • adaptations, exaptations, and vestigial structures throughout the domain of biology;
  • pseudo-genes throughout the domain of biology;
  • homologous pseudo-genes being more common among populations that are closer in a nested hierarchy; and so forth.

And indeed these predictions are borne out in observation.

(2) Design Model

I am not able to glean any predictions from Ewert’s paper, since he specifically rules out the validity of using his paper to draw comparisons between evolution and dependency graph models:

The focus of this paper has not been to critique common descent, but to the test the predictions of the dependency graph hypothesis. (p.18)

In my next post, on the shortcomings of Ewert’s methods, we will see why Ewert recognized that his paper could not be used the way that ID proponents would wish to use it.

This does not prevent our friend @Cornelius_Hunter from critiquing common descent on the basis of this paper.

What I would like to see from the mild-mannered Biola professor (who leaps tall buildings in a single bound?) is what predictions a design model would make about data other than component-based vs. tree-based modeling. What would a design model predict with regard to:

  • Homologous ERVs - Should ERVs be found in homologous or orthologous locations under ID? Should the ratio of homologous to orthologous ERVs vary based on the taxonomic distance of species?
  • Vestigial structures such as the sightless lenses of marsupial moles where eyes are expected?
  • Pseudo-genes (i.e., vestigial genes) such as the vitellogenin and vitamin C genes in H. sapiens.
  • Noise. Dr. Hunter complains incessantly when conventional biologists attribute confounding observations to noise that nevertheless does not erase the evolutionary signal. Given robust predictions by a design model, would the design model nevertheless predict noise in real-world observations? If so, why?

Dr. Hunter, I would appreciate very much hearing your thoughts on these questions.

Until we can get these predictions formulated, there is no way to compare evolution with a design model. Suppose for the sake of argument I were to grant Dr. Hunter’s assertion that, at the gene family level, the dependency graph is 10^4 bits superior to a tree model. Given predictions by a design model for other classes of data such as ERVs, vestigial structures, and vestigial genes, we could still discover that the tree model is 10^10 bits superior to a dependency graph. This would overwhelm the gene family model-building data.
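Since log evidence adds across independent classes of data, the arithmetic behind this "overwhelm" claim is just a sum. A minimal sketch, using the hypothetical bit counts from the argument above, and assuming (my simplification) that the data classes are independent:

```python
# Hypothetical log-evidence contributions in bits:
# positive favors the dependency graph, negative favors the tree.
log_evidence_bits = {
    "gene families (granted for argument)": 1e4,
    "ERVs, vestigial structures, vestigial genes": -1e10,
}

# Under independence, log evidence simply sums across data classes.
total = sum(log_evidence_bits.values())
print(total)  # about -1e10: the tree model wins overall
```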

In the absence of any predictions by a design model with regard to these classes of data, though, it is impossible to make any reasonable comparisons between a design model and evolution, other than to say that a design model is better at predicting one class of data, but evolution is infinitely better at predicting many classes of data.

If this is the choice that scientists must make, I suspect the vast majority would prefer a model that predicts many classes of data well over a model that predicts one class really well but is unable to make any other predictions.

Chris Falter


Peaceful Science.

Please observe Joshua’s Ts&Cs, guys - we’d like to keep the conversation more pleasant than this, as per @T.j_Runyon


Can you perhaps explain to me what a module is that gives genes to these species that is not any kind of common ancestor?

Well, perhaps it is not interesting, but what the paper is pointing out in this section are examples that don’t fit common descent, which the dependency graph models as “modules.” Figure 8 gives a nice illustration, though it takes a while to soak it in.

Don’t you find that odd? Especially considering that the HomoloGene data set gave the best performance for the tree model. A different dataset (Table 4 in Results) gave the result as >500,000 (vs. 10,000 for HomoloGene). I have no idea what the antilog of that might be, but the sheer size of these probability differences worries me a bit. Not to mention the disparity between the data sets, which calls to mind @glipsnort’s comment about missing data in these data sets. So I am curious about the use of other statistical methods comparing these models. Do you know if Ewert tried other statistical tests using p values or confidence levels, before going to the Bayesian method? And what those other tests might have shown? I would not be surprised if they showed some improvement of the DG method over the tree, but I would be surprised if t statistics were in the range of billions in difference.
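For a sense of scale on those Bayes factors: B bits corresponds to an odds ratio of 2^B, so the antilog question has a quick answer in decimal digits (this is just arithmetic; it says nothing about whether such numbers are trustworthy):

```python
import math

def odds_digit_count(bits):
    """A Bayes factor of `bits` bits is an odds ratio of 2**bits;
    return the number of decimal digits in that odds ratio."""
    return math.floor(bits * math.log10(2)) + 1

print(odds_digit_count(10_000))   # 3011 digits
print(odds_digit_count(500_000))  # 150515 digits
```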

The reason I wouldn’t be surprised, and my main concern about the approach (though I might have missed something since I only read the paper once), is that it seems to compare the DG model with an evolutionary tree model that ignores convergence and gene flow, by focusing on the absence or presence of gene families, and not functional ontologies or gene sequence differences (as @T_aquaticus repeatedly pointed out).

Finally, please note that Occam’s razor is not a scientific law of nature and is completely useless in biology, so it isn’t surprising that evolutionary theory violates it continuously. So, after all, does basic biochemistry.


Well, this paper does not explain the mechanism. Just as geocentrism, heliocentrism, common descent, etc., are models which, by themselves, do not explain mechanism but rather describe relationships.

I’m not sure I understand the whole question, but here’s my best shot.

I think “module” is a very rough concept he is trying to develop, in which gene families are analogous to objects in OOP (or subroutines for older folks like me). If I’m reading him right, he’s trying to treat gene families as free-floating modules that can be employed in combinations the way objects in OOP or parts in manufacturing can be. That part of the reasoning is basic and unoriginal. For it to have any explanatory potential, it would have to also correlate with function (my opinion, anyway). So for example, if “modules” aren’t well explained by common descent, they ought to be explainable by common function. Otherwise, the whole module rationale is ad hoc.
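A toy sketch of the contrast (entirely my own illustration, not Ewert’s formalism; all the names are made up): under a strict tree, a species’ gene families come only from its lineage, while a dependency graph also lets distant species draw on a shared module, like the anemone/lancelet example.

```python
# Species assemble gene-family sets either from tree ancestry alone,
# or from tree ancestry plus cross-cutting "modules".
tree_ancestry = {"cnidarian": ["metazoan"], "chordate": ["metazoan"]}
lineage_families = {
    "metazoan": {"famA", "famB"},
    "cnidarian": {"famC"},
    "chordate": {"famD"},
}
modules = {"shared_module": {"famX", "famY"}}
module_users = {"cnidarian": ["shared_module"], "chordate": ["shared_module"]}

def families(species, use_modules):
    fams = set(lineage_families[species])
    for ancestor in tree_ancestry.get(species, []):
        fams |= lineage_families[ancestor]
    if use_modules:
        for name in module_users.get(species, []):
            fams |= modules[name]
    return fams

# Under the tree alone, famX/famY shared by two distant phyla is a surprise;
# with a module in the graph, it is expected.
print(sorted(families("cnidarian", use_modules=False)))  # no famX/famY
print(sorted(families("cnidarian", use_modules=True)))   # includes famX, famY
```

As noted above, for this to be explanatory rather than ad hoc, the module would also need to correlate with function.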

But then you ask (I think; I’m not sure): “…what…gives genes to these species…” and the answer is a “designer.” (End of page 3.) I assume this is Alanis Morissette, but that’s the deep thinky stuff of theology that is way above my head.


I’m not quite following the concern here. The fact that different datasets give different results would be expected. You wouldn’t expect these results to be similar, because they are based on quite different data.

Well the large numbers, in themselves, are not too surprising. You’ll get those if one model is superior, when you have a large data set. Several folks have expressed this concern with the large numbers. But that is what you will get if one model is clearly better. Regarding other methods, I agree, it would be interesting to try them out, for sure. I have no idea what Ewert attempted in his work.

Well, interesting point; however, the concept of parsimony is really crucial in data analytics. And too often analysts don’t pay attention to it. That’s why there are good model selection methods available these days.

I will say, however, that this isn’t going to make much of a difference. The Bayesian model selection isn’t going to give big differences from, say, AIC.
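A toy comparison of the two criteria (my own sketch; BIC stands in for the Bayesian marginal likelihood via the usual large-sample approximation, and every number here is made up): with a big likelihood advantage on a large data set, the parameter penalties barely move the outcome.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L; for large n this
    approximates -2 times the log marginal likelihood (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood

# Made-up fits on a large data set: the richer model has three times
# the parameters but a much better likelihood.
n = 100_000
tree_logL, tree_k = -50_000.0, 200
dg_logL, dg_k = -40_000.0, 600

aic_gap = aic(tree_logL, tree_k) - aic(dg_logL, dg_k)
bic_gap = bic(tree_logL, tree_k, n) - bic(dg_logL, dg_k, n)
print(aic_gap)  # 19200.0: AIC prefers the richer model
print(bic_gap)  # about 15395: BIC agrees despite the stiffer penalty
```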

Ewert has actually made an appearance at Peaceful Science. If you’d like to discuss the paper with the author directly, this is probably the best opportunity to do so.

1 Like

And Ewert has been very gracious. So for those that do come on over to peaceful science please return the favor.


It will be interesting to see the guidance and predictions that the DG model can provide. One obvious one is that the DG model produces genetic modules. This is a completely new modeling element. The DG model shows how these modules support (feed into) the species. So this is going to raise a lot of interesting questions about how the genetic modules are related. What groupings or other patterns of the genetic modules do we see, and what do these groupings tell us about their design, function, and roles in molecular and cellular biology?

Here’s another one. I’d be interested in seeing the DG model for microRNA genes. Given how these relatively newly discovered genes have contradicted CD, I can’t help but wonder if they turn out to have an interesting DG pattern. Just a thought, but it seems interesting to me.

The bottom line is, these genetic modules that DG produces are a new construct. They fall out of constructing the DG diagram. So what meaning, if any, do they have? I suspect it will be pretty interesting, and whole new ways of thinking will be opened up. But who knows?

I agree with you about the gene modules, and I was thinking they might be very interesting if applied to some examples of convergence in the evolutionary framework.

1 Like