Nested Clades, The Consistency Index, and Affirming the Consequent

First, it’s a bit of a strawman to suggest that a high consistency index (CI) means strong support for a nested hierarchy. It’s nowhere near as simple as that. For a start, the CI was developed for (binary) morphological data, and it isn’t employed on nucleotide phylogenies precisely because it says very little about the quality of the tree.

The CI is sensitive to many different factors in the dataset, and one that appears to be very relevant in the DAG case is the number of gaps in the alignment necessitated by the massive variation in leaf sequence lengths. This produces a highly uneven distribution of character states, inflating the CI.
https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.1558-5646.1989.tb02626.x
(note especially pages 4-5/15)
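
To make the inflation concrete, here is a minimal sketch (not taken from the linked paper) that counts parsimony steps for a single character with the Fitch algorithm and computes its CI as minimum steps divided by observed steps. The 8-taxon tree, the taxon names, and the choice to score a gap as its own character state are all assumptions made purely for illustration.

```python
# Toy illustration: per-character CI = (minimum possible steps) / (steps on the tree).
# The tree, taxon names, and gap-as-a-state coding are hypothetical.

def fitch_steps(tree, states):
    """Return (state_set, steps) for one character on a rooted binary tree.

    `tree` is a nested tuple of taxon names; `states` maps taxon -> state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    # No shared state: one extra change is needed at this node.
    return lset | rset, lsteps + rsteps + 1


def character_ci(tree, states):
    """CI of a single variable character on the given tree."""
    min_steps = len(set(states.values())) - 1
    _, steps = fitch_steps(tree, states)
    return min_steps / steps


taxa = ["A", "B", "C", "D", "E", "F", "G", "H"]
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

# Uneven column: a gap everywhere except one taxon.
uneven = {t: "-" for t in taxa}
uneven["A"] = "T"

# Even column whose states conflict with the tree at every cherry.
even = {t: "01"[i % 2] for i, t in enumerate(taxa)}

uneven_ci = character_ci(tree, uneven)
even_ci = character_ci(tree, even)

# Whichever taxon carries the odd state, the column needs exactly one step.
singleton_cis = [
    character_ci(tree, {t: ("T" if t == odd else "-") for t in taxa})
    for odd in taxa
]

print(uneven_ci, even_ci, singleton_cis)
```

In this toy case the gap-dominated column scores a CI of 1.0 and the evenly split, conflicting column scores 0.25. A column where only one taxon differs fits every possible tree in a single step, so it pushes the overall CI up without saying anything about which tree is right.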

Entirely random DNA sequence alignments (with gaps) can also produce a high CI, but a consistently low rescaled consistency index (RCI).
https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.1096-0031.1989.tb00573.x
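
For anyone who wants to see why the two indices can diverge, here is a rough sketch using the usual definitions: CI = m/s, retention index RI = (g - s)/(g - m), and rescaled index = CI × RI, where m, s and g are the minimum, observed and maximum step counts summed over characters. The step counts below are made-up toy numbers, not values from any real alignment or from the linked paper.

```python
# Toy illustration of ensemble CI, RI and the rescaled consistency index.
# All step counts are hypothetical.

def ensemble_indices(min_steps, obs_steps, max_steps):
    """Return (CI, RI, rescaled CI) from per-character step counts."""
    m, s, g = sum(min_steps), sum(obs_steps), sum(max_steps)
    ci = m / s
    # RI is undefined when no character can show homoplasy; report 0.0 then.
    ri = (g - s) / (g - m) if g != m else 0.0
    return ci, ri, ci * ri


# Two informative characters that fit the tree poorly
# (minimum 1 step each, observed 3, worst case 4).
inf_min, inf_obs, inf_max = [1, 1], [3, 3], [4, 4]

# Eight singleton characters (e.g. gap-dominated columns): always exactly
# one step, so minimum = observed = maximum = 1.
pad = [1] * 8

ci_plain, ri_plain, rc_plain = ensemble_indices(inf_min, inf_obs, inf_max)
ci_pad, ri_pad, rc_pad = ensemble_indices(
    inf_min + pad, inf_obs + pad, inf_max + pad
)

print(f"informative only: CI={ci_plain:.2f} RI={ri_plain:.2f} RC={rc_plain:.2f}")
print(f"with singletons:  CI={ci_pad:.2f} RI={ri_pad:.2f} RC={rc_pad:.2f}")
```

With these toy numbers, padding the matrix with singleton columns lifts the CI from about 0.33 to about 0.71, while the RI stays at about 0.33 and the rescaled index stays low (about 0.24). That is the general pattern: the CI rewards columns that cannot conflict with any tree, and the rescaling discounts them.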

In both cases, the alignments are obviously completely unlike any seen in rigorous phylogenetic analyses of homologous DNA sequences, i.e. unlike what we see in reality. Phylogenetic analyses of random or DAG-based sequences consistently give extremely low branch support values, whereas analyses of homologous sequences do not. Mimicking one summary statistic (and one already known to be sensitive to properties of the dataset that have nothing to do with tree quality) is far from demonstrating that DAGs (or anything else) account for the observed data better than an evolutionary process following a nested hierarchy.
