BTW, I want to return to a general point that several people have made several times: that the CD model in the paper is inadequate because it omits various additional mechanisms (homoplasy, ILS, duplication, deletion, etc.). Some have argued that the test presented in the paper is therefore invalid. I want to address this, because two important points need to be understood.
First, this cuts both ways: additional mechanisms can be applied to DG as well.
Second, the additional mechanisms do not come for free. Each one adds parameters, and model selection penalizes CD for them. That penalty is essential. Without it, you end up with a model that fails in its predictions: it looks great on the training data but fails on new data. This overfitting problem is well understood by data analysts, and I have seen the penalty at work in real data analysis myself. Modeling terms that I thought were important and legitimate were thrown out during model selection.
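To make the cost of add-on mechanisms concrete, here is a minimal sketch using two standard penalized criteria, AIC and BIC. It is a generic illustration and assumes nothing about the specific criterion or likelihoods used in the paper. The log-likelihoods and parameter counts below are hypothetical numbers chosen only to show the arithmetic, and `break_even_gain` is a hypothetical helper name.

```python
import math
from typing import Optional


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion (lower is better): 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion (lower is better): k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


def break_even_gain(extra_params: int, n: Optional[int] = None) -> float:
    """Log-likelihood improvement that extra parameters must buy just to tie.

    Hypothetical helper. With n=None the AIC penalty is used (1 per parameter);
    otherwise the BIC penalty (ln(n)/2 per parameter) is used.
    """
    if n is None:
        return float(extra_params)
    return extra_params * math.log(n) / 2


# Hypothetical numbers, for illustration only.
base_ll, base_k = -1000.0, 10
extended_ll, extended_k = -997.0, 15  # fits better, but adds 5 parameters

# The extended model gains 3 log-likelihood units but needs 5 to break even,
# so its AIC is worse (higher) despite the better fit.
print(aic(base_ll, base_k), aic(extended_ll, extended_k))  # 2020.0 2024.0
print(break_even_gain(5), break_even_gain(5, n=1000))
```

Under BIC the bar rises with sample size, so with many data points each add-on mechanism must deliver an even larger improvement in fit to earn its place.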
You may say, “Well, tough, that’s the way biology is.” Fine, but if so, you face a real uphill battle, because you are up against a model with an enormous head start. Every add-on mechanism you introduce incurs a cost, and the same add-on mechanisms would be available to DG as well.
I’m not saying this is an impossible task. Perhaps CD can somehow be shown to be better than DG, but that appears quite unlikely.