Hi Richard,
Yes, life gets busy. Ditto here.
I’ll be honest that I see this discussion as providing ever-diminishing returns. If you want to see your ideas get traction, it’s time for you to do some modelling. You also need to deal with the strongest evidence available, not simply pick at what you see as lesser evidence. For example, there are regions of the human genome that do have haplotypes that could fit into two people. Those are not the areas you need to deal with (!). The most variable ones are the issue.
For that reason it’s unclear to me why you want to continue picking at Zhao when there is stronger evidence to deal with, such as the other papers I’ve pointed you to: the Alu paper, the paper with the haplotype blocks on chromosome 21, and so on. Even if you could shoehorn the data into four blocks for Zhao - and even that is not reasonable, as I will discuss briefly below - you just can’t shoehorn the variation on chromosome 21 into four blocks.
If you were my colleague approaching me with this as a hypothesis to be tested, I would immediately point you to the challenges - i.e. the haplotype diversity we see across the genome - and tell you that, unless you have a reasonable explanation / mechanism for producing that diversity in the timeframe you propose, this is a waste of time. So would any other geneticist.
I’ll include a brief discussion of the issues here for the benefit of others, since this sort of thing also applies to any other part of the genome, including those haplotype blocks on chromosome 21.
For Zhao, you have in your second haplotype grouping more variation than can reasonably be attributed to one starting haplotype. For example, three individuals (3, 4 and 5) share an “A” at the position in column six, and a “C” at the position in column nineteen. That looks like another haplotype to me. At position 15, about half of group 2 has each allele (A or T). Again, it would be more reasonable to have these as separate types. Ditto for that same column in your group 3. These are all types that have several people in the data set - these are common types.
If you try to start with four ancestral types and then produce the variant types within each of your groupings, you’ll see that you need to invoke too many rare events. For example, in order to produce individuals 3, 4, and 5 in group 2 from one of the other types, here’s what would have to happen:
We would need a mutation to an A at position six, followed by drift to make this variant more frequent. Then one of those variants would later have to mutate to a C at position nineteen, and once again drift would have to occur to make the new variant more common. These two events could happen in the reverse order, of course: mutation to 19C, drift, then mutation to 6A, and drift. Once we establish the 6A / 19C variant, we need either (a) a third mutation at position 15 to give 15T for some of the 6A / 19C descendants, or (b) a double crossover event in this region to pick up 15T from another type. This would be needed to explain the fifth individual in your group 2. Then, after these events, drift would again have to make these new combinations frequent enough to be picked up in a sample the size of Zhao’s. That’s a large number of very rare events, and at least three instances where drift needs time to take new variants to a reasonable frequency.
You don’t have enough time in your model to make this work. Rare events take a long time to appear, and then drift has to act between each rare event, and that takes a long time too. Multiple rare events interspersed with long times for drift = too much time.
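To give a rough feel for the drift step alone, here is a minimal Wright-Fisher sketch in Python. It is only an illustration under assumptions I am making up for the purpose: the population size, the target frequency, and the function names are hypothetical, not values taken from Zhao or from your model. It follows a single new neutral variant and asks how often, and how slowly, it drifts up to a frequency where a modest sample might detect it.

```python
import random


def generations_to_reach(pop_size, target_freq, rng, max_gens=100_000):
    """Follow one new neutral variant in a Wright-Fisher population.

    pop_size is the number of diploid individuals (2 * pop_size gene copies).
    Returns the number of generations until the variant first reaches
    target_freq, or None if it is lost (or max_gens runs out) first.
    """
    total_copies = 2 * pop_size
    count = 1  # the variant starts as a single new mutation
    for generation in range(1, max_gens + 1):
        freq = count / total_copies
        # Each copy in the next generation is drawn at random from this one.
        count = sum(1 for _ in range(total_copies) if rng.random() < freq)
        if count == 0:
            return None  # lost by drift
        if count / total_copies >= target_freq:
            return generation
    return None


def summarize(pop_size=100, target_freq=0.1, trials=2000, seed=1):
    """Return (fraction of variants reaching target_freq, mean generations taken)."""
    rng = random.Random(seed)
    results = [generations_to_reach(pop_size, target_freq, rng) for _ in range(trials)]
    successes = [g for g in results if g is not None]
    fraction = len(successes) / trials
    mean_gens = sum(successes) / len(successes) if successes else None
    return fraction, mean_gens


if __name__ == "__main__":
    fraction, mean_gens = summarize()
    print("fraction reaching target:", fraction)
    print("mean generations for those that did:", mean_gens)
```

Most new variants in a run like this are simply lost, and the few that do climb take many generations to get there. The scenario above needs several such climbs in series, each one waiting on a rare mutation or crossover first. A toy model like this is no substitute for a proper one with realistic parameters, but it is the kind of thing I mean when I say the timing needs to be modelled.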
It’s this sort of thing that would be even more of a problem for the chromosome 21 paper, to say nothing of looking at the scope of haplotype blocks across the entire genome as catalogued by the 1000 Genomes database.
It’s just not going to squeeze into four (or two, which is what it should really be, if Eve was a clone of Adam). If you disagree, feel free to model it and present it for peer review - even the informal peer review that would result from discussing your model here.