Academic persecution of ID proponents

T_aquaticus · September 21, 2020, 9:06pm

The rate of substitution shouldn’t look random if you are comparing functionally constrained and unconstrained regions. This is because of selection.

The chimp genome paper has a nice summary:

Single-nucleotide substitutions occur at a mean rate of 1.23% between copies of the human and chimpanzee genome, with 1.06% or less corresponding to fixed divergence between the species.

Regional variation in nucleotide substitution rates is conserved between the hominid and murid genomes, but rates in subtelomeric regions are disproportionately elevated in the hominids.

Substitutions at CpG dinucleotides, which constitute one-quarter of all observed substitutions, occur at more similar rates in male and female germ lines than non-CpG substitutions.

Insertion and deletion (indel) events are fewer in number than single-nucleotide substitutions, but result in ∼1.5% of the euchromatic sequence in each species being lineage-specific.

There are notable differences in the rate of transposable element insertions: short interspersed elements (SINEs) have been threefold more active in humans, whereas chimpanzees have acquired two new families of retroviral elements.

Orthologous proteins in human and chimpanzee are extremely similar, with ∼29% being identical and the typical orthologue differing by only two amino acids, one per lineage.

The normalized rates of amino-acid-altering substitutions in the hominid lineages are elevated relative to the murid lineages, but close to that seen for common human polymorphisms, implying that positive selection during hominid evolution accounts for a smaller fraction of protein divergence than suggested in some previous reports.

The substitution rate at silent sites in exons is lower than the rate at nearby intronic sites, consistent with weak purifying selection on silent sites in mammals.

Analysis of the pattern of human diversity relative to hominid divergence identifies several loci as potential candidates for strong selective sweeps in recent human history.
Initial sequence of the chimpanzee genome and comparison with the human genome | Nature

There are many types of questions one can ask about genome evolution, and this is why the numbers are treated separately. If my understanding is correct, the rate of substitution mutations is much easier to model than indels, so you would want to know the number of SNPs that separate the two genomes. If you want to know the total number of mutations (excluding recombination) then you need to know the number of indels, not the number of bases affected by the indels.

What do you find ambiguous?

EricMH · September 22, 2020, 12:05am

That’s an assumption.

It is also unclear how @glipsnort’s experiment is meant to demonstrate mutations are ‘random’ in the undirected sense that you mean. Sure, if the biochemical distribution he derived is a sufficient statistic for all the mutations, then I can buy the mutations are undirected from what he presented. However, there is a very large amount of other patterning, especially the distance-based correlation, that is inexplicable (to me!) from an unguided perspective, and is certainly not accounted for by the biochemical distribution.

For example, here is a histogram of the human/chimp comparison, which should be the most random, and it seems to be multi-modal, highly skewed, with a possible frequency distribution, definitely not a normal distribution. And if I run a test for normality on it, I get a very small p-value = 3.952e-16.

This histogram for human cow is even weirder, after I increased the number of bins to 600.

As a comparison, here is a histogram of the same number of samples (8000) drawn from a normal distribution, for which a test for normality gives p-value = 0.5557.
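For anyone who wants to repeat this kind of check, here is a minimal sketch using only the Python standard library. It is not the test used for the p-values above (the post does not say which test that was). It swaps in the Jarque-Bera test, which measures skewness and excess kurtosis and is asymptotically chi-square with 2 degrees of freedom. The seed, the function name and the exponential comparison sample are assumptions made purely for illustration.

```python
import math
import random


def jarque_bera(samples):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(samples)
    mean = sum(samples) / n
    # Central moments in population form (dividing by n).
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # For a chi-square distribution with 2 degrees of freedom the
    # survival function is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


rng = random.Random(0)  # fixed seed (an assumption) so reruns give the same numbers
normal_samples = [rng.gauss(0.0, 1.0) for _ in range(8000)]
skewed_samples = [rng.expovariate(1.0) for _ in range(8000)]  # deliberately non-normal

print("normal sample:", jarque_bera(normal_samples))
print("skewed sample:", jarque_bera(skewed_samples))
```

A small p-value here only says the sample is unlikely to have been drawn from a normal distribution. It says nothing by itself about what produced the departure from normality.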

I guess you can chalk all patterning up to ‘natural selection’ but then what can’t natural selection do? It becomes yet another ‘of the gaps’ entity that we use to file anything weird under ‘misc’.

And if we want to put constraints on natural selection, it can only select for what is immediately beneficial. If in order to get benefit we need multiple, coordinated, distant mutations to occur, it is very unclear how natural selection can help, since a very improbable event has to occur for natural selection to have any effect.

Here’s one section that pops out to me in the paper:

On the basis of this analysis, we estimate that the human and chimpanzee genomes each contain 40–45 Mb of species-specific euchromatic sequence, and the indel differences between the genomes thus total ∼90 Mb. This difference corresponds to ∼3% of both genomes and dwarfs the 1.23% difference resulting from nucleotide substitutions; this confirms and extends several recent studies63,64,65,66,67. Of course, the number of indel events is far fewer than the number of substitution events (∼5 million compared with ∼35 million, respectively).

This sounds bigger than the 1% number that is quoted, more like a total of 4% difference between chimp and human. And big insertions of new genetic material (if I understand correctly) seems very mysterious from an evolution point of view. I’m guessing the assumption is this inserted/deleted material was present in the ancestor, and different parts were deleted in humans and chimps.

Another thing, the way the genomes were compared it seems they were chopped into 1Mb pieces, and then pieces were compared. Doesn’t sound too bad, but it is less straightforward than if the genomes were lined up side by side and directly compared.

Additionally, the general use of edit distance type comparisons to find the SNPs is also a bit tricky. I can generate two random strings, and discard everything that doesn’t match as deletion/addition events, and the two strings will be considered perfect matches. That is not quite what is done here, since BLAST-type algorithms try to find minimum edit alignments, but the discarding of portions that don’t match does add ambiguity. This ambiguity is absent from the popular caricature that scientists are laying both genomes down side by side and comparing them base pair by base pair.
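The random-string scenario can be sketched with a longest-common-subsequence (LCS) computation, which is the extreme case where every mismatch is treated as an insertion or deletion. This is a toy illustration, not how BLAST or the chimpanzee genome pipeline works (real aligners score both mismatches and gaps). The sequence length, seed and helper names are assumptions. The 1.23% substitution rate is taken from the paper's summary, and the "related" sequence here has substitutions only, no indels.

```python
import random


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def mutate(seq, rate, rng):
    """Copy seq, replacing each base with a different base with probability rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(1)  # fixed seed (an assumption) for repeatability
length = 500
seq_a = "".join(rng.choice("ACGT") for _ in range(length))
seq_random = "".join(rng.choice("ACGT") for _ in range(length))
seq_related = mutate(seq_a, 0.0123, rng)  # substitutions only, at the quoted 1.23% rate

for label, other in (("unrelated", seq_random), ("related", seq_related)):
    kept = lcs_length(seq_a, other)
    print(f"{label}: {kept} of {length} bases kept, {length - kept} discarded per sequence")
```

Every retained position matches by construction in both cases, so the informative number is how much had to be discarded to get there. Unrelated random sequences lose a large fraction of their bases, while the substitution-only copy loses very few.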

Finally, the method of assembling the genome using a human reference is still yet another source of ambiguity. When sequencing, enormous numbers of reads are generated with some degree of error, and reads are not very long, just a few hundred base pairs long. With enough short erroneous reads you can match any string.

In general, with sequencing, assembly, and comparison there appear to be a sizeable number of opportunities for overfitting, so I think the human/chimp 1-2% comparison number needs to be taken with a generous pinch of salt.

glipsnort · September 22, 2020, 1:27am

No, that’s a very well known fact. Different substitutions have very different degrees of constraint as a consequence of the genetic code.

Since my blog post doesn’t contain the word ‘random’, it’s not clear to me what it is you’re wondering about. The question it attempted to answer was whether the differences between the human and chimpanzee genomes look like mutations. They do.

You need to label your axes with meaningful labels. I don’t know what you’re histogramming.

That’s not ambiguity – it just means you hadn’t read the primary source on the subject you’ve been discussing.

These are not big insertions. As the bit you just quoted makes clear, we’re talking about 90 Mb of different material spread over 5 million different insertions or deletions, so each one is on average 18 bp long. As for what they are – they’re described in the preceding two paragraphs.
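The arithmetic behind that average, and the difference between counting affected bases and counting mutation events, can be checked with the figures quoted from the paper. The variable names are made up for this sketch, and the inputs are the paper's rounded approximations, so the outputs are rough too.

```python
# Approximate figures quoted from the chimpanzee genome paper.
indel_bases = 90e6          # ~90 Mb of indel sequence, both genomes combined
indel_events = 5e6          # ~5 million indel events
substitution_events = 35e6  # ~35 million single-nucleotide substitutions

# Average size of one indel event, in base pairs.
mean_indel_length = indel_bases / indel_events

# Counting events rather than bases: how many substitutions per indel,
# and what share of all mutation events the indels represent.
substitutions_per_indel = substitution_events / indel_events
indel_share_of_events = indel_events / (indel_events + substitution_events)

print(f"mean indel length: {mean_indel_length:.0f} bp")
print(f"substitutions per indel event: {substitutions_per_indel:.0f}")
print(f"indels as a share of all events: {indel_share_of_events:.1%}")
```

Measured in bases, indels account for more of the difference than substitutions do. Measured in events, they are the minority, which is why the two kinds of number answer different questions.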

Not for any comparison you’ve been talking about here.

Popular caricatures of science are seldom very accurate.

I suggest you should either (a) investigate the actual procedures used and determine their real strengths and weaknesses and provide an estimate of the error rate caused by these issues for the real genomic data in question, or (b), well no, there really is no alternative if you want to be able to do anything beyond contributing to caricatures.

I think a summary is in order here. A large team of very smart people spent years developing, testing, and refining state-of-the-art laboratory and computational methods to compare genomes. You’ve spent a few minutes skimming one paper that summarizes their work and based on that you think you know enough to effectively critique it. This is not a good look.

T_aquaticus · September 22, 2020, 3:14pm

Negative selection of deleterious mutations within functional DNA is not an assumption. It’s observed all the time.

Read @glipsnort’s article again. Please quote any section claiming mutations are either random or undirected. I don’t think you will find any.

The article is claiming that the processes producing mutations in genomes today were also responsible for the differences we see between the genomes of various species, such as human and chimps.

You are going to have to define what you mean by random and why these patterns should be random.

Natural selection can also select against what is immediately deleterious. Have you never heard of sequence conservation?

If you want to complain about how the popular press reports on science then you are going to have to get in line.

Indels are a real thing, so I’m not sure what you are getting at.

The quality of sequences is really high, so there won’t be this type of systematic error.

Sorry, but your opinion doesn’t hold much weight.

EricMH · September 26, 2020, 3:04pm

The line of reasoning as I understand it:

1. if (A) chimps and humans have common ancestry, then (B) their difference should look like it is due solely to mutation, we assume biconditional A ↔ B
2. (C) human/human difference follows a specific biochemical distribution, and (D) we believe human difference is solely due to mutation, we assume biconditional D ↔ C
3. (E) human/chimp difference also has the same biochemical distribution, therefore we conclude (F) human/chimp difference is solely due to mutation and thus (G) evidence of common ancestry

Unpacking #3: we observe E, and assume E = C, and infer from C that D, and assume D = B, therefore A. Lots of assumptions, but I’m willing to grant it all if the biochemical distribution was a sufficient statistic distinguishing humans and chimps, as well as other species. However, it seems to be anything but a sufficient distribution.

We can posit that ‘natural selection’ explains this difference, but we either make ‘natural selection’ an empty gap filler capable of all patterns we see, which I’m sure we all find unsatisfactory, or we place a constraint on ‘natural selection’ that it can only select according to what is immediately beneficial or not beneficial.

The specific pattern I see is that there is a strong correlation between distant nucleotides and the type of mutation that occurs. I can see how natural selection can stop disruption of distantly correlated nucleotides through the death of organisms that disrupt this correlation, but I don’t see at all how this explains the pattern I see. I am open to suggestions, even that the pattern I see is statistically spurious, although the test for normality suggests otherwise.

As for the 1-2% difference, I totally understand it seems very presumptuous for a lay person to critique the scientific work of trained professionals, conducted meticulously over many years. However, the stark contradiction between the claim on this thread that the difference is actually possibly even less than 1% and the actual paper stating the difference is at least 3%, greater than even the popular report, seems odd to me.

In addition, I see numerous areas where overfitting can occur, as I stated. Now, I can just trust the professionals controlled for this somehow, but I see no suggestion of that in the paper. Maybe I missed it.

At any rate, what I really trust is what I can verify for myself. Your suggestion of BLASTing the sequences myself sounds like just the thing. I’ve been trying to verify through NCBI BLAST, but so far no luck. Every query I try of human against chimp turns up with an error stating there is nothing similar enough to return. Probably something wrong I’m doing, like wrong database (I’m trying the refseq database) or parameters. I’ll keep trying and hope for success.

Finally, I really appreciate your and @T_aquaticus feedback. It’s been fascinating and educational to try and reproduce the evidence for evolution myself.

T_aquaticus · September 28, 2020, 5:29pm

It is also important to point out that further studies have looked at new mutations in humans, and they follow the same pattern with transitions outnumbering transversions and CpG mutations occurring at the highest rates.

Figure 6 | Correlation between observed de novo mutation rates and human/chimp substitution rates for mutation types in different trinucleotide contexts. De novo mutation rate spectrum (Y-axis) is plotted against substitution rate spectrum inferred from human vs chimp comparison (X-axis). Each dot represents a type of mutation in a specific trinucleotide context. The Pearson’s correlation coefficient r² = 0.993. Figure from Francioli et al. (2015) (Supplemental Figure 6).

We observe a very strong correlation between the observed spectrum of human substitution mutations and the spectrum of SNPs that separate the chimp and human genomes. The biases observed in real time are the same biases we see between species. This is slam dunk evidence for the observed mechanisms being responsible for the differences between species.
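For readers unfamiliar with the statistic in the caption, here is a minimal sketch of how a Pearson correlation between two rate spectra is computed. The five pairs of numbers are hypothetical placeholders, not values from Francioli et al. The real comparison uses one point per mutation type in each trinucleotide context.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL relative rates for five mutation classes, for illustration only.
de_novo_rates = [1.0, 2.1, 3.9, 8.2, 15.5]
divergence_rates = [1.1, 2.0, 4.2, 7.9, 16.0]

r = pearson_r(de_novo_rates, divergence_rates)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

A value near 1 means the mutation classes that are common in one spectrum are proportionally common in the other, which is the sense in which the two spectra are said to match.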

It isn’t odd for people who work with DNA. There are many aspects to a genome comparison, and there isn’t one number that can answer all of our questions. Different types of comparisons are important.

For more recent studies they probably have the raw data available online if you want to assemble it yourself. If ID/creationists really have objections on how this data was analyzed then they can do the analysis themselves.

EricMH · September 28, 2020, 10:14pm

At the risk of sounding like a broken record, I fail to see how this substantiates either the mechanism for human mutation or the mechanism for interspecies differences, precisely because there are highly patterned global differences between them that remain statistically and logically unexplained by the local biochemical distribution. It’s like saying knowing the laws of physics can explain the structured patterns of a car engine, or knowing machine code allows us to explain the structured patterns of a website. Sure, they operate by laws of physics and machine code, respectively, but knowing the laws of physics and machine code does not make one capable of being a mechanic or web designer.

The most glaring item in need of explanation, the global patterns, is completely unexplained by the biochemical distribution. The changes may proceed according to the biochemical distribution, just like the car and website operate by the lower level mechanics, but the cause is not distinguished by the distribution, just like the laws of physics and machine code do not by themselves give us car engines and websites. To then merely brush the global patterns under the rug of ‘natural selection’ is god of the gaps, pure and simple.

I don’t have any objections, just questions. But, I would prefer to use reference sequences assembled by the scientific community, so as to avoid objections to the effect there is some error in the assembly pipeline. For some reason this does not seem to be possible with the NCBI reference sequences for Homo sapiens and Pan troglodytes, at least with the naive approach of BLASTing small parts of Homo sapiens chromosomes against Pan troglodytes, which strikes me as quite odd. This would be a really easy way to verify the 2% difference claim, and it is strange that it is not made more accessible. Trust but verify should be our approach.

T_aquaticus · September 28, 2020, 10:23pm

I am talking about observing mutations happening in the human population right now. They can sequence the genomes of parents and their children and find the mutations. In those experiments, we observe more transitions than transversions and CpG transitions occurring at the highest rate. That’s what the graph above is comparing, the observed substitution rates to the observed differences between the chimp and human genomes. They correlate.
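To make the categories concrete, here is a small sketch of how a single-base substitution is classified. Transitions swap a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), and everything else is a transversion. The function names and the trinucleotide convention (mutated base in the middle) are assumptions for this illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    if ref == alt or ref not in "ACGT" or alt not in "ACGT":
        raise ValueError("ref and alt must be two different bases from ACGT")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def is_cpg_transition(context, alt):
    """True if the middle base of a trinucleotide mutates as a CpG transition.

    That is C>T when the C is followed by G, or the same event read on the
    opposite strand: G>A when the G is preceded by C.
    """
    left, ref, right = context
    return (ref == "C" and right == "G" and alt == "T") or (
        ref == "G" and left == "C" and alt == "A"
    )


# Enumerate all 12 possible substitutions and count each class.
counts = {"transition": 0, "transversion": 0}
for ref in "ACGT":
    for alt in "ACGT":
        if ref != alt:
            counts[classify_substitution(ref, alt)] += 1

print(counts)
print(is_cpg_transition("ACG", "T"))
```

Only 4 of the 12 possible substitutions are transitions, so if every change were equally likely, transversions would outnumber transitions two to one. Observing more transitions than transversions is therefore a real bias in the mutation process.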

If you want to talk about the physical distribution of mutations within the genome, we can do that too. However, I don’t see how that does anything to negate this evidence. Also, I don’t think it will go the way you are thinking. You may want to check out this blog for more info:

EricMH · September 29, 2020, 9:33pm

At this point I think we are talking past each other. I am not debating the biochemical distribution is the same for human ancestry and inter species comparisons. I agree, I’ve reproduced @glipsnort’s result myself, and do not find any obvious problems. Evograd’s article is the same result.

The point is the fact the local distributions are the same tells us nothing as to whether the mutations are guided, unguided, random or non random. It is like looking at the fundamental physical processes in a flowing stream and in a jet engine. Same fundamental physical processes in both, but the fundamental physical processes do nothing to explain the very interesting difference in operation between streams and jet engines, nor whether the processes are intelligently orchestrated for a particular purpose or merely arranged by natural forces.

So, the fact that @glipsnort and evograd show the distribution is the same between human mutational differences and differences between species tells us nothing about the guided/random nature of those differences. Especially because there are very significant global patterns that are completely unexplained by the local distribution, and in addition are inexplicable by a natural selection that only selects for immediate consequences.

There is a huge gap in the explanation here, and neither random/unguided mutations, nor natural selection fills the gap.

MarkD · September 29, 2020, 10:45pm

I don’t think we should be disturbed by gaps. Over time, the picture gets filled in. Will it ever be completely filled in? If it were, would we recognize its completeness? Are we really entitled to gap-less explanations? I don’t see why. Yet so much more is known now than when I was a kid. In the 50’s who would have imagined we’d know as much as we do about the mechanical operation of a cell, let alone DNA. Maybe the glass is half full, not filled with gaps.

Klax · September 29, 2020, 10:50pm

It must be the God of Gaps then.

EricMH · September 30, 2020, 1:35am

Perhaps so, but the point is this biochemical distribution argument does not show mutations are random or unguided.

Basically, I’m looking for some kind of strong evidence for evolution. I thought this biochemical distribution argument was strong, but then discovered these huge correlations, so with that the argument is disqualified, since whatever distinguishes the species is not explained by this distribution. The argument is that since the differences can be explained by a distribution that is known to be due to unguided mutations, then the same distribution in a different context must also be due to unguided mutations. However, if that is the case, then there should not be correlation between distant segments of the sequence. There is no way local biochemical mutations can give rise to distant correlations like this. And since natural selection selects for immediate advantage and disadvantage, it also cannot create distant correlations. There is no way this gap can be filled with evolution’s paradigm of unguided mutations and natural selection.

Sure, that does not preclude some gap filler discovered down the line, but whatever the filler is, it is not evolution. So, this biochemical distribution experiment by @glipsnort essentially disproves evolution.

MarkD · September 30, 2020, 2:19am

Do you have a suggestion for a better overarching explanation for the change in living organisms over time, why varying organisms have succeeded in every available niche over billions of years during which life has existed on this planet? As I understand it, evolution is the consensus best explanation for connecting the most dots. It isn’t settled fact (though parts of it are), just the current champ among contending theories. But it isn’t enough to say the general consensus best theory is inadequate unless you have something more adequate to offer. I’m pretty sure the volume, strength and popularity of criticism leveled against your alternative would make the fault you find with evolution look like strong praise by comparison. It might even make you feel as if you were being singled out by academia for persecution.

Do you suspect that the majority of scientists secretly agree with you, but hold back from joining you for fear of going against the herd? Or could it be that you’re missing something or possibly smuggling in assumptions that the rest don’t? It seems right to me for science to concentrate on explanation based on observable and measurable phenomena. If you are more interested in establishing that what can be measured is concordant with the traditional creation beliefs of one religion than in seeing how far natural explanations can take you, it seems to me that you are engaged in a theological hybrid kind of activity and not science. If that is where your passion leads you, go for it. But you can’t seriously expect the entire course of modern science to revolve around any one set of traditional beliefs.

Klax · September 30, 2020, 5:52am

What theories contend with evolution?

MarkD · September 30, 2020, 10:27am

I guess any number of traditional creation stories. I think most cultures have one. Depending on how you hold the truth of the story, it can make accepting science seem disloyal. Yet what matters in traditional creation stories is probably not the empirical truth claims that set science apart. So there needn’t be a conflict or any disloyalty and God has no reason to be jealous of our interest in science. Makes me think good theology is adaptive for the times we live in.

Klax · September 30, 2020, 10:28am

Aye Mark. But none of them is theoretical in any way.

MarkD · September 30, 2020, 10:33am

Agreed. Traditional stories are received, not speculative in any theoretical way … and certainly not tested.

T_aquaticus · September 30, 2020, 7:59pm

The evidence does support the conclusion that the same mechanisms producing mutations now were responsible for the mutations that separate the chimp and human lineages. It seems that we agree on this point.

If you want to discuss the evidence for random mutations with respect to fitness then I would suggest checking out my thread on the topic. The thread quickly went off on tangents, but the opening post should contain what you are looking for:

You jumped to a different topic. What @glipsnort and @evograd have pointed to is the evidence tying differences between species to the observed mechanisms that produce mutations in modern populations. The additional conclusion of randomness is supported by other experiments, as discussed in the thread I linked to above.

EricMH · October 1, 2020, 11:30pm

No, definitely not. I have no interest in proving any religion correct. What I am interested in is the truth, specifically the empirical scientific truth that I can test for myself with rigorous quantifiable techniques. I want to find this for evolution, and haven’t found it yet. “I still haven’t found what I’m looking for”

I’d assume most here are kindred spirits in this endeavour, but that appears not to be the case.

EricMH · October 1, 2020, 11:31pm

No, and I’ve explained why a couple times. I guess that’s all I have to say.