Adam, Eve, and human population genetics, part 10: addressing critics—Poythress, chimpanzees, and DNA identity (continued) | The BioLogos Forum

Substantial insertions and deletions often happen by mechanisms that are pretty well understood - non-allelic homologous recombination between repeated sequences, insertion of transposable elements, and others. It’s true that there’s no way to know for certain how many mutations occurred to produce the sequence differences between species we see today. Generally parsimony is the rule for analysis. Ideally you determine the ancestral sequence by looking at chimp, gorilla, orangutan, etc. and taking the most parsimonious series of events to get to the pattern seen today. I’ve seen articles by anti-evolution guys crowing about all the positions where human and gorilla have one nucleotide and chimp another. The rather obvious conclusion is that most of these are mutations that are chimp-specific.
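
To make the parsimony logic concrete, here’s a minimal Python sketch; the three-way mini-alignment is made up for illustration, not real sequence:

```python
# Parsimony sketch: given aligned bases for human, chimp, and an outgroup
# (gorilla), assign each variable site to the lineage where a single
# mutation most parsimoniously explains the pattern.

def assign_lineage(human, chimp, gorilla):
    if human == chimp == gorilla:
        return "invariant"
    if human == gorilla:
        return "chimp-specific"              # chimp mutated after the split
    if chimp == gorilla:
        return "human-specific"              # human mutated after the split
    if human == chimp:
        return "pre-split or gorilla-specific"
    return "ambiguous"                       # all three differ

alignment = [                                # toy data: (human, chimp, gorilla)
    ("A", "A", "A"),
    ("C", "T", "C"),   # human matches gorilla -> most likely chimp-specific
    ("T", "C", "C"),   # chimp matches gorilla -> most likely human-specific
    ("G", "G", "A"),
]

for h, c, g in alignment:
    print(h, c, g, "->", assign_lineage(h, c, g))
```

Sites where human and gorilla agree against chimp are exactly the case mentioned above: parsimony says the chimp lineage changed.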

Different kinds of mutations have rates that vary a lot. Generally repeat copy number changes occur at much higher rates (events per generation) than point mutations. Changes in repeat copy number in microsatellites (short tandem repeats), the stock-in-trade of genetic genealogy, are an example. A few larger repeats change copy number at such high rates that they will often distinguish monozygotic twins.

Small insertion/deletion events not involving repeats happen at a roughly 10-fold lower rate than point mutations. The larger the indel, the less frequent it is, unless it is repeat-based. If you really wanted to be painstaking, you would look at the most likely event type to produce each sequence difference, take its estimated rate, and add them up for all the events to estimate the age of the common ancestor. But SNP rates are much better studied than the others, and there are far more SNPs, so that is the most straightforward way to do the calculation.
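
As a toy illustration of that painstaking approach - every count and rate below is a placeholder I’ve invented for the sketch, not a measured value - you bin the observed differences by event type and, since each lineage accumulates events independently, divide the pooled count by twice the pooled per-generation rate:

```python
# Sketch: estimate the age of the common ancestor by pooling event types,
# each with its own per-genome per-generation rate.  All numbers below
# are invented placeholders, not measured values.

observed_events = {            # events inferred between the two genomes
    "point_mutation": 35_000_000,
    "small_indel":     5_000_000,
    "large_indel":        70_000,
}
rate_per_generation = {        # new events per haploid genome per generation
    "point_mutation": 50.0,
    "small_indel":     5.0,    # ~10-fold rarer than point mutations
    "large_indel":     0.1,    # large events rarer still
}
GENERATION_YEARS = 25

# Two lineages separate the genomes, so expected events = 2 * t * rate.
t_generations = (sum(observed_events.values())
                 / (2 * sum(rate_per_generation.values())))
print(f"pooled estimate: ~{t_generations * GENERATION_YEARS / 1e6:.1f} Myr")
```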

I just cooked up a way to calculate genetic distances for genealogy purposes that weights the copy number difference at each STR locus by the inverse of its measured mutation rate. When someone has changed copy number at several slow-mutating STRs it makes a big difference in the results. STRs do weird things though - the only reason for messing with them is that that’s what most people have had measured. SNPs (single nucleotide changes) are much better behaved and more useful for estimating the time of the most recent common patrilineal ancestor of living people.
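
A minimal sketch of that weighting scheme (the locus names, repeat counts, and rates below are invented for illustration, not real genealogy data):

```python
# STR genetic distance weighted by the inverse of each locus's mutation
# rate, so one step at a slow locus outweighs several at a fast one.
# All loci, counts, and rates are invented for illustration.

str_mutation_rate = {          # mutations per locus per generation
    "DYS391": 0.0003,          # slow locus
    "DYS439": 0.005,           # fast locus
    "DYS458": 0.008,           # very fast locus
}
person_a = {"DYS391": 10, "DYS439": 12, "DYS458": 17}
person_b = {"DYS391": 11, "DYS439": 12, "DYS458": 19}

def weighted_str_distance(a, b, rates):
    """Sum of |repeat-count difference| / mutation rate over shared loci."""
    return sum(abs(a[locus] - b[locus]) / rates[locus]
               for locus in a.keys() & b.keys())

print(weighted_str_distance(person_a, person_b, str_mutation_rate))
# The single step at slow DYS391 (1/0.0003 ~ 3333) dominates the two
# steps at fast DYS458 (2/0.008 = 250).
```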

It depends on what your aim is. If you are trying to calculate an age for a common ancestor, you want to know how many mutation events happened of each type and what is the rate for each type. If you want to calculate an overall percent difference for two genomes, you want to count every position that isn’t identical in an optimum alignment. If you want to look for the genetic basis of functional differences, you want every sequence difference in a functional element.

If you look at the differences between parents and child, there are far more point mutation differences than indels, but the total number of bases affected is greater for indels than point mutations.

As far as what percent of our genome has sequence-dependent function, some of the ID guys have made much of the ENCODE project’s results to claim that all or nearly all the genome is functional. The thing is, the ENCODE people redefined function down to merely mean that some biochemical assay would show something - histone modification, DNA methylation, transcription, whatever. The trouble is that chromatin-modifying enzymes do their thing without any knowledge of function - if you injected a piece of bacterial DNA into a human nucleus, it would get a lot of these things done to it. The same goes for viral DNA.

The normal criterion for function would be that if it’s homozygously deleted, or gets a transposon inserted in it, or you replace it with random sequence of the same length, there is some deficiency of cellular or organismic function. Obviously, you can’t do this kind of engineering on humans to see what would happen. You have to either look at genetic disease patients or do it in mice.

The proxy is to look at conservation of sequence between species. You don’t learn much by comparing human to other apes - they’re too similar from recent common ancestry. So you compare more distant species. Of course, the further away the species is, the more likely functional details, especially in non-coding sequences, have changed. This kind of comparison indicates that only about 5% of the genome is conserved (about 1.5% is protein-coding). There is no doubt some additional fraction that has primate-specific, ape-specific, and human-specific function in elements that seem to change fairly rapidly, like lncRNAs and some enhancers. The most generous estimates from people who have published on this are that maybe 15% of the genome could be functional. I wonder if the extra isn’t useful for allowing meiotic recombination to happen in non-functional sequences, since recombination seems to be locally mutagenic. Something I need to read up on.

On the matter of genes arising from previously non-coding sequence, there was a series of papers back in the 80s and 90s on some genes on bacterial plasmids that coded for enzymes that degrade fragments of nylon (a polyamide). The bug was isolated from the effluent of a nylon manufacturing facility in Japan. One of these genes was remarkable in that two of the reading frames were open for hundreds of codons - one frame encoded the nylon hydrolase; the product of the other open reading frame was of unknown function. It was suggested at the time that the nylon hydrolase gene had arisen by a frameshift mutation since the invention of nylon. I thought it was an interesting possibility, but I did BLAST searches of the protein sequences around 2000 and found that there were clearly homologous proteins in distantly related bacteria - it was not likely the thing had arisen so recently. I never saw that anyone came up with a good explanation why there should be two long overlapping open reading frames.

No, this isn’t correct. There was a big fuss made by the media when the ENCODE project announced that they had found “functions” for a large proportion of our non-coding DNA (80%). The problem was that the way they were using the word “function” didn’t actually mean what most people understand by the word. They simply meant that a large portion of our non-coding DNA is involved in “biochemical activity”. This was picked apart by other geneticists because “biochemical activity” doesn’t imply that these sequences are functional or necessary. See the controversy mentioned here. The ENCODE consortium and the publications that hyped their results have since backtracked on their claims, acknowledging that when they said “functional”, they didn’t actually mean functional.

There have been many papers published since explaining why most of our non-coding DNA must be non-functional (note that pretty much all geneticists recognise that at least some of it is functional), and most recently studies have calculated that less than 10% of our DNA is functional. About 1% is coding DNA, another 1% is non-coding but still lies within genes (introns), and about 6-8% is non-coding DNA that is functional. The remaining 90+% is non-functional non-coding DNA, and unless it can be shown that we need it to bulk up our genomes, we could in theory survive without it.

So the current scientific consensus is that most of our DNA serves no function.

I quizzed @DennisVenema on this 9 months ago, and his views also align with the scientific consensus.

So your suggestion is that it got deleted in a common ancestor and then got re-inserted in humans? This is still less likely than it being a non-functional sequence that was drifting along, mutating at the standard background rate for non-conserved sequences, before finally gaining a frameshift mutation in a human ancestor that created a long enough open reading frame for it to be transcribed at very low levels. The thing with open reading frames is that they are not difficult to create - all it takes is a chance frameshift mutation to create a long enough gap between a start codon and a stop codon.
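
To see how easily ORFs turn up by chance, here’s a small sketch that scans a purely random sequence for ATG-to-stop runs in each reading frame (the 50-codon threshold is arbitrary):

```python
# Open reading frames appear in random sequence: stop codons make up only
# 3/64 of random codons, so long stop-free runs downstream of an ATG are
# not rare, and a frameshift can expose a new one.
import random

random.seed(1)
seq = "".join(random.choice("ACGT") for _ in range(30_000))
STOPS = {"TAA", "TAG", "TGA"}

def orfs_in_frame(seq, frame, min_codons=50):
    """Yield (start, codon_count) for each ATG...stop run in one frame."""
    start, n_codons = None, 0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None:
            if codon == "ATG":
                start, n_codons = i, 0
        else:
            n_codons += 1
            if codon in STOPS:
                if n_codons >= min_codons:
                    yield start, n_codons
                start = None

for frame in range(3):
    n = len(list(orfs_in_frame(seq, frame)))
    print(f"frame {frame}: {n} ORFs of >=50 codons in 30 kb of random DNA")
```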

I don’t think there is any evidence in the case of these 60 genes that they are valuable or necessary. Even though they are now being transcribed (albeit at very low levels), they probably still don’t do anything useful, and they will probably disappear in a few thousand generations as they continue to mutate. It is worth noting that even though this paper was published years ago, these 60 genes still aren’t listed in any databases that annotate genes (probably because there is no real evidence that they are functioning genes as opposed to spurious transcripts). What I do find amazing is that some novel genes are created this way (although this is rare). One paper calculated that only about 6% of the new genes present in primates came about this way. It starts with a spurious transcript (like one of these 60); a small fraction of these transcripts interact with other functional genes, transcripts, and proteins (purely by chance); this very weak interaction is then refined over millions of years of natural selection until the new gene becomes a necessary cog in the machine that is us. There was a really interesting article on how these orphan genes come about in New Scientist in Jan 2013 here

Okay. I didn’t say that just because you’re a Christian (I’m a practising Christian myself, but I’m agnostic on the supernatural claims of Christianity). I said that because it looked to me like you were looking for reasons to dismiss evidence. We are all subject to biases (myself included), and we need to be careful to allow the evidence to speak for itself instead of allowing our preferences to filter the evidence we accept and the evidence we don’t.

The link to your analysis of the GULO gene doesn’t seem to work for me. It’s colored like a link, but no address shows.

Hey PG :smile:

Thanks for pointing that out. I’ve fixed that now.

The point I was making was that the genetic evidence for when we diverged roughly matches the archaeological evidence.

How do we date the divergence of two species from genetic evidence?

We count the number of differences between them (looking to establish the number of mutational events that must have occurred in total) and then use the computed mutation rate. Humans and chimps have roughly the same mutation rate - about 100 new mutations per generation - most of them being single nucleotide polymorphisms.
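
A back-of-the-envelope version of that calculation, with illustrative round numbers (published rate estimates vary, which is a big part of the error bars on the date):

```python
# Back-of-the-envelope molecular clock, illustrative round numbers only.
# After the split each lineage accumulates mutations independently, so
# fixed differences D ~= 2 * t_generations * mu, where mu is new
# mutations per haploid genome per generation.

D  = 35e6                 # fixed single-nucleotide differences (rough)
MU = 50                   # per haploid genome (~half the ~100 per child)
GENERATION_YEARS = 25

t_generations = D / (2 * MU)
print(f"~{t_generations * GENERATION_YEARS / 1e6:.0f} million years")
# Prints ~9 Myr here; slower pedigree-based rates push the estimate up
# toward ~10-13 Myr, faster phylogenetic rates pull it down toward ~6.
```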

From this we get a figure of about 10 million years, and we find that this roughly corresponds to the archaeological evidence - see this paper (Ardipithecus ramidus and the Paleobiology of Early Hominids). Quoting the author:

In effect, there is now no a priori reason to presume that human-chimpanzee split times are especially recent, and the fossil evidence is now fully compatible with older chimpanzee-human divergence dates [7 to 10 Ma (12, 69)] than those currently in vogue (70)

You write:

I feel you are also trying to minimize differences by saying that differences should be measured by mutation events, and not by base pairs differences.

If we are looking to establish when two species diverged, then counting mutation events is how it should be done. I think @PGarrison has done a good job of explaining this.

Pedantry alert. Homologous doesn’t really mean similarity above some cutoff. It means deriving from a common ancestor or ancestral sequence (for genes). Homology is not quantitative like similarity or % identity. Two sequences are either homologous or they aren’t. Even molecular biologists use this incorrectly in their papers. For genes that exist in multiple copies in a genome, the copies are paralogous. Their sequences may diverge to varying degrees, and they may have the same function, or they may have diverged to have related but somewhat different functions.

If a gene in two species serves the same function and each is the closest relative of the other within its gene family, they are orthologous. These distinctions can break down in complicated cases, or it can be ambiguous which genes are orthologous, especially in very widely separated species, but it’s a good framework to start with.

It would seem to me to be mere guesswork, Ace, re the 10 My figure, based on what you are saying. Earlier you indicated that mutations could contain many bp, and now you say that most of the 100 new mutations per generation are single nucleotide polymorphisms. In addition, several mutations at one site could not be measured if it included significant deletions. A lucky guess is still just a guess, and it doesn’t verify anything; it could just be a coincidence. The methodology is suspect. This does not even include the probability that mutation rates are unlikely to be consistent over time. Which of course we cannot really measure beyond 50,000 or 100,000 years.

Yes, that’s correct: indels are rare (SNPs make up the vast majority of mutations), but when indels happen they usually affect a string of base pairs at once (ranging from 5 bp to 1000s of bp) as opposed to 1 or 2. I don’t know whether the 100 new mutations each generation include indels or not.

In addition, several mutations at one site could not be measured if it included significant deletions

Like I said, indels are rare, and that figure counts both insertions and deletions. How many mutations do you think are lost due to deletions? And what makes you think this would be a significant number?

It’s worth pointing out as well that deletions don’t affect the overall similarity between two sequences. Imagine two sequences that are 200 bp long with a few mutations distributed randomly among them, making them 98% similar (i.e. they differ in 4 positions). Now randomly chop out 50 bases from one of the sequences. On average, the part you threw away will contain 1 SNP and 49 identical bases. The remaining 150 bases that can still be aligned will still be 98% similar, since there will now be 3 SNPs over 150 bases.
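
A quick simulation of that thought experiment, using the toy numbers above:

```python
# Two 200 bp sequences differ at 4 random positions (98% identical).
# Delete a random 50 bp block from one and measure identity over the
# 150 bp that can still be aligned.
import random

random.seed(0)
trials, total_identity = 10_000, 0.0
for _ in range(trials):
    snps = set(random.sample(range(200), 4))  # the 4 differing positions
    start = random.randrange(0, 151)          # random 50 bp deletion
    deleted = set(range(start, start + 50))
    surviving = len(snps - deleted)           # SNPs left in aligned 150 bp
    total_identity += 1 - surviving / 150

print(f"mean identity after deletion: {total_identity / trials:.3f}")  # ~0.980
```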

You haven’t established what exactly is suspect or why it is suspect. In any case, tens of thousands of geneticists disagree. Just out of curiosity, how do you explain the fact that you seem to stand alone against the tens of thousands of geneticists who think the methodology they use is just fine? Do you think there is a vast conspiracy afoot?

This is true to an extent. Mutation rates in a given sequence do differ between distantly related species. But we were talking about humans and chimps, which have very similar mutation rates, so it seems most reasonable to assume that our common ancestor would also have had a similar mutation rate.

Hi johnz. I also give the marbled lungfish example. This fish differs from humans in about 98% of its genome.

So does whole-genome phylogeny contradict the prediction of evolution?

Another problem is that some fossils have been pushed back by dozens of millions of years. So if a molecular clock first gives a value of about 50 My, it then changes to about 100 - a difference of about 100%!

An indel may be rare at one time. But the cumulative effect might be greater than individual bp mutations, based on your numbers. 100 mutations, of which one might be 500 bp, while the rest are individual bp. Now, if a 500 bp insertion has a deletion within it, and then another insertion of a 50bp sequence at the same location, and six more individual bp insertions at the end of the previous sequence, all the indels and single bp changes will be confounded. If this has happened several times at one location, there would be little or no evidence of it, due to things being erased or deleted.

It may seem reasonable to assume a similar mutation rate between species, but that does not mean that mutation rates would remain constant over time.

At 4 mutations per year (100 per 25 years) per species, 3.2 billion differences would happen in 400 million years, assuming mutation rates were constant for all species. However, this puts aside the randomness of mutations, which should lead to a certain percentage of bp becoming similar rather than different. It also ignores the statistics of multiple hidden mutations, where a mutation is erased without a record. This is a likely event to happen often over such a period of time, and yet is not part of the calculation. With the larger genome of the lungfish, if they are 98% different (based on lungfish), then 130 billion bp would require 32 billion years, unless insertions added half the bp, which would reduce the time to 16 billion years. Again, this does not really account for deletions which cannot be detected. Over such a period of time, there would be many of those also.

John, the figure of 60-100 mutations per generation considers only point mutations. Small insertions and deletions (1 to a few bases) that are not changes in copy number of a repeated sequence occur at 8-10-fold lower rates than point mutations (only a handful per generation). (That’s the figure I remember offhand.) Changes in copy number of short repeated sequences (microsatellites or STRs) occur at much higher rates than point mutations (100-10,000x), but they only constitute a few percent of the genome. Generally the rate of occurrence for indels of a given size falls off rapidly with the size. On average, indels of all sizes combined affect more bases than point mutations, but ignoring the intervals where an indel has occurred still leaves the vast majority of the genome to compare point mutations. I don’t know if you have access to a library, but Evan Eichler had a good short review on types of mutations and rates in humans in 2013. It’s not free, but worth acquiring if this interests you. In a lot of libraries you can download the paper free if the library subscribes to it. Eichler is a good name to search on this kind of thing.

One thing to remember is that the recent one-generation studies don’t account for what would happen to a mutation in the future. To affect a whole species, a mutation has to be fixed - take over all or nearly all the chromosomes in the species. The vast majority of neutral mutations are lost from the population within a short time. Only a tiny proportion, near 1/N where N is the effective population size, will drift to fixation. For deleterious mutations (a small fraction of the total), only those that are weakly deleterious have a chance to get fixed. The rare adaptive mutation has a much better chance. When you are comparing species, you normally only want to compare fixed mutations, which means you have to have at least the local sequence in a number of individuals. We have that now for both humans and chimps, where a number of individuals (thousands for humans) have been sequenced.
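
You can check that fixation probability with a tiny Wright-Fisher simulation. (Toy population size; strictly, for a diploid population of size N there are 2N chromosome copies, so a single new neutral copy fixes with probability 1/(2N), on the order of 1/N.)

```python
# Wright-Fisher drift: a single new neutral mutation among M chromosome
# copies is lost most of the time and fixes with probability 1/M.
import random

M, TRIALS = 100, 10_000
fixed = 0
for _ in range(TRIALS):
    count = 1                                  # one new mutant copy
    while 0 < count < M:
        freq = count / M                       # binomial sampling of the
        count = sum(random.random() < freq     # next generation's M copies
                    for _ in range(M))
    fixed += (count == M)

print(f"fixation rate: {fixed / TRIALS:.4f}  (expected 1/M = {1 / M:.4f})")
```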

I just noticed a new paper relevant to the discussion of fixed mutations above: Characteristics of de novo structural changes in the human genome.
My library doesn’t subscribe to the journal, but from the abstract it appears that while large indels affect nearly the same number of nucleotides in fixed elements of the genome as point mutations, when de novo mutations are observed (in sequencing of families) they affect a much larger number of nucleotides than point mutations. The reason is that large indels are much more likely to be deleterious than point mutations, so they don’t last long in the population. This paper contains the first evidence of de novo transposon insertions that were not found by analyzing genetic disease. The Genome of the Netherlands project sequenced the complete genomes of 250 families.

“It also ignores the statistics of multiple hidden mutations, where a mutation is erased without a record. This is a likely event to happen often over such a period of time, and yet is not part of the calculation.”

When geneticists are calculating something over a sufficient period of time for this to have an effect, they do include it. Geneticists are not dumb. They have been thinking about and publishing on these things for a long time. We do simplify things to try to make them comprehensible to laymen.

Studies have been done on the relative rates of all the possible point mutations, along with the effects of sequence context (nearest neighbor). A transition is a purine (A or G) to purine change or a pyrimidine (C or T) to pyrimidine change. A transversion is a purine to pyrimidine change or the reverse. All this goes into models of neutral evolution (unaffected by selection, so mutation-driven) as the null condition. In many cases the common chemical or enzymatic mechanisms that cause each kind of mutation have been worked out. The most common mutation is C->T at methylated C, a spontaneous deamination of cytosine, which occurs at several times the rate of any other point mutation.
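
A tiny helper that makes the transition/transversion distinction concrete:

```python
# Classify a point mutation as a transition (purine<->purine or
# pyrimidine<->pyrimidine) or a transversion (purine<->pyrimidine).
PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}

def classify(ref, alt):
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("need two different bases from A/C/G/T")
    within_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if within_class else "transversion"

print(classify("C", "T"))   # transition (e.g. deamination of methylated C)
print(classify("A", "C"))   # transversion
```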

John, if you really want to dig into the details, the literature is out there and much of it is free. There is a vast literature on mutation and DNA repair. Search PubMed for reviews on it and limit it to primates if you want.

I do share the impression that you (and dscccc) are inclined to keep coming up with an endless series of objections. That usually indicates that someone feels that some treasured belief is at stake, and it is easier to keep coming up with objections than to consider whether the treasured belief should be modified. I have read personal accounts of several people who worked for decades in the anti-evolution cause before one day, unpredictably, deciding to reconsider their stance. Personally, I just remained uncommitted on the issue for a long time, but I knew I was going to have to sort through the evidence myself - it helped to be a biochemist - so I just did it on my own. It took me several years, working sporadically in my spare time.

At this point I find the “discussions” more interesting as exercises in the psychology of belief than anything. I have no magic formula for convincing anyone. I have written a couple of blog posts attempting to explain what seems to me to be the strongest evidence on common descent and on the Out of Africa story, and Dennis has written on a number of kinds of evidence that I hadn’t thought of. Sometimes I am tempted to gather all the papers that would be relevant to human-primate common descent (I have hundreds on my hard drive) and write a series of blog posts absolutely beating this horse to a bloody pulp, but I know from experience that it wouldn’t do any good, so instead I play my guitar. :slight_smile:

Helps keep the blood pressure down after attempting to reason with people committed to an ideology on the internet :wink:

Hey John

If you refer to this paper, which came out with the publication of the chimpanzee genome, you will note that about 1.5% of human and chimp nucleotide sequences are unique to each lineage. A sequence will be unique for one of two reasons:

  • Either it is new (duplicated from elsewhere, ERV insertions etc.)
  • Or it was deleted from the other species

If we assume that half of this is due to deletions (a generous assumption), then at most 48 million bases have been lost to deletions.

With a fixed divergence between humans and chimps of 1.06%, that means that around 508,000 SNPs have potentially not been counted - 508,000 out of a total of 34 million SNPs. Assuming geneticists haven’t already accounted for this, at most their mutation counts would be off by about 1.5%.
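
Writing that arithmetic out explicitly, using the figures above:

```python
# Back-of-envelope check of the deletion argument, using the post's figures.
GENOME_BP        = 3.2e9
LINEAGE_UNIQUE   = 0.015     # ~1.5% of sequence unique to each lineage
DELETED_FRACTION = 0.5       # generous: half of the unique sequence is deletion
DIVERGENCE       = 0.0106    # fixed SNP divergence, human vs chimp
TOTAL_SNPS       = 34e6

# Counting both lineages' unique sequence:
lost_bases  = 2 * GENOME_BP * LINEAGE_UNIQUE * DELETED_FRACTION  # 48 Mb
missed_snps = lost_bases * DIVERGENCE                            # ~508,000
print(f"{lost_bases / 1e6:.0f} Mb lost -> ~{missed_snps / 1e3:.0f}k uncounted "
      f"SNPs = {missed_snps / TOTAL_SNPS:.1%} of {TOTAL_SNPS / 1e6:.0f}M SNPs")
```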

Given the size of the error bars in our current estimates for when humans and chimps diverged, why do you think this 1.5% would be significant?

I find the violin a good alternative for the same reason. :smile: You are right, it is difficult to reason with people whose ideology colors their facts.

Talk about ideology-driven: (from the paper you linked) check the last sentence in this paragraph…

Hidden among the differences is a minority of functionally important changes that underlie the phenotypic differences between the two species. Our ability to distinguish such sites is currently quite limited, but the catalogue of human–chimpanzee differences opens this issue to systematic investigation for the first time. We would also hope that, in elaborating the few differences that separate the two species, we will increase pressure to save chimpanzees and other great apes in the wild.

This paper is nine years old… and it does not measure differences by bp, but by estimated mutation events. This is like saying that if you take the body of a van off a frame and put on the body of a pickup truck, the difference is only 0.1%, because the entire body was put on in one operation, compared to the individual wheel nuts, wires, screws, tires, engine parts, and exhaust, which were all put on individually. It also appears to ignore the genome size difference, making the assumption that it is not a significant difference. I’ll have to read the paper later.
