ORFans, De Novo genes, and Taxonomically-Restricted Genes

@Paul_Nelson, one of the early voices of the ID movement, has joined us with this post in another thread. I thought it warranted its own thread. It is a special privilege for us to have Paul here to discuss with us. Please treat him with respect, and carefully consider his ideas.

Having read the literature and looked at the data myself on this topic, I see no challenge to evolutionary theory in ORFans, De Novo genes, and Taxonomically-Restricted genes. To the contrary, they seem to be very strong evidence for common descent. But these ideas have become important to the ID movement nonetheless, and merit kind consideration. Even if some of us might disagree, this is a wonderful opportunity to engage a kind leader in the ID movement.

So, what are your thoughts? What do you guys think about this work?

@benkirk @vjtorley @Jay313 @Argon @glipsnort @Chris_Falter


I would also add that I have had some very positive interactions with @Paul_Nelson. He is a nice guy who cares about this stuff, and has always treated me with respect. I very much hope that we can reciprocate here. In particular, I really have appreciated his explanations of why he decides to accommodate theistic evolution (even though he himself is a YEC).

“In short, humility on all sides is in order — but so is joyful confidence. What is the fastest way for any design theorist to discover what’s wrong with his or her theory of origins and how that theory might be improved? Talk to someone who shares the foundational design premise, but disagrees about the details.”

I would emphasize that I believe that God created us. He designed us all. So I should be part of that community that “agrees with foundational premise” but disagrees about the strength of this argument. So, I hope the ID people reading this do not dismiss my critique…

I see Nelson and Buggs referenced this paper, Mol Biol Evol. 2009 Mar;26(3):603-12, by Toll-Riera et al. That 2009 paper analyzed ORFans in primates, concluding that the majority were similar to pre-existing genes and/or had sequences related to transposable elements. They were also able to track back a number of ORFan sequences to non-coding regions in the genomes of other mammals.

A PLOS Genetics article from last December (Ruiz-Orera et al., Origins of De Novo Genes in Human and Chimpanzee. PLOS Genetics, Dec 2015) describes efforts to identify human- and chimp-specific proteins expressed by new genes. This covers one area where I think Nelson & Buggs said they wanted to see more work. Ruiz-Orera et al. also tried to identify novel RNA transcripts specific to these species. About half of the novel transcripts mapped to regions of previously annotated genes (introns and exons). Most of the others mapped to regions between known genes. For the latter, there seemed to be insertions of promoter sequences that would drive transcription from those intergenic regions. They found, as others have noted previously, that many of these de novo proteins seem to have undergone only limited purifying selection. Another trait of many of the new transcripts is their codon usage, which does not match the distributions observed for genes conserved across species. These new genes (most of which perhaps do not encode a viable product) and their expressed proteins also tend to be short, and the genes often have fewer exons. These traits are considered indicative of a very recent origin from sequences that previously did not encode protein. Larry Moran provides some additional perspective here.


Regarding bacteria:
One might expect more novel sequences in bacteria. Many bacteria are well known for ‘promiscuous’ DNA exchange. E. coli has sampled megabases of sequences over its existence.

PS - I’m going to be unavailable for the next week. Sorry.

In the book Scot McKnight and I have coming out in January, I discuss how Stephen Meyer handles ORFan genes in his book Darwin’s Doubt. Meyer thinks they come out of nowhere - completely missing the point that they arise from nearly-identical sequences that remain as non-genes in related organisms.


Nuts. Where are my manners, again… Congratulations on the paper, Paul!


Also, I would add that the Nelson studies greatly overestimate the number of ORFans and de novo genes.

I think this article by Larry Moran is helpful and well reasoned: Sandwalk: Origin of de novo genes in humans. He references this particular paper, which is a must-read (and was already pointed out by @Argon): Origins of De Novo Genes in Human and Chimpanzee. I would also add this study by Eric Lander as a must-read: http://www.pnas.org/content/104/49/19428.full.

In spite of what you might have read in the popular literature, there are not a large number of newly formed genes in most species. Genes that appear to be unique to a single species are called “orphan” genes. When a genome is first sequenced there will always be a large number of potential orphan genes because the gene prediction software tilts toward false positives in order to minimize false negatives. Further investigation and annotation reduces the number of potential genes…

They compared the five genomes to find examples that were only expressed in humans and/or chimpanzees but where similar nontranscribed sequences were present in macaque or macaque and mouse genomes…

The result was 634 human-specific transcribed regions, 780 that were chimpanzee-specific, and 1,300 that were only found in humans and chimps.
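The filtering logic described in the excerpt (transcribed only in human and/or chimpanzee, with a similar but untranscribed sequence present in macaque, or in macaque and mouse) can be sketched as a simple classification. This is a minimal sketch that assumes each locus has already been annotated with the species in which it is transcribed and the species whose genomes carry a similar sequence. The `Locus` record, its field names, and the `classify` function are hypothetical, and this is not the pipeline Ruiz-Orera et al. actually used.

```python
from collections import Counter
from dataclasses import dataclass

INGROUP = frozenset({"human", "chimp"})
OUTGROUPS = frozenset({"macaque", "mouse"})


@dataclass(frozen=True)
class Locus:
    """Hypothetical per-locus annotation (illustrative field names only)."""
    name: str
    transcribed: frozenset      # species in which the region is expressed
    homolog_present: frozenset  # species whose genome has a similar sequence


def classify(locus):
    """Return a de novo category for the locus, or None if it fails the filter."""
    expressed_in = locus.transcribed & INGROUP
    # Must be expressed in human and/or chimp, and in no outgroup species.
    if not expressed_in or locus.transcribed & OUTGROUPS:
        return None
    # Require a similar but untranscribed sequence in macaque (mouse is optional).
    if "macaque" not in locus.homolog_present:
        return None
    if expressed_in == {"human"}:
        return "human-specific"
    if expressed_in == {"chimp"}:
        return "chimp-specific"
    return "human+chimp"


# Made-up example loci, not data from the study.
loci = [
    Locus("a", frozenset({"human"}), frozenset({"human", "chimp", "macaque"})),
    Locus("b", frozenset({"human", "chimp"}), frozenset({"human", "chimp", "macaque", "mouse"})),
    Locus("c", frozenset({"chimp", "macaque"}), frozenset({"human", "chimp", "macaque"})),
    Locus("d", frozenset({"human"}), frozenset({"human", "chimp"})),
]
print(Counter(classify(locus) for locus in loci))
```

Tallying the non-None categories over a real annotation set would give counts analogous to the three numbers quoted above; the study's actual similarity thresholds and expression cutoffs are not reproduced here.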

We want to know if these are real genes or just spurious transcripts. The first clue is that 94% of these transcribed regions are expressed in testes. … You expect more spurious transcription in testes cells.

They found one human-specific peptide and 6 hominoid-specific peptides by mass spectrometry. By looking at ribosome-associated RNAs they identified 5 additional human-specific and 10 hominoid-specific transcripts. Thus, there are 21 potential de novo protein-coding genes. The median size of the peptides is 76 amino acid residues.

They conclude, “… in de novo genes in general there was not a significant decrease in the number of substitutions in the longest ORF when compared to neutrally evolving sequences, suggesting that the majority of these transcripts do not encode functional protein.”
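The quoted conclusion rests on asking whether an ORF has accumulated fewer substitutions than neutrally evolving sequence would. One simple way to frame that question, sketched here under stated assumptions, is a one-sided binomial test: treat each of the ORF's `n_sites` as independently substituting with a neutral per-site probability `neutral_rate` (estimated from sequence assumed to evolve neutrally), and ask how likely it would be to see as few substitutions as were observed. The function names and example numbers are hypothetical, and this is not necessarily the statistical test Ruiz-Orera et al. used.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def purifying_selection_pvalue(observed, n_sites, neutral_rate):
    """One-sided p-value: chance of this few substitutions if the ORF evolves neutrally.

    A small value suggests fewer substitutions than neutral, i.e. purifying selection.
    """
    return binom_cdf(observed, n_sites, neutral_rate)


# Hypothetical example: a 76-codon ORF (228 nt) and an assumed neutral rate of 5% per site,
# so about 11.4 substitutions would be expected without selection.
print(purifying_selection_pvalue(11, 228, 0.05))  # near the neutral expectation
print(purifying_selection_pvalue(2, 228, 0.05))   # far fewer than expected
```

Under this framing, an ORF like the first example shows no sign of constraint, which corresponds to the pattern the authors report for most de novo transcripts; only something like the second would suggest purifying selection.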

“Our results indicate that the expression of new loci in the genome takes place at a very high rate and is probably mediated by random mutations that generate new active promoters. These newly expressed transcripts would form the substrate for the evolution of new genes with novel functions.”

This is important because it shows us that generation of new genes from “random” sequences is not difficult.

Scientifically, this study is entirely consistent with neutral theory, and (in my opinion) squashes the de novo protein argument against evolution entirely.

In particular, IDists who want to argue that all these proteins are functional need to explain:

  1. Why are most of them in the testis? Is that really the essence of what makes humans different from chimps?
  2. Why are all the new proteins so short?
  3. Why are all the new proteins similar to non-coding regions and known transposable elements?
  4. Why is there no evidence that most of them are translated into proteins? (this is the mass-spec portion of the study).
  5. Why does the length distribution of the ORFans exactly match what we expect from neutral theory (http://www.pnas.org/content/104/49/19428.full)?
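Point 5 turns on what a neutral model predicts for ORF lengths. The simplest version of that null expectation can be sketched directly: in random sequence with equal base frequencies, 3 of the 64 codons are stops, so after an ATG the chance of reading further without hitting a stop falls off geometrically. That gives a mean ORF length of 64/3, about 21 codons, and only just under 1% of ORFs reaching 100 codons. The simulation below is a minimal sketch under those assumptions (uniform base composition, a single reading frame, an ORF counted from the first ATG to the next in-frame stop). Real genomes have skewed base composition, and this is not the analysis in the linked PNAS paper.

```python
import random
from statistics import mean

STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_orf_lengths(n_codons, seed=0):
    """Lengths (in codons, ATG included, stop excluded) of ORFs in one frame of random DNA."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", k=3 * n_codons))  # equal base frequencies assumed
    lengths = []
    start = None
    for i in range(n_codons):
        codon = seq[3 * i : 3 * i + 3]
        if start is None:
            if codon == "ATG":
                start = i  # open a new ORF at the first ATG
        elif codon in STOP_CODONS:
            lengths.append(i - start)
            start = None  # ORF closed; wait for the next ATG
    return lengths


lengths = random_orf_lengths(300_000)
expected_mean = 64 / 3  # 1 (the ATG) + 61/3 expected non-stop codons before a stop
print(len(lengths), round(mean(lengths), 1), round(expected_mean, 1))
# Analytic fraction of ORFs with at least 100 codons: (61/64)**99, roughly 0.009.
print(sum(n >= 100 for n in lengths) / len(lengths))
```

Comparing an observed ORF length distribution against a null like this (ideally with the genome's real base composition) is one way to see whether candidate de novo ORFs are longer than chance alone would produce.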

To be clear, neutral theory makes qualitative and quantitative predictions about de novo genes that are validated entirely by this data. We need a very strong reason to reject this, and I just do not see it in @Paul_Nelson’s work.


Agreed, Josh - my thoughts exactly as well.


Thanks for the plain English explanation!

Hi everyone. I’m back online, after three weeks without an Internet connection at home. (We’ve just moved house, and there was some red tape to get through before my new connection could be set up.)

I’ve had a quick look at Paul Nelson and Richard Buggs’ 2016 paper. On the positive side, it looks scholarly and well-referenced, and it makes a decent attempt to anticipate and refute objections to the thesis being advanced by the authors. I was interested to read about the large number of TRGs in bacteria, but also nematodes and insects. On the other hand, there appear to be gaps in the authors’ argument - notably the assumption that functionality is an all-or-nothing affair - and for human beings, at least, some of the literature they cite is a little old. Intriguingly, the authors mention a model developed by Carvunis et al. (2012) for the origin of de novo genes, which looks as if it could answer the authors’ difficulties.

Readers may be interested to know that I put up a post last year on Uncommon Descent titled, “Double debunking: Glenn Williamson on human-chimp DNA similarity and genes unique to human beings” (see Double debunking: Glenn Williamson on human-chimp DNA similarity and genes unique to human beings – Uncommon Descent ) which lays out the evidence that the 60 de novo protein-coding genes said to be unique to human beings have very similar counterparts in apes. Writing that post was a turning point for me, and I’d like to publicly thank Glenn Williamson for his kind assistance. I concluded as follows: “It seems to me that any claims that humans have a large number of ‘de novo’ genes with no counterparts in the DNA of chimpanzees and other apes should be treated with extreme caution. In fact, I wouldn’t bet on our having any de novo protein-coding genes having no counterparts in apes…”

Does anyone know of any evidence that such genes exist in humans?


That’s true for functional genes as well.

The joke in mouse genetics in the late 1980s and early 1990s was that every gene was expressed in brain and testis, and that this had some deeper meaning.

It took many years before those in the field stopped highlighting brain and testis expression when they published the cloning of a new gene.

There are only a very small number of such functional genes, and I would predict that the idea that they are unique to humans is probably an artifact of small data sets–IOW, that these genes were lost in the lineage being contrasted with humans.

Hello Vincent,
That’s an impressive post at UD. I’m impressed that you are open-minded enough to run the sequence analyses for yourself.

I wonder if you’d be willing to expand on comment 551 you made:

“Sorry, but you’re overlooking something here. The real question is whether an organism containing no other proteins apart from 50-amino-acid peptides that have some minimally selectable function would be a viable organism in the first place.”

My question for you is, against what would such an organism have to compete?


My guess is that the DI and ENV board will still promote the idea that ORFans & etc. are a ‘conundrum’ for evolutionary theory for some time to come. Defunct ideas never die, they’re just recycled.

