How to account for the difference in gene numbers between chimps and humans?

Lynn_Munter · December 27, 2017, 4:26am

Sorry I have given the impression of stalking. I like reading a lot of threads and I comment when I have something to say. Thanks for expanding on your position here.

Mark_Moore · December 27, 2017, 12:50pm

Thank you Lynn. I must admit that George and I are not tracking. To me his answers are unresponsive to what I am asking him, and some things he says indicate to me that we don’t have that many differences to fight about, BUT I have had so much trouble making myself clear on this board that it could be me, not him. Also, I seem to have picked up a nasty stomach bug so I can only hope that my post to glipslop is marginally coherent considering the condition I am in. I think I will just wait for him to deconstruct…

Bill_II · December 27, 2017, 3:03pm

@Mark_Moore
If I might inject a comment here. You are approaching the genetics from the viewpoint of the individuals involved. Science approaches genetics from the viewpoint of the populations involved. While how a gene would be transmitted needs to be understood in terms of individuals the math is done on populations. So for your new gene question the first part is there is a probability of 0.25 that the gene will be passed on to a child (you can only get it from one parent). This doesn’t mean that if there are 4 children only 1 will have the gene. It could be 0, 1, 2, 3, or 4 of the children have the gene. The other part of this is after a few generations if two people with the gene have children the probability goes up to 0.75 that the gene will be passed on (you can get it from either parent). After a few more generations if a couple where each person received the gene from both parents have children then the probability that their children will have the gene is 1.0. Does this make understanding how a gene spreads through a population a little easier?

I hope this is correct as I am not a scientist but I do like to act like one on the internet.
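The point about four children can be illustrated with a small calculation. If each child independently inherits the gene with probability p, the number of children carrying it follows a binomial distribution. This is only a sketch: the function name `carrier_distribution` is made up for the example, and p = 0.25 is simply the per-child figure quoted above, taken at face value (the actual probability depends on the parents' genotypes).

```python
from math import comb

def carrier_distribution(n_children, p):
    """Return P(exactly k children carry the gene) for k = 0..n_children."""
    return [comb(n_children, k) * p**k * (1 - p)**(n_children - k)
            for k in range(n_children + 1)]

# p = 0.25 is the figure quoted in the post, used here as an assumption.
dist = carrier_distribution(4, 0.25)
for k, prob in enumerate(dist):
    print(f"{k} of 4 children carry the gene: {prob:.3f}")
```

Under these assumptions exactly one carrier out of four is the single most likely outcome, but it still happens less than half the time, and zero carriers is nearly as likely.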

Mark_Moore · December 27, 2017, 4:57pm

Bill, I did start out like that, but once glipsnort said it was better to look from the population down rather than from the first individual up, I came back with a post that looked at it that way, and I have pointed out what I think are troubling numbers. It doesn’t matter whether I am showing how long it takes one new gene to “earn” its way into the population in an idealized scenario where no luck is involved or we view it from the top down and crunch the numbers based on how many “lottery winner” genes there are. See the post I put up last night. The kind of long one beneath my first long one.

sfmatheson · December 27, 2017, 5:07pm

Hi Mark, I just read that post and it seems to be a lot of speculation and a fair amount of error, with no references to the large and dynamic (i.e., ongoing) literature on topics of mutation, duplication, genomics, and evolution. I think that unless you begin to read and interact with the published science, and the knowledge already gained by thousands of smart and hard-working scientists, it will not be very productive to discuss your questions and ideas. In short, you are writing as though you have arrived at new insights that somehow an entire scientific discipline, which has published hundreds of articles, missed. You can be assured that this is not the case.

sfmatheson · December 27, 2017, 5:14pm

“Almost automatically?” Please. Not even close. The “ring species” phenomenon is rare and is only useful as an illustration of mundane evolutionary processes.

Mark_Moore · December 27, 2017, 5:17pm

These topics are inherently speculative.

Point my errors out. I can take it!

I gave a link to such literature within the body of the post.

[quote=“sfmatheson, post:26, topic:37511”]
I think that unless you begin to read and interact with the published science, and the knowledge already gained by thousands of smart and hard-working scientists
[/quote]

Again, I have, and I linked to some of it. I was also a public school science teacher for twelve years.

[quote=“sfmatheson, post:26, topic:37511”]
In short, you are writing as though you have arrived at new insights that somehow an entire scientific discipline, which has published hundreds of articles, missed.
[/quote]

That is how all scientific progress is made of course. But yes, I think science has been captured by philosophical naturalism and as such there are some questions they just don’t want to ask. In that environment there is a place for someone from outside to look at the results that they have found and suggest it points to something that they are not willing to look at.

Then don’t. But then don’t try to tell me I’m wrong after your refusal to engage. I’ll wait for glipsnort’s feedback.

sfmatheson · December 27, 2017, 6:05pm

There are no links to any published papers in your post. It contains two links: one to Steve’s blog post, and one to a press release from 8 years ago. It is clear that you are not reading the scientific literature.

Here’s just one. You wrote, “Substitution errors in existing alleles are very common. Gene duplication errors are much less common though not rare.” I don’t know why you wrote that; maybe you read it somewhere. But it’s wrong. Copy-number variation is very common, and in fact accounts for most genetic diversity in humans. You can read about it in dozens of published papers, but here’s a good one to start with:

Do you have a reference for these claims? I know that there are reports of hundreds of de novo genes in humans, but very few of these have been linked to any function at all. I have been writing about this very interesting topic on my blog, so I know the literature on new gene formation very well.

[quote=“Mark_Moore, post:7, topic:37511”]
Such genes actually achieving fixation are of course rarer still- they are the lottery winners. NOT the winners of a “new allele on existing genes lottery”, but the much less common “lottery of new functional gene locations”.
[/quote]

What is a “new functional gene location”? Since you were talking about duplications, isn’t it true that duplications are in “functional locations” almost by definition?

[quote=“Mark_Moore, post:7, topic:37511”]
Thus the 4N * generation time you gave as a formula for average fixation time for the winners of the lottery has a severe hurdle to meet before it even becomes relevant to our calculations -it does not consider the problem of how long it takes to accumulate lottery “players” in numbers sufficient for a winner to be probable.
[/quote]

Huh? That doesn’t even make sense.

T_aquaticus · December 27, 2017, 6:53pm

Hey Mark_Moore, glad to see that you have chosen to continue this topic since it makes for an interesting scientific discussion!!

Mark_Moore:

I would suspect that there are tens of thousands of substitution errors between us and pan fixed within the alleles of existing genes (I estimate 38,400). This is not surprising since there are hundreds to thousands of base pairs in each of the 22,000 genes and each base pair is a candidate for a substitution. In addition, several classes of easy substitutions exist which appear to be completely neutral.
Gene duplication does not have nearly so many chances since there are far fewer genes than base pairs within genes. But even once they occur they are simply copies of existing genes. To count as new genes they must undergo a mutation severe enough to produce something new- and unlike with the base-pairs that mutation is more often than not likely to be harmful. Only the exceptional genes can even enter this “lottery” to become fixed.

Just a few things . . .

If memory serves, the chimp genome paper catalogued 35 million substitution mutations and 5 million indels (which would include gene duplications) separating chimps and humans. Insertions and deletions of DNA are rarer, but not improbable.
It would be helpful to understand the terms orthologous, paralogous, and homologous. They all mean different things, and they also form an interesting Venn diagram. Long story short, orthologous genes are found at the same position in two genomes. Homologous sequence is sequence that is similar no matter where it is found in the genome. Paralogs are homologous sequences that are found at different places in the genome. When a gene is duplicated, the duplicated copy is a paralog. You will not find the duplicated gene at the orthologous position in the other species’ genome. This is how they detect gene duplications: by comparing the DNA that flanks a gene with the same DNA in the other species. For example:

Human: ATCGATCT--Gene 1--TTCGATT
Chimp: ATCGATCTTTCGATT

We can find the same DNA in the chimp and human genomes, but we notice that the chimp genome does not have Gene 1 inserted at that position. Conversely, this could also be a case of Gene 1 being lost from the chimp genome. Only by comparing DNA across a phylogeny can you distinguish between gene loss and gene gain.
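The flanking-sequence comparison can be sketched in a few lines of Python. This is a toy illustration built on the short sequences from the example, not how real genome alignment tools work; `find_insertion` is a hypothetical helper, and the string "GENE1" is a placeholder standing in for the gene's actual sequence.

```python
def find_insertion(with_gene, without_gene):
    """Locate extra sequence present in with_gene but absent from without_gene.

    Returns (left_flank, inserted, right_flank), or None if shared flanks
    do not account for the whole of without_gene.
    Toy approach: longest common prefix, then longest common suffix.
    """
    if len(with_gene) < len(without_gene):
        return None
    # Longest shared prefix (left flank).
    i = 0
    while i < len(without_gene) and with_gene[i] == without_gene[i]:
        i += 1
    # Longest shared suffix (right flank) within the rest of without_gene.
    j = 0
    while (j < len(without_gene) - i
           and with_gene[len(with_gene) - 1 - j]
               == without_gene[len(without_gene) - 1 - j]):
        j += 1
    if i + j != len(without_gene):
        return None  # the flanks do not cover the shorter sequence
    end = len(with_gene) - j
    return with_gene[:i], with_gene[i:end], with_gene[end:]

# Sequences from the example; "GENE1" is a placeholder, not real sequence.
human = "ATCGATCT" + "GENE1" + "TTCGATT"
chimp = "ATCGATCTTTCGATT"
print(find_insertion(human, chimp))
```

As the post notes, a result like this cannot by itself say whether the gene was gained in one lineage or lost in the other; that takes a comparison with additional species.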

As far as the number of new genes, we are talking about fewer than 2,000 mutations compared to the tens of millions of total mutations that separate chimps and humans.

[quote=“Mark_Moore, post:7, topic:37511”]
Thus the 4N * generation time you gave as a formula for average fixation time for the winners of the lottery has a severe hurdle to meet before it even becomes relevant to our calculations -it does not consider the problem of how long it takes to accumulate lottery “players” in numbers sufficient for a winner to be probable. The average “lottery winner” new gene may indeed be able to fix in only 60,000 generations, but how many generations pass before there are enough candidates accumulated so that equations about lottery winners are even applicable?
[/quote]

The deal with winners in the neutral drift lottery is that the odds of winning depend on the number of players. The chance of a neutral mutation reaching fixation is 1/(2N), where N is the effective population size. If there are only 10 in the population then a neutral mutation will have a 1 in 20 chance of reaching fixation. As you increase the population size the chance of a specific neutral mutation reaching fixation decreases, but the number of mutations in the population increases. Therefore, the probabilities all balance out to the mutation rate: the rate at which neutral mutations fix equals the rate at which they appear, regardless of population size. Of course, this is an idealized model so things get a bit messier with real populations.
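The balancing-out can be checked with a short calculation. Under the idealized neutral model described here, roughly 2Nμ new mutations arise per generation in a diploid population, and each one fixes with probability 1/(2N). The symbol μ (mutations per genome per generation) and the value chosen for it below are introduced purely for illustration, as is the function name.

```python
from fractions import Fraction

def neutral_substitution_rate(n, mu):
    """Expected neutral fixations per generation in an idealized diploid population."""
    new_mutations = 2 * n * mu       # mutations entering the population each generation
    p_fix = Fraction(1, 2 * n)       # chance that any one of them eventually fixes
    return new_mutations * p_fix

mu = Fraction(1, 100)  # arbitrary illustrative value, not a measured rate
for n in (10, 1_000, 100_000):
    print(n, Fraction(1, 2 * n), neutral_substitution_rate(n, mu))
```

The per-mutation fixation probability shrinks as N grows, while the computed rate stays equal to μ for every population size. Exact fractions are used so that the cancellation is visible without floating-point noise.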

I think it is also worth pointing out that the 700 number that is being tossed around could include genes that have not reached fixation. I have yet to read anything in any paper stating that all 700 are fixed (>99%) in the modern human population.

[quote=“Mark_Moore, post:7, topic:37511”]
So separate from the previous question, I ask you can we observe comparable “big bursts of activity” where genomes are “suddenly rearranged and changed” in the field today? If we cannot observe changes of similar magnitude anywhere today, how can we ascribe these changes to known genetic processes?
[/quote]

We can observe big differences in genetic recombination rates within the same genome. Recombination rates can be highly influenced by the DNA sequence in that region which means that rare events can produce regions of the genome that are much more susceptible to recombination. Natural processes are more than adequate when it comes to bursts of recombination within a genomic region.

gbrooks9 · December 27, 2017, 6:54pm

@sfmatheson

Okay, I spoke too optimistically. It is still a rare process. I can live with that.

But what biologists complain about (in terms of a “true” Ring Species vs. a flawed Ring Species) is not relevant to what we here find important about Ring Species.

sfmatheson · December 27, 2017, 7:03pm

“Flawed ring species”??? No such thing, not even when you capitalize it.

The ring species phenomenon is a nice illustration of various components of evolution, namely the interactions between reproductive isolation, geographical isolation, gene flow, and ecological factors. There is no particular “process” of “ring speciation,” and there is nothing specifically interesting about the phenomenon.

Ring species are rare but useful pictures of evolutionary processes. And that’s all they are.

gbrooks9 · December 27, 2017, 7:43pm

@sfmatheson,

So you never saw the posting by a YEC, quoting from a science article about an example of a Ring Species not being a True Ring Species?

I saw the post. I saw the article. And I explained to the YEC that what the complaint was about was not the issue of Speciation.

Would you like me to try to find that article for you? The last time you jumped on my case about an issue related to Ring Species, it didn’t really turn out the way you thought it would.

Do you believe that I encountered such an article? Or do you want me to find the evidence? Or are we good on the topic?

sfmatheson · December 27, 2017, 7:55pm

Oh look, it’s a mind reader. Just another thing evolution can’t explain.

I was responding to what you wrote. I don’t care at all about some YEC somewhere else.

gbrooks9 · December 27, 2017, 8:10pm

@sfmatheson,

But I was quoting the academic as well as the YEC. How can you be jumping my case about what another academic wrote? I disagreed with the journal’s author, or at least with how the YEC was trying to abuse what the author’s intended point was. I made the specific effort to make that point.

Somehow this has aggravated you to a considerable degree.

Bill_II · December 27, 2017, 8:28pm

Mark, I did read your post and, to be blunt, I couldn’t make much sense out of it.

When you say “likely” that indicates you are going to try to calculate a probability.

Genes which do not improve adaptation or even reduce adaptation can be passed on. There is no concept of a gene “earning” a place on our genome.

That is a very narrow definition which is not held by many of the people here.

You can suspect all you wish, but without knowing this field well any estimate you can come up with will be wrong.

And this is an example of what makes no sense to me.

Simple answer is the “big bursts of activity” took place over vast periods of time, probably much longer than the length of time we have been looking. You forget humans have been keeping track of animals for only 400 years or so and the article is talking about millions of years. It’s not a drop in the bucket; it is a drop in the ocean. Genetic processes are characterized not only by what we have seen in the last 50 years but also by what is recorded in our DNA and the DNA of other species.

T_aquaticus · December 27, 2017, 8:50pm

It is also important to remember that the idealized picture of a ring species can differ from the real ring species. There may be crossbreeding in the real world where it is not pictured in the model. But like you say, the concept is ultimately what is important.

Christy · December 27, 2017, 9:07pm

True. It illustrates the problem with seeing divine action in terms of “interventions.”

Mark_Moore · December 27, 2017, 11:02pm

Now you’re really talking to me. That’s good.

OK fine. I concede you would win some sort of point for debate, but the substance remains. Dr. Eichler said what he said about his findings.

Well, T-water gave a figure of 35 million substitution changes between us and pan, and five million insertions, and I have seen that figure elsewhere too, so I will stick with my statement that they are much less common (7 to 1) though not rare.

These figures obviously included our non-coding DNA, and as you point out a large number that are in some humans and not others (have not reached fixation) and would thus be weeded out of our test further along…

[quote=“sfmatheson, post:29, topic:37511”]
Do you have a reference for these claims?
[/quote]

The Evolution of Mammalian Gene Families. Note figure one. They say “including changes likely driven by adaptive natural selection” and further down give a number of specific examples.

[quote=“sfmatheson, post:29, topic:37511”]
Since you were talking about duplications, isn’t it true that duplications are in “functional locations” almost by definition?
[/quote]

You might think so, and I am curious about that but in this case I am talking about 700 genes that every human has and no chimp has. This implies to me that this is not merely a dupe of an existing gene found in both species but rather a new gene. Maybe some of them are dupes, but not a dupe of anything before the split.

PS- this is just one aspect of the differences between us and pan. The non-coding changes are even bigger. 33% of large substitutions in humans are not found in chimps?

The formula “4N * generation time” is a calculation of how long it takes on average for the “winners” to fix in a population. Most new mutations will not fix. Some may not last until the next generation. It assumes a certain number of genetic “players.” So if there was only ONE genetic mutation, the average time for it to fix if it fixes may be 4N * generation time BUT the odds of it fixing are still very low.

Say there were 5 million new genes but only one in a million reaches fixation. Those genes may fix in an average time of 4N * generation but out of a pool of five million genes only five make it. They make it relatively quickly, but there are still not enough of them to explain the 689 new genes on the human genome. You need the time to generate ANOTHER five million genes (many times over) and let them fix too.

Mark_Moore · December 27, 2017, 11:17pm

First part true. I was trying to calculate something very specific. But if there is no such concept, there should be; all I was trying to say is that if a gene improved fitness it would be more likely to reach fixation than if it did not. Glipsnort showed me that chance overwhelms fitness, but that does not mean that fitness has no effect.

I realize that now. What is your definition? Is there a “majority view” of the definition on this board? It seems to me that it would be pretty important to have one to keep people from talking past one another.

Well, I took the chimp-human difference of 1.2% x 150 base pairs x 22,000 genes. But T water says the answer is in the millions, though we may be talking about two different things. He is counting them all, even in non-coding genes where lots of changes can build up without harm. I was just talking about the coding genes.

The formula 4N * generation time describes how long it takes the average gene that fixes to reach fixation. 1.2 million years in our case. But most new genes don’t fix. Only a tiny fraction do. Even if our genomes produced 1 million new genes over the last five million years, if only one out of 50,000 fix then we only have 20 new genes (when what we see is 700). It doesn’t matter that the ones which got fixed did so in 1.2 million years on average. That may be how long it takes the average “lottery winner” gene to get fixed, but it says nothing about how long it took to generate enough new genes to have 700 lottery winners.
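The arithmetic in these two posts can be laid out explicitly. The sketch below only reproduces the numbers as stated in the thread and does not endorse them. The 20-year generation time is an assumption inferred from the 60,000-generation and 1.2-million-year figures, and the helper name `expected_fixed` is invented for the example.

```python
def expected_fixed(candidates, one_in):
    """Expected number of candidates reaching fixation if 1 in `one_in` fixes."""
    return candidates / one_in

# Earlier post: 5 million candidate genes, one in a million fixes.
earlier = expected_fixed(5_000_000, 1_000_000)
# This post: 1 million candidate genes, one in 50,000 fixes.
this_post = expected_fixed(1_000_000, 50_000)
print(earlier, this_post)

# Average fixation time of 4N generations; 60,000 generations implies N = 15,000.
generations = 60_000
n_effective = generations // 4
generation_years = 20  # assumption inferred from the thread's figures
print(n_effective, generations * generation_years)
```

Both scenarios fall well short of 700 only because of the fixation fractions chosen, which are the poster's own hypotheticals rather than measured values.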

sfmatheson · December 27, 2017, 11:21pm

You weren’t writing about insertions. You were writing about duplications. Those are different. Your figures are wrong, and a quick glance at the literature would reveal this.

[quote=“Mark_Moore, post:39, topic:37511”]
The Evolution of Mammalian Gene Families Note figure one. They say " including changes likely driven by adaptive natural selection" and further down give a number of specific examples.
[/quote]

That paper does not back the claims you made. Here are those claims, from your post:

Those claims are unsupported by that paper, and by any paper. Almost none of the “at least 700 genes” have been shown to “do things that no previously existing genes did.”

[quote=“Mark_Moore, post:39, topic:37511”]
@sfmatheson Since you were talking about duplications, isn’t it true that duplications are in “functional locations” almost by definition?

You might think so, and I am curious about that but in this case I am talking about 700 genes that every human has and no chimp has. This implies to me that this is not merely a dupe of an existing gene found in both species but rather a new gene. Maybe some of them are dupes, but not a dupe of anything before the split.
[/quote]

On what basis do you make these claims? Have you read any papers that even suggest this?

Your numbers are made up. Your analysis is simplistic, indeed laughably so. If you disagree, find a collaborator and submit to a journal. Good luck.