How to account for the difference in gene numbers between chimps and humans?

Perhaps, but Dr. Eichler emphasized how radical the changes were in a relatively brief amount of time.

And the very question at issue is whether these “big bursts” in the genetic record are wholly explainable by known natural processes or if we should be looking for something else. So looking at the “genetic record recorded in our DNA” is not going to give us the answer to that question.

I guess it depends on how you view the numbers. There are 35 million single-base substitutions separating Homo and Pan, and 5 million indels covering 67 million bases. So indels account for 67 million bases of difference, while substitutions account for 35 million. However, one indel is counted as one mutation, and a gene duplication would count as a single mutation, just like a substitution. Like I said, it all comes down to what the numbers mean in a given context.

If we are talking about 700 new genes, that would be 700 out of a total of 5 million indels. By my math, that is about a 1:7000 ratio.
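The arithmetic in the last two paragraphs can be checked in a few lines of Python. This is a minimal sketch; the input figures are the ones quoted in this thread, not independently verified values, and the variable names are illustrative only.

```python
# Figures as quoted in the thread (not independently verified).
substitution_events = 35_000_000   # single-base substitutions, Homo vs Pan
indel_events = 5_000_000           # indel events
indel_bases = 67_000_000           # bases covered by those indels
new_genes = 700                    # genes reported without a chimp ortholog

# Measured in bases affected, indels outweigh substitutions roughly 2:1 ...
bases_ratio = indel_bases / substitution_events
# ... but counted as events (one indel = one mutation), substitutions outnumber indels 7:1.
events_ratio = substitution_events / indel_events
# Average length of one indel event, in bases.
mean_indel_length = indel_bases / indel_events
# Number of indel events per new gene.
indels_per_new_gene = indel_events / new_genes

print(f"indel bases / substitution bases: {bases_ratio:.2f}")
print(f"substitution events / indel events: {events_ratio:.1f}")
print(f"mean indel length: {mean_indel_length:.1f} bp")
print(f"indel events per new gene: {indels_per_new_gene:.0f}")
```

It prints ratios of about 1.91 (bases) and 7.0 (events), a mean indel length of 13.4 bp, and one new gene per roughly 7,143 indel events, matching the "about 1:7000" above.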

Copy number variations among humans are a related topic, since they are indels. I don’t know the precise numbers, but it wouldn’t surprise me if the largest source of variation between human genomes, as measured by number of bases, was indels (with copy number variants being a subcategory of indels).

That is not what they said. They said that these were genes without an ortholog in the chimp genome. Orthologous is not the same as homologous. If there is a gene duplication in the human lineage then that duplicate will be at a new position in the human genome. The same spot in the chimp genome will not have that gene. This would make the duplicate a paralog that is homologous to the chimp gene but not orthologous to the chimp gene. Confusing, I know, but these terms are important to understand. You can read more here:

http://homepage.usask.ca/~ctl271/857/def_homolog.shtml

Orthologs are found at the same position in each genome. Paralogs are duplicates. Homologs share sequence through common descent and can be orthologs or paralogs.

The number of genetic players isn’t assumed. It is right there in the formula: N.

[quote=“Mark_Moore, post:39, topic:37511”]
Say there were 5 million new genes but only one in a million reaches fixation. Those genes may fix in an average time of 4N * generation but out of a pool of five million genes only five make it. They make it relatively quickly, but there are still not enough of them to explain the 689 new genes on the human genome. You need the time to generate ANOTHER five million genes (many times over) and let them fix too.
[/quote]

The effective population size is usually between 10,000 and 100,000 for humans, so the probability of a new gene reaching fixation would be 1/2N, or 1 in 20,000 to 1 in 200,000. For 700 genes, taking the 1 in 200,000 figure, you would need 140 million gene duplications over 5 million years. In 5 million years with a 25-year generation time and a population of 100,000, that is 20 billion individuals. 140 million gene duplications over 20 billion individuals would be one gene duplication in every 150 births, or thereabouts. These are just back-of-the-envelope calculations and I might have gotten some wrong.
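Here is a minimal sketch of the same back-of-the-envelope calculation, assuming (as in the paragraph above) that every new duplicate is neutral, so it fixes with probability 1/(2N), and that the number of births per generation equals N. The function name `fixation_budget` is hypothetical.

```python
def fixation_budget(new_genes, ne, years, generation_time):
    """Back-of-the-envelope count of duplications needed, assuming each new
    duplicate is neutral and therefore fixes with probability 1/(2N)."""
    # On average 2N new copies must arise for one to fix, i.e. new_genes / (1/(2N)).
    duplications_needed = new_genes * 2 * ne
    generations = years / generation_time
    births = generations * ne                 # individuals born over the interval
    births_per_duplication = births / duplications_needed
    return duplications_needed, births, births_per_duplication

for ne in (10_000, 100_000):
    needed, births, per = fixation_budget(700, ne, 5_000_000, 25)
    print(f"N={ne:>7,}: {needed:,.0f} duplications over {births:,.0f} births "
          f"(one per {per:.0f} births)")
```

Running it for N = 10,000 and N = 100,000 gives the same answer, about one duplication per 143 births, because both the births available and the duplications needed scale with N, so N cancels out.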

I think it is now standard to refer to indels as insertions and deletions of less than 1 kb, with CNVs defined as structural changes involving more than 1 kb. See here for example. And yes, a majority of human genetic diversity is accounted for by CNVs. Link from above: Global diversity, population stratification, and selection of human copy number variation - PMC

Of all the folks on this board you seem to have the best grasp of where I am going with this stuff, even if you don’t agree with it. Your thought processes are moving my thinking along, though it may or may not lead to the conclusion I originally thought.

It is confusing. That link you gave me said “Orthologs are genes in different species that evolved from a common ancestral gene by speciation.” So the claim for these 700 genes was that there was NO ortholog. So these 700 human genes did NOT evolve from a common ancestor in the chimp genome. If they came from duplication then to me that implies they occurred AFTER the purported chimp-human split. Those other distinctions are not as critical as this fact because it tells us when to “start the clock”.

See, you are giving me relevant numbers to work with. Thank you! And I notice that your high-end effective population size is more reasonable than the 15,000 that I keep getting elsewhere. The idea that our lineage spent six million years on the edge of extinction without going extinct seems very unlikely, but I suppose that is where some see God’s hand at work.

I want to point out though, that your reasonable effective population numbers are problematic for the idea that typical known processes can explain the change without a rate problem. 4N * generation time in this case is 4 * 100,000 *25 = 10,000,000. There is not enough time for those genes to fix unless they spring forth right at the split, which smacks of an unseen Hand. Actually there is not enough time even then. We have six million years, or really less.
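The 4N-generation figure can be tabulated for the effective population sizes mentioned in this thread. A sketch, assuming the standard neutral-theory approximation that a new allele which does go to fixation takes about 4N generations on average, with a 25-year generation time:

```python
def mean_fixation_time_years(ne, generation_time):
    """Approximate mean time, in years, for a new neutral allele that does
    fix to reach fixation: about 4N generations."""
    return 4 * ne * generation_time

for ne in (15_000, 50_000, 100_000):
    years = mean_fixation_time_years(ne, 25)
    print(f"N={ne:>7,}: ~{years / 1e6:.1f} million years")
```

This gives about 1.5, 5.0 and 10.0 million years for N = 15,000, 50,000 and 100,000. It is a mean over alleles that eventually fix, not a minimum.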

Regarding gene duplication rates, I know this source gives more up-to-date information than the .009/gene/million years rate that some have been using…

That source says the figure, which unlike earlier figures takes into account that you must subtract out deletions, is more like .00123/genes/myears.

So .00123 * 22000 * 6 million years = 162. That is the calculated value vs. the observed value of almost 700.
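The expected-versus-observed comparison can be written out so each input is explicit. A sketch using the rate, gene count and time span quoted above; none of these are independently checked here.

```python
rate_per_gene_per_my = 0.00123   # retained duplications per gene per million years (quoted above)
genes = 22_000                   # approximate human gene count used above
million_years = 6                # time since the split used above

expected = rate_per_gene_per_my * genes * million_years
observed = 689                   # genes reported without a chimp ortholog
print(f"expected ~{expected:.0f}, observed {observed}, ratio {observed / expected:.1f}")
```

It prints an expected count of about 162 against 689 observed, a ratio of roughly 4.2. Because the expectation is a simple product, it scales linearly with each input: a rate twice as high would double it.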

You may argue that the 22,000 figure does not take into account changes in non-coding or formerly “junk” DNA, but I would say if you consider the question of the amount of change in such DNA from the chimp that you would actually have a worse rate problem. We vary from them MORE in the non-coding regions than we do in the coding genes.

That source appears to be at least 10 years old. Here is a much more recent review article that describes how duplication rates have been estimated and why they vary:

With the holidays, I haven’t been keeping up with these threads, but I wanted to note – this is a useful reference I hadn’t seen before.

In previous posts you seemed to be arguing that these 700 genes could not be duplicates, so I was trying to show that they very well could be. As you state, a duplicated gene is not considered to be a gene that came through common descent. After a duplication event the duplicates (i.e. paralogs) are considered to be on their own evolutionary trajectory and are treated differently than the ortholog that they are duplicated from.

So just to make this clear: even though the duplicates are not strictly considered to be a product of common descent, they are still homologous to a chimp gene.

[quote=“Mark_Moore, post:45, topic:37511”]
I want to point out though, that your reasonable effective population numbers are problematic for the idea that typical known processes can explain the change without a rate problem. 4N * generation time in this case is 4 * 100,000 *25 = 10,000,000. There is not enough time for those genes to fix unless they spring forth right at the split, which smacks of an unseen Hand. Actually there is not enough time even then. We have six million years, or really less.
[/quote]

First, the 4N number is an average, which means some will fix quicker and some will fix slower. On top of that, these gene duplications or new genes could have preceded the split. After the split they could have been lost in one lineage and fixed in the other lineage. This is called incomplete lineage sorting, and it is a known biological process.

[quote=“Mark_Moore, post:45, topic:37511”]
That source says the figure, which unlike earlier figures takes into account that you must subtract out deletions, is more like .00123/genes/myears.
[/quote]

I’m not able to access the paper at this time, so I will take a look at it when I can.

[quote=“Mark_Moore, post:45, topic:37511”]
You may argue that the 22,000 figure does not take into account changes in non-coding or formerly “junk” DNA, but I would say if you consider the question of the amount of change in such DNA from the chimp that you would actually have a worse rate problem. We vary from them MORE in the non-coding regions than we do in the coding genes.
[/quote]

In coding regions you will have negative selection against deleterious duplications which you need to take into account.
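The point made above, that 4N generations is only an average, can be illustrated with a small neutral Wright-Fisher simulation. This is a sketch under strong simplifying assumptions (neutral allele, constant population, no selection, and a deliberately tiny N = 25 so it runs quickly); the function names are hypothetical. It follows single new copies until they are lost or fixed and records how long the fixed ones took.

```python
import random
import statistics

def wright_fisher_run(two_n, rng):
    """Follow one new neutral allele (starting as a single copy) in a
    Wright-Fisher population of 2N gene copies until it is lost or fixed.
    Returns the number of generations to fixation, or None if it is lost."""
    copies, generations = 1, 0
    while 0 < copies < two_n:
        p = copies / two_n
        # Binomial resampling: each of the 2N copies in the next generation
        # descends from an allele-carrying parent copy with probability p.
        copies = sum(1 for _ in range(two_n) if rng.random() < p)
        generations += 1
    return generations if copies == two_n else None

def fixation_times(n, wanted, seed=1):
    """Collect `wanted` fixation times for a diploid population of size n."""
    rng = random.Random(seed)
    times = []
    while len(times) < wanted:
        t = wright_fisher_run(2 * n, rng)
        if t is not None:          # most new copies are lost; keep only fixations
            times.append(t)
    return times

N = 25
times = fixation_times(N, wanted=100)
print(f"4N = {4 * N} generations")
print(f"mean {statistics.mean(times):.0f}, min {min(times)}, max {max(times)}")
```

In runs like this the spread is wide: the shortest and longest fixation times typically differ several-fold, even though the mean sits near 4N. With a realistic N the whole distribution stretches proportionally.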

Interesting reference indeed. It may be a stretch to base duplication rates on a handful of genes, but there are still some interesting numbers to look over:

“More recently, inverse PCR-based methods were used to measure the rates of duplication and deletion of human α-globin genes (Lam and Jeffreys, 2006, 2007). The frequencies of spontaneous α-globin duplication in sperm were 2.6 × 10⁻⁵ and 6.2 × 10⁻⁵ in two human males. However, it is possible that the actual duplication rate of α-globin genes is in fact higher than reported because the PCR primers used to detect the duplications were designed to detect specific kinds of duplications, and translocated and inverted duplications would not have been detected. Similar methods were used to determine the duplication and deletion rates at four loci in humans and the duplication rate estimates ranged from 1.7 × 10⁻⁵ to 8.7 × 10⁻⁷ (Turner et al., 2008).”

Those rates are duplications/locus/generation, so they give the chance per individual of a specific gene being duplicated. If we take a rough median of 5 × 10⁻⁵ per gene per generation and 22,000 genes, that would be 1.1 duplications per individual. This also assumes that all of the duplications come from the paternal side, but that isn’t too much of a stretch, since most mutations come from the paternal side due to the number of cell divisions it takes to produce sperm.
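The per-transmission figure can be computed for each of the quoted rates, not only the rough median. A sketch assuming, as above, that all 22,000 genes share the same per-locus rate; the labels are just descriptions of the quoted numbers.

```python
genes = 22_000
# Per-locus, per-generation duplication frequencies quoted above.
rates = {
    "alpha-globin, donor 1": 2.6e-5,
    "alpha-globin, donor 2": 6.2e-5,
    "Turner et al. high": 1.7e-5,
    "Turner et al. low": 8.7e-7,
    "rough figure used above": 5e-5,
}
for label, rate in rates.items():
    # Expected new duplications per paternal transmission, assuming
    # every gene duplicates at this same per-locus rate.
    print(f"{label:>24}: {rate * genes:.3f} duplications per transmission")
```

The rough figure gives 1.1 duplications per transmission, but the quoted rates span roughly 0.02 to 1.4, about a 70-fold range, so the choice of rate matters a great deal.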

Saying that some of the fixes might have occurred quicker than average does nothing to undermine my case, because for that to happen “quicker than average” over 500 times out of 689 is itself an indication that something beyond the law of averages is at work- just as the difference between expected number and observed number would indicate.

So far there were 162 differences expected vs. 689 found; that is where the calculation sits at present. Now, I would feel better if the numbers showed 1,620 differences, that is, ten times the amount of change expected from the unaided norm of nature. But this is a data point that is at least on the side of the line it would be on if my “Fingerprint of God” hypothesis were correct.

Also, based on the comments I shared from Dr. Evan Eichler, it seems that the human-chimp common ancestor was the one which saw the most genomic rearrangements. So if I had the power to do a statistical analysis on that (and I don’t), I suspect that the rate problem which shows up weakly here would show up more strongly there. That is, there was more innovation in a shorter time than we would expect to be the norm in the chimp-human genome, but an even greater amount in the line from apes to the chimp-human “ancestor”. So there were major changes between the ape genome and the human genome, but some of these changes were initiated in the lineage which also gave rise to Pan, weakening the perceived rate problem.

Further, if this were done at other key points along many genomes, we might see the same thing: more changes than we might expect. And while no single instance would overwhelm one as miraculously improbable, the cumulative effect of results showing up on the same side of the line would add up to us looking at a real phenomenon. At least that was what I was going for here. It may not be possible for us to calculate because of the next thing you say…

[quote=“T_aquaticus, post:48, topic:37511”]
these gene duplications or new genes could have preceded the split. After the split they could have been lost in one lineage and fixed in the other lineage. This is called incomplete lineage sorting
[/quote]

Annnd this may be the end of the road for how far we can take this thought experiment with our limited knowledge base. Now I have a different idea of what “Incomplete Lineage Sorting” is than you do. I figure in most cases it means that drift in a gene makes that gene look more like the gene of an organism less related to us than one more closely related. For example, a gene of ours looks like the corresponding gene on a chimp at the species break, but because of the way it drifts in our pool, the chimp pool, and the gorilla pool, by the time we measure it by chance the gene looks more like the gorilla gene. To the extent it happens like that then it does not affect my calculations.

Now maybe it happens the way you are describing too. I would expect that to be rarer but I don’t have any idea how to calculate the number of genes that this subset of ILS accounts for. That is why we could be at the end of the road here, at least for calculating the magnitude of the change over expected. Whether we are or not I want to thank you for your quality dialogue on this subject!

http://genome.cshlp.org/content/20/11/1469.full

From abstract:

At a resolution of ∼30 kb, nine de novo CNVs were observed from 772 transmissions, corresponding to a mutation rate of μ = 1.2 × 10⁻² CNVs per genome per transmission (μ = 6.5 × 10⁻³ for CNVs >500 kb).
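The rate in that abstract is simply the count of events divided by the number of transmissions. Because it rests on only nine events, it is also worth seeing how uncertain it is. The sketch below adds an exact Poisson confidence interval; that interval is an illustrative addition, not a figure reported in the paper, and the helper names are hypothetical.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

def bisect_root(f, lo, hi, iters=100):
    """Find a root of a monotone function f on [lo, hi] by bisection."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_exact_ci(k, alpha=0.05):
    """Exact (Garwood) two-sided interval for the mean of a Poisson count k >= 1."""
    # Lower limit: the mean at which P(X >= k) = alpha/2.
    lower = bisect_root(lambda lam: 1 - poisson_cdf(k - 1, lam) - alpha / 2, 1e-9, 10.0 * k)
    # Upper limit: the mean at which P(X <= k) = alpha/2.
    upper = bisect_root(lambda lam: poisson_cdf(k, lam) - alpha / 2, 1e-9, 10.0 * k + 10)
    return lower, upper

events, transmissions = 9, 772   # de novo CNVs and transmissions from the abstract
rate = events / transmissions
lo, hi = poisson_exact_ci(events)
print(f"rate = {rate:.4f} CNVs per genome per transmission")
print(f"95% interval: {lo / transmissions:.4f} to {hi / transmissions:.4f}")
```

It gives 0.0117 (about 1.2 × 10⁻²) with a 95% interval of roughly 0.005 to 0.022 per transmission, so the true rate could plausibly be half or nearly double the point estimate.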


Abstract:

Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1–20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.
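The abstract's fold-differences also imply how much sequence de novo substitutions touch per generation. A back-calculation sketch, assuming the reported figures (4.1 kbp, 29 coding bases, 91-fold, 52-fold) are exact; since the abstract rounds them, the results are approximate.

```python
# Figures from the abstract quoted above (per generation).
sv_indel_bases = 4100        # 4.1 kbp of sequence affected by de novo structural changes
sv_indel_coding_bases = 29   # coding bases affected by those changes
fold_genomic = 91            # "91 ... times more nucleotides than de novo substitutions"
fold_coding = 52             # the corresponding fold-difference for coding bases

# Implied footprint of de novo substitutions per generation.
subst_bases = sv_indel_bases / fold_genomic
subst_coding = sv_indel_coding_bases / fold_coding
print(f"implied substitution footprint: ~{subst_bases:.0f} bp genome-wide, "
      f"~{subst_coding:.2f} coding bp per generation")
```

This works out to roughly 45 bp genome-wide and about 0.56 coding bases per generation for substitutions, compared with 4.1 kbp and 29 coding bases for de novo indels and SVs.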

Direct measurements of gene duplications point to a much higher rate of gene duplication. See my post above.

[quote=“Mark_Moore, post:50, topic:37511”]
That is, there was more innovation in a shorter time than we would expect to be the norm in the chimp-human genome, but an even greater amount in the line from apes to the chimp-human “ancestor”. So then there were major changes between ape genome and human genome, but some of these changes were initiated in the lineage which also gave rise to pan, weakening the perceived rate problem.
[/quote]

There is no reason why gene duplication rates should stay the same through history.

[quote=“Mark_Moore, post:50, topic:37511”]
Annnd this may be the end of the road for how far we can take this thought experiment with our limited knowledge base. Now I have a different idea of what “Incomplete Lineage Sorting” is than you do. I figure in most cases it means that drift in a gene makes that gene look more like the gene of an organism less related to us than one more closely related. For example, a gene of ours looks like the corresponding gene on a chimp at the species break, but because of the way it drifts in our pool, the chimp pool, and the gorilla pool, by the time we measure it by chance the gene looks more like the gorilla gene. To the extent it happens like that then it does not affect my calculations.
[/quote]

Let’s say that a gene duplication happens 8 million years ago and it is found in 0.1% of the common ancestral population for humans and chimps. After the split, that gene duplication is lost in the chimp genome. In the human lineage it becomes fixed. Is there anything about this scenario that you find improbable or problematic? I am sure that we can find gene duplications that are only found in 0.1% of the modern human population, so it doesn’t seem that improbable to me. We also find variation in specific genes with coalescent times that predate the chimp/human split, which indicates that we carry alleles that were variable in the common ancestral population.

[quote=“Mark_Moore, post:50, topic:37511”]
Now maybe it happens the way you are describing too. I would expect that to be rarer but I don’t have any idea how to calculate the number of genes that this subset of ILS accounts for. That is why we could be at the end of the road here, at least for calculating the magnitude of the change over expected. Whether we are or not I want to thank you for your quality dialogue on this subject!
[/quote]

We just need to realize that when a speciation event happens that each new species will carry genetic variation that was found in the common ancestral population. A speciation event doesn’t cause the two new species to be homozygous for every allele, and the speciation event also does not force each new species to adopt completely different alleles. Variation found in the common ancestor can and does carry over to each new species.
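The scenario above, a duplication already present at low frequency before the split, can be checked against a standard neutral result: an allele's chance of eventually fixing equals its current frequency, and 1/(2N) is just the special case of a brand-new single copy. A minimal Wright-Fisher sketch, assuming neutrality, a constant and deliberately tiny population, and hypothetical function names:

```python
import random

def fixes(start_copies, two_n, rng):
    """Run one neutral Wright-Fisher trajectory from `start_copies` of 2N
    gene copies; return True if the allele ends up fixed, False if lost."""
    copies = start_copies
    while 0 < copies < two_n:
        p = copies / two_n
        # Each copy in the next generation descends from an allele-carrying
        # parent copy with probability p (binomial resampling).
        copies = sum(1 for _ in range(two_n) if rng.random() < p)
    return copies == two_n

def fixation_probability(start_copies, two_n, runs, seed=7):
    """Fraction of `runs` simulated trajectories in which the allele fixes."""
    rng = random.Random(seed)
    return sum(fixes(start_copies, two_n, rng) for _ in range(runs)) / runs

TWO_N = 20   # 10 diploid individuals, so the simulation runs quickly
results = {}
for start in (1, 2, 5):
    freq = start / TWO_N
    results[freq] = fixation_probability(start, TWO_N, runs=2000)
    print(f"starting frequency {freq:.2f}: simulated P(fix) = {results[freq]:.3f}")
```

The simulated fixation probabilities land close to the starting frequencies of 0.05, 0.10 and 0.25. Because neutral drift leaves the expected frequency unchanged from one generation to the next, this holds for any population size, so a neutral variant at 0.1% in the ancestral population would fix in a given descendant lineage with probability of roughly 1 in 1,000.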

I could not make much of that post vis-à-vis “direct measurements”. To me, that means you measure me and you measure my grandfather. Comparing me to a chimp, orang, and mouse is not going to be a “direct measurement” if my hypothesis is right, because the amount of change measured would include all the Fingerprints of God, giving more change over time than the natural background rate. Thus it would give a number which would mask the magnitude of change between Pan and us. I think we need a me-and-grandpa measurement. I used .00123 / gene / milyears and I had a reference for it. What do you think the formula should be?

Not a uniformitarian then? The YEC say that with regard to rates of radioactive decay. If these are natural processes then why should they vary? I mean, maybe there was more radioactivity when the earth was young and so for that reason they slowed over time, but that would be a more or less constant rate of change. What I am suggesting would have a big change in the line leading from apes to humans relative to what we would see as the norm, even while rates are not changing elsewhere.

I think if we throw dates around using molecular clock data, and we all do, then we have some degree of confidence that the rates are pretty stable.

[quote=“T_aquaticus, post:52, topic:37511”]
Let’s say that a gene duplication happens 8 million years ago and it is found in 0.1% of the common ancestral population for humans and chimps. After the split that gene duplication is lost in the chimp genome. In the human lineage it becomes fixed. Is there anything about this scenario that you find improbable or problematic?
[/quote]

Well, no more “improbable” than the 1-in-2N formula we have been using for a gene getting fixed. So if the effective population was 50,000, there is a 1-in-100,000 chance of it happening once. So I suppose if there were 100,000 genes at such levels, we would expect about one of the 689 “new” genes to be explained that way. That is a real thing, but not an important thing for this problem, IMHO.
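The numbers in that paragraph can be checked directly. A sketch assuming, as stated there, N = 50,000, a fixation probability of 1/(2N) for each variant, and 100,000 independent variants (a hypothetical count):

```python
import math

n_e = 50_000
p_fix = 1 / (2 * n_e)        # the 1/(2N) figure used above
candidates = 100_000         # hypothetical number of ancestral variants "at such levels"

expected = candidates * p_fix                     # mean number of fixations
p_at_least_one = 1 - (1 - p_fix) ** candidates    # chance that one or more fix
print(f"expected fixations: {expected:.2f}")
print(f"P(at least one): {p_at_least_one:.3f} (1 - e^-1 = {1 - math.exp(-1):.3f})")
```

The expected number of fixations is exactly one, and the chance that at least one fixes is about 63%, close to 1 - e^-1.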

In some of the experiments they are detecting gene duplications in sperm, which would be a direct measurement.

[quote=“Mark_Moore, post:53, topic:37511”]
I used .00123 / gene / milyears and I had a reference for it.
[/quote]

How was that number determined?

[quote=“Mark_Moore, post:53, topic:37511”]
Not a uniformitarian then?
[/quote]

That’s not how uniformitarianism works. Do you think a uniformitarian requires the temperature to be the same every day? The fact of the matter is that different genes in the same genome will have different duplication rates, and different species have different duplication rates. Why would gene duplication be a physical constant like radioactive decay or gravity?

[quote=“Mark_Moore, post:53, topic:37511”]
What I am suggesting would have a big change in the line leading from apes to humans relative to what we would see as the norm, even while rates are not changing elsewhere.
[/quote]

Abnormal is not the same as supernatural, nor is abnormal contrary to natural processes. Gene duplication events vary across species and even within the same genome, so why would a variation in gene duplication rates through history be any different?

Also potentially flawed by looking across millions of years, but that would mean my case is stronger than that calculation would indicate, not weaker. Again, what number would you use and why?

I am pretty sure that’s not how it works either. I would think it would require the ebb and flow of temperature to respond to known processes and if there was a big change from the pattern then there is a REASON for it and curious people would then investigate to see if they could find the reason.

It sounds like you are expecting to find abnormal if we could look at this closer. I think that is a good bet.

Look, if enough abnormal things happen in the same direction over an extended period of time, there is something missing in one’s conception of “normal.”

So what method did they use in that paper?

[quote=“Mark_Moore, post:55, topic:37511”]
I am pretty sure that’s not how it works either. I would think it would require the ebb and flow of temperature to respond to known processes and if there was a big change from the pattern then there is a REASON for it and curious people would then investigate to see if they could find the reason.
[/quote]

So why couldn’t recombination and gene duplication rates ebb and flow due to natural processes just like temperature does?

[quote=“Mark_Moore, post:55, topic:37511”]
Look, if enough abnormal things happen in the same direction over an extended period of time, there is something missing in one’s conception of “normal.”
[/quote]

I would agree. One needs to adapt to what reality is really like instead of proclaiming something to be supernatural if it doesn’t meet your expectations of how nature works.

Again, the potentially flawed method of comparing differences in various groups of mammals they believe to be along the same lineage for tens of millions of years. And again, if that method is flawed because it assumes that all change is “natural” and “random” processes when in fact nature got some Divine input here and there then my case is stronger than my calculations would indicate, not weaker. And I ask for the third time, what other number would you use there and why?

They could. There could be some natural explanation that we simply have not detected yet- unlike the ebb and flow rate of temperature for which we have long detected the natural processes. We should look for natural explanations for ebbs and flows, and test for them- but right now we have no such explanation for their cause, just evidence that they exist. This ties right into a beef I have about how philosophical naturalism is now impeding science. It may have helped in the past, when the biases were in one direction, but there is such a thing as balance. Now I think there are some questions scientists are hesitant to ask and ideas they are unwilling to test for simply because of the implications undermining philosophical naturalism.

Let’s test for what things should be like when taking into account all known natural processes. And when we find things are different, sure, continue to search for undiscovered natural processes. But the more we look without finding one, the more likely it becomes that what we have discovered are the Fingerprints of God.

And one must also be willing to adapt, if truth is the goal, to what reality is really like instead of proclaiming something to be natural even when it doesn’t meet your expectations of how nature works- but to even get to that place one must be willing to recognize when that is.

I don’t know how old you are or how long you have been an atheist. I have been a believer since I was a youth, but my understanding of who God is and how He has operated has changed immensely over the decades. I changed my views in accordance with the evidence.

Or mostly so. With regard to creation and my understanding of early Genesis, on my final go-around it was not the evidence which changed, but my understanding of what the scriptures said happened. For 40 years I looked at the same words and struggled with the same apparent contradictions. And then, in my mid-fifties, I opened up the Book on an unexceptional day and I started seeing everything differently. I have no way to “prove” it scientifically, but the change was so abrupt and of such magnitude, and confirmed as true the more I studied it through, that I now believe I got a Divine touch as regards being able to understand early Genesis. So I suppose I am “biased” as regards whether the finger of God can cause subtle, perhaps barely detectable, but profound changes in genomes. That same kind of process occurred in me in another area.

At any rate, regardless of what you think of all that, the math is math, the truth is the truth, and we should keep looking, and follow where it leads.

I see you continue to use this figure, taken from a source more than 10 years old. Much more recent data, including measurements taken directly from individuals related by descent, have been posted in the forum since then. Those data seem to show that your number is far too low. Have you seen these posts?

Then I would assume that this method would produce an estimated gene duplication rate that would be sufficient to produce the differences seen between the species in the study, would it not?

[quote=“Mark_Moore, post:57, topic:37511”]
And I ask for the third time, what other number would you use there and why?
[/quote]

I would lean towards observed rates of gene duplication, such as the studies done on sperm. The distribution of rare gene duplications in the modern human population would also be a good estimate to use.

[quote=“Mark_Moore, post:57, topic:37511”]
They could. There could be some natural explanation that we simply have not detected yet- unlike the ebb and flow rate of temperature for which we have long detected the natural processes.
[/quote]

There are known natural mechanisms that produce segmental duplications, such as non-allelic homologous recombination.

[quote=“Mark_Moore, post:57, topic:37511”]
I don’t know how old you are or how long you have been an atheist. I have been a believer since I was a youth, but my understanding of who God is and how He has operated has changed immensely over the decades. I changed my views in accordance with the evidence.
[/quote]

I was a Christian for the first half of my life (first 20 years). I would agree that one of the greatest things about life is how you change along the way.

[quote=“Mark_Moore, post:57, topic:37511”]
At any rate, regardless of what you think of all that, the math is math, the truth is the truth, and we should keep looking, and follow where it leads.
[/quote]

Agreed. Who knows? Maybe we are both wrong! :wink:

Mark, you appear to be arguing that a method is flawed when in fact it produces results that are consistent. I understand your position: “How do we know that God didn’t do it?” The question you need to ask and answer is: how do we know that God DID do it? Trying to do simple math on what is a very complicated problem isn’t going to show that God was involved. Let me give you a simpler example.

We know from Proverbs 16:33 “The lot is cast into the lap, But its every decision is from the LORD.” That is, God determines the outcome when He wills it. Casting lots is the same as rolling dice, so we can use that to see what is going on here. First, rolling dice produces a random outcome. We cannot predict in advance what the exact outcome is going to be. Now, after the dice are thrown and the outcome determined, we know two things: the outcome, and that God is responsible for the outcome. Now, how do you determine that the outcome came from God? I say you can’t, but you can trust the Bible when it says it came from God. Using methodological naturalism to test this wouldn’t detect the actions of God, but that doesn’t mean God didn’t act.

You are skating awfully close to a God of the Gaps argument here. What happens when we do find something that accounts for what is different? That is how science functions.

I saw a link with lots of numbers in it, a lot on point mutations, which really don’t address the question. I have been asking T_aquaticus to give me his number and why he prefers it for days now. When people won’t commit to and defend a specific number, I will use my number until they do. Maybe you would care to step in for him on this one?