Why I remain a Darwin Skeptic

And if anyone wants to check my work or just play along at home, what I did was install the Levenshtein python package (‘pip3 install python-Levenshtein’) and run the following code:

#!/usr/local/bin/python3 
import Levenshtein as lev
import matplotlib.pyplot as plt
import random

def main() : 
  gc_content = .41                                                                                                                       
  niter = 1000    
  k = 100                                                                                                                                 
  ngc = nat = nmixed = ntransition = 0                                                                                                   
  for it in range(niter) :                                                                                                               
    # pick two strings, each k long, of a, c, g, t, with frequencies determined by gc_content                                            
    al = random.choices(['a', 'c', 'g', 't'], weights=[0.5*(1-gc_content), 0.5*gc_content, 
          0.5*gc_content, 0.5*(1-gc_content)], k=k)     
    bl = random.choices(['a', 'c', 'g', 't'], weights=[0.5*(1-gc_content), 0.5*gc_content, 
          0.5*gc_content, 0.5*(1-gc_content)], k=k)     
    a = ''.join(al)                                                                                                                      
    b = ''.join(bl)                                                                                                                      
    # lev.opcodes does all the work -- it provides a list of edits to turn a into b                                                      
    for tag, i1, i2, j1, j2 in lev.opcodes(a, b) :                                                                                       
      if tag == 'replace' :           # substitutions only                                                                               
        if i2 - i1 > 1 or j2 - j1 > 1 : continue             # single character only                                                     
        if a[i1] == 'c' and b[j1] == 'g' : ngc += 1                                                                                      
        elif a[i1] == 'g' and b[j1] == 'c' : ngc += 1                                                                                    
                                                                                                                                     
        elif a[i1] == 'a' and b[j1] == 't' : nat += 1                                                                                    
        elif a[i1] == 't' and b[j1] == 'a' : nat += 1                                                                                    
                                                                                                                                     
        elif a[i1] == 'a' and b[j1] == 'c' : nmixed += 1                                                                                 
        elif a[i1] == 'c' and b[j1] == 'a' : nmixed += 1                                                                                 
        elif a[i1] == 'g' and b[j1] == 't' : nmixed += 1                                                                                 
        elif a[i1] == 't' and b[j1] == 'g' : nmixed += 1                                                                                 
                                                                                                                                     
        elif a[i1] == 'c' and b[j1] == 't' : ntransition += 1                                                                            
        elif a[i1] == 't' and b[j1] == 'c' : ntransition += 1                                                                            
        elif a[i1] == 'a' and b[j1] == 'g' : ntransition += 1                                                                            
        elif a[i1] == 'g' and b[j1] == 'a' : ntransition += 1                                                                            
                                                                                                                                     
        else : print('huh?', a[i1], b[j1])                                                                                               
                                                                                                                                     
  print(ngc, nat, nmixed, ntransition)                                                                                                   
  fig, ax = plt.subplots()                                                                                                               
  types = ['Transition', 'G<->C', 'A<->T', 'A<->C/G<->T']                                                                                
  ntot = niter * k                                                                                                                       
  counts = [ntransition, ngc, nat, nmixed]                                                                                               
  rates = [ntransition/ntot, ngc/ntot/gc_content, nat/ntot/(1-gc_content), nmixed/ntot]                                                  
  ax.bar(types, rates)                                                                                                                   
  ax.set_title('Levenshtein replacements, random sequence (per available base)')                                                         
  # These give the uncorrected values:
  # ax.bar(types, counts)
  # ax.set_title('Levenshtein replacements, random sequence (uncorrected)')
                                                                                                                                     
  fig.savefig('lev.pdf', bbox_inches='tight')                                                                                            
                                                                                                                                     
main()

Steve, I’m very grateful for you posting this information. I’ve read it through a few times but I admit it is a little over my head. I was wondering if you would be kind enough to explain your two charts and their significance in an ELI5 way? (Explain like I’m 5)

Sorry, and thanks in advance. :see_no_evil:

So, happy to show and tell, B.Sc.(Hons.) in Biological Sciences, Lancaster (the original Roman one), '75

London Hospital Medical Centre, Whitechapel, Oral Microbiology Unit, laboratory technician, '80.

They have an outstanding pathology collection, including the Elephant Man, who lived there (his heartbreaking, ghastly, sci-fi horror skeleton), and the skeleton of a syphilitic. Green. Every human body part with every kind of morbidity. Only one put me off my lunch. Not the worm-bored Swiss cheese brain. A tumorous foot.

I can tell the life science post grads here. And non.

The oxygen level was 63% of current. Not zero extensively. And where do you get sulphurous from?

How should we observe evolution at the genetic level that we don’t? How don’t we?

Sure, terrestrial conditions would have had more oxygen, but marine conditions were largely anoxic in the Ediacaran.

That link explains the widespread anoxic and then sulfuric conditions in the late Ediacaran and early Cambrian. Then later the oceans became less sulfuric.

The theory of evolution lacks transitional fossils; the theory of creationism lacks the discovery of an oasis of life forms suited to a non-anoxic, non-sulfuric ocean during that transition. Both theories are lacking fossils. We need to look for rare marine environments that were non-anoxic and non-sulfuric before we can conclude that a large range of extant species did not exist then.

In terrestrial environments, as you say, oxygen levels were very high. They would actually be toxic to most extant species. If one is to assume that extant terrestrial species didn’t exist back then, one first has to extensively research terrestrial areas with non-toxic oxygen levels suitable for today’s fauna and flora, i.e. high elevations where oxygen levels were lower.

A further influencing factor was CO2. Some plants prefer higher CO2, some don’t. Does this mean that those that don’t did not exist back then? Or possibly they were in niche locations and expanded out when conditions changed. This is what we observe today: niche organisms can dominate an ecosystem when the environment changes.

Evolution does exist on a genetic level, but it’s mainly due to changes in allele frequencies with the occasional mutation. These changes can cause dramatic changes to an organism, but not via additional unique genes added to the genome, as evolution often surmises. This evolutionary process of adding unique coding genes that improve fitness is rarely observed, if ever.

I feel like we’ve been here before, @mindspawn.

Just checking that (A) we’re going to hear some new ideas rather than rake over old ground, and (B) we’re going to keep the conversation around ID, which is what this thread is about :slightly_smiling_face:.

A) Evolution isn’t a new idea, and neither is creationism. Sure, some old ideas will come up here and there; it’s unavoidable.

B) ID is core to my position. The interpretation of the fossil record and comparisons between the two theories are relevant when discussing ID. Does the TOE have any advantage over ID when looking at the fossil record? That is my core focus. If most species appeared suddenly without precursor, this would point to ID.

C) It is a rule of this site that a thread can close if inactive for a week. I’m often inactive for a week. I see no reason to prohibit revisiting previous topics not fully discussed.

I simply wanted to check we weren’t going to rehash the thread I linked to.

Do you have any new evidence to support your hypothesis since the last time you raised it (in the linked thread)? Genuine question.

I’m sure some of my views expressed before will come up here, but the emphasis is different. My previous focus was specifically the Cambrian Explosion; my focus this time is on general environmental changes, and the search for niche environments from which radiations of rare species would dominate earth in waves. One would expect this from observing changed ecosystems today.

We will happily open a closed thread for you if you would like to return to it. Just message the moderators.

Thanks, I never knew that. I’m happy to continue old discussions here, but if the moderators prefer, we could open an old thread. Liam doesn’t seem to like the idea of continuing old discussions here, I don’t see the problem with it, as long as it retains some relevance to the opening post.

It would be better to start a new thread. We are already on post 577. If you would like it linked to this one, click on the time stamp in the upper right of a post in this thread and click on the + new topic to start your new thread.

Noted thanks. I will possibly do so in the next few days.

I’ll reply on the Zombie thread on claims of divine intervention (but not ID?) in the Cambrian.

I can try. If you’ve read my original blog post, you know its point: the pattern of genetic differences between humans and chimps is the same pattern you see between individual humans, and that both patterns represent accumulated mutations. So if you have a T at a certain spot in your genome and my base is different, mine is more likely to be a C than an A because C and T mutate into one another more readily than A and T.

‘Not so fast’, says @EricMH: that pattern doesn’t have to come from mutations. It could just reflect random differences between the genomes, with the pattern governed by the composition of the genomes. (Note: both the human and the chimpanzee genomes consist of 41% Gs and Cs and 59% As and Ts.) To test this idea, he generated random fake genomes with the appropriate composition and compared them. To make the comparison he used a calculation of the Levenshtein editing distance, which determines the minimum number of edits (insertions, deletions, substitutions) to turn one string (e.g. the letters of a genome) into another string; from that calculation, he pulls out the substitutions (also called replacements). You can take this to be kinda, sorta like the process of aligning human and chimp DNA and identifying single-base differences between them; those differences were what I used to determine the patterns in my blog post. He said that the resulting pattern of substitutions looked a lot like my pattern, showing that you don’t have to invoke mutations to explain the pattern.
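
For anyone who wants to see what "pull out the substitutions" means mechanically, here is a small pure-Python sketch of the textbook dynamic-programming edit distance with a traceback. It is not the python-Levenshtein implementation, and when several minimal edit scripts exist it may pick a different one than lev.opcodes would; the function name `edit_distance_with_substitutions` is my own.

```python
def edit_distance_with_substitutions(a, b):
    """Return (distance, substitutions) for turning string a into string b.

    substitutions is a list of (i, j, a_char, b_char) taken from ONE
    minimal edit script; other equally short scripts may exist.
    """
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i          # delete all of a[:i]
    for j in range(m + 1):
        dist[0][j] = j          # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match or substitution

    # Walk back from the bottom-right corner, keeping only substitutions.
    substitutions = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                substitutions.append((i - 1, j - 1, a[i - 1], b[j - 1]))
            i -= 1
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            i -= 1              # deletion from a, not counted
        else:
            j -= 1              # insertion into a, not counted
    substitutions.reverse()
    return dist[n][m], substitutions


distance, subs = edit_distance_with_substitutions('kitten', 'sitting')
print(distance, subs)
```

The substitutions list is what gets sorted into the transition, G<->C, A<->T and A<->C/G<->T bins; insertions and deletions are simply dropped.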

The problem is that, when I tried to do exactly what he described, I ended up with the two plots you’re asking about. The first one represents what it sounds like Eric did, which is to tabulate the number of differences of each kind. That’s not going to be very interesting, in fact, since different categories in the plot have different opportunities to occur. For example, to get into the A<->T column, genome 1 can have an A and genome 2 a T or vice versa, so only As and Ts can contribute to that category. To get into A<->C/G<->T, on the other hand, you can start with any of the four bases: the two genomes could have A:C, C:A, G:T, or T:G. So there are roughly twice as many opportunities for these differences, and that column is going to end up about twice as high as A<->T. When you correct for that effect by dividing the A<->T by the number of As and Ts (and do the same for C<->G), you end up with the second plot, which looks quite uninteresting and nothing like the real distribution seen in data.
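
To make the opportunity argument concrete, here is a rough sketch of the expected numbers. It assumes a simplification that is not what the Levenshtein script does: the two random sequences are compared position by position with no insertions or deletions, so each category's expected share is just a product of base frequencies. The function name `expected_mismatch_shares` is made up for illustration.

```python
def expected_mismatch_shares(gc_content):
    """Expected share of positions in each mismatch category.

    ASSUMPTION: two independent random sequences compared position by
    position (no indels), with A=T and G=C frequencies set by gc_content.
    """
    p_a = p_t = 0.5 * (1 - gc_content)
    p_c = p_g = 0.5 * gc_content
    raw = {
        # A<->G and C<->T, in either order
        'Transition': 2 * p_a * p_g + 2 * p_c * p_t,
        'G<->C': 2 * p_g * p_c,
        'A<->T': 2 * p_a * p_t,
        # A<->C and G<->T, in either order
        'A<->C/G<->T': 2 * p_a * p_c + 2 * p_g * p_t,
    }
    # Same correction as in the script: divide by the fraction of bases
    # that are able to contribute to the category at all.
    corrected = dict(raw)
    corrected['G<->C'] = raw['G<->C'] / gc_content
    corrected['A<->T'] = raw['A<->T'] / (1 - gc_content)
    return raw, corrected


raw, corrected = expected_mismatch_shares(0.41)
for name in raw:
    print(f'{name:12s} raw={raw[name]:.4f} corrected={corrected[name]:.4f}')
```

With equal base frequencies the mixed column comes out exactly twice the A<->T column in this toy model; at 41% GC the factor is smaller, but the direction of the effect is the same.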

Thanks so much for your patient reply, Steve. I think it is starting to click now. I’ll have another read of your blog post and the related replies here and get back to you if I have any further questions (maybe by PM so as to keep this thread on track). Thanks again.

It is not just the GC content, but the frequencies of each letter. For instance, the following string:

aaaatttggc

has frequencies:
a: 4/10
t: 3/10
g: 2/10
c: 1/10

I compared a real human mtDNA to a real chimp mtDNA using the Levenshtein distance and got something that looked like your distribution, looking just at the replacement edits.

I then tabulated the atgc frequencies for the human mtDNA and used those frequencies to randomly generate another mtDNA of the same length. Same for chimp mtDNA. I then ran the same Levenshtein analysis, and it looked like I got a very similar distribution, except for the A<->C/G<->T bar.
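
Until the real code is posted, here is a minimal sketch of the "tabulate frequencies, then generate a random sequence with the same frequencies" step, using the toy string from above rather than real mtDNA. The helper names `base_frequencies` and `random_sequence_like` and the fixed seed are assumptions for illustration only.

```python
from collections import Counter
import random


def base_frequencies(seq):
    """Fraction of each letter in seq, keyed by letter."""
    counts = Counter(seq)
    total = sum(counts.values())
    return {base: counts[base] / total for base in sorted(counts)}


def random_sequence_like(seq, rng):
    """Random string of the same length with each letter drawn
    independently at the per-letter frequencies observed in seq."""
    freqs = base_frequencies(seq)
    bases = list(freqs)
    weights = [freqs[base] for base in bases]
    return ''.join(rng.choices(bases, weights=weights, k=len(seq)))


example = 'aaaatttggc'
freqs = base_frequencies(example)
rng = random.Random(0)   # fixed seed only so reruns give the same string
fake = random_sequence_like(example, rng)
print(freqs)
print(fake)
```

Note that the generated string matches the original frequencies only on average; a short sequence like this one can drift a long way from them.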

I will be rerunning the analysis to make sure I didn’t make a mistake, and will post the code and graphs.

Side note: the reason I went with Levenshtein first is that it was the easiest way for me to reproduce what you did, but also the alignment algorithms have a substitution matrix which could bias the alignment to produce your distribution. I noticed that the hg19.panTro5.all.chain file that you linked had a substitution matrix that would favor alignments that would boost your transition bar:

       A     C     G     T
A     90  -330  -236  -356
C   -330   100  -318  -236
G   -236  -318   100  -330
T   -356  -236  -330    90

Note the penalty is smaller for A-G and C-T alignments, which are the alignments that make up your transition bar. Since optimal alignment is computationally intractable for long DNA sequences, the algorithms have to use heuristics, and one heuristic is to assign penalties for different letter alignments. The above heuristic will assign smaller penalties for the alignments that make up your transition bar, so will possibly bias the inferred mutations towards your transition bar.
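
To show what "smaller penalty" means in practice, here is a small sketch that scores an ungapped pairing of two equal-length strings with the matrix above. It ignores gap penalties and everything else a real aligner does, and `score_ungapped` is a name I made up, not something from the chain file or any aligner.

```python
# Substitution scores copied from the matrix quoted above.
SCORES = {
    'A': {'A': 90, 'C': -330, 'G': -236, 'T': -356},
    'C': {'A': -330, 'C': 100, 'G': -318, 'T': -236},
    'G': {'A': -236, 'C': -318, 'G': 100, 'T': -330},
    'T': {'A': -356, 'C': -236, 'G': -330, 'T': 90},
}

TRANSITIONS = {('A', 'G'), ('G', 'A'), ('C', 'T'), ('T', 'C')}


def score_ungapped(x, y):
    """Sum of per-position scores for two equal-length sequences.

    ASSUMPTION: no gaps; this is only the substitution part of a score.
    """
    if len(x) != len(y):
        raise ValueError('sequences must have the same length')
    return sum(SCORES[p][q] for p, q in zip(x.upper(), y.upper()))


# Split the mismatch scores into transitions and transversions.
transition_scores = sorted({SCORES[p][q] for p, q in TRANSITIONS})
transversion_scores = sorted({SCORES[p][q]
                              for p in 'ACGT' for q in 'ACGT'
                              if p != q and (p, q) not in TRANSITIONS})
print('transition scores:  ', transition_scores)
print('transversion scores:', transversion_scores)
print(score_ungapped('ACGT', 'ACGT'), score_ungapped('ACGT', 'GCGT'))
```

Every transition mismatch scores -236, while every transversion scores -318 or lower, which is the asymmetry being described.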

The alignment algorithms are similar to Levenshtein, but Levenshtein doesn’t make use of this heuristic matrix, so going with Levenshtein avoids this possible source of bias. I’m happy to see I still reproduced a distribution that looked similar to what you have, so the alignment algorithm didn’t introduce too much bias, if any.

Another side note: this use of heuristics in alignments, and then inference of an evolutionary tree from said alignments, is what makes me somewhat skeptical about claims like ‘perfectly nested clades’. My understanding is these substitution matrices are essentially derived from existing human-curated phylogenetic trees, and thus we have the possibility of data leakage into algorithmically generated trees from human-constructed trees. Thus, the much acclaimed match between trees generated from genetic data and morphology could potentially be a case of what we call ‘data snooping’ in machine learning. In other words, the trees match because we’ve made them match. That’s why I’m interested in using metrics like plain Levenshtein distance where we don’t have the possibility of bias creeping into our results.

But for any decent-sized chunk of the genome (i.e. not the mtDNA), AT and GC are each split just about evenly between their constituent letters. The counts for chromosome 1 are:
T 67244164
A 67070277
C 48055043
G 48111528
So treating A=T and G=C is an excellent approximation.
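
A quick check of that approximation from the counts above (the variable names are just for illustration):

```python
# Base counts for chromosome 1, as quoted above.
chr1 = {'T': 67244164, 'A': 67070277, 'C': 48055043, 'G': 48111528}

total = sum(chr1.values())
gc_fraction = (chr1['G'] + chr1['C']) / total
a_over_t = chr1['A'] / chr1['T']
g_over_c = chr1['G'] / chr1['C']

print(f'GC fraction: {gc_fraction:.4f}')
print(f'A/T ratio:   {a_over_t:.4f}')
print(f'G/C ratio:   {g_over_c:.4f}')
```

Both ratios come out within about 0.3% of 1, and the GC fraction is close to the 41% used in the simulation.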

Well, that’s not what I see in my mtDNA:

Counter(human)
Counter({'c': 5176, 'a': 5123, 't': 4094, 'g': 2176})

Yes, as I said, mtDNA behaves quite differently. It has a very different replication system than the nuclear genome, with a meaningful difference between how the two strands are replicated and with substantial difference in base composition between them. In the nuclear genome, replication and transcription begin on both strands and there is no meaningful difference between the two ends of the chromosome or between the two strands – the content is different if you read in the opposite direction, of course, but the same machinery operates in both directions. As a result, any process that biases the base composition of one strand also operates on the other, and complementary bases occur equally frequently.
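
One way to put a number on that strand asymmetry is the usual skew statistic, (x - y) / (x + y), applied to the two sets of counts posted in this thread. This is only a sketch; the `skew` helper is my own name, and the mtDNA counts are the ones from the Counter output above.

```python
def skew(x, y):
    """Normalized difference between two counts: 0 means perfectly even."""
    return (x - y) / (x + y)


# Counts quoted earlier in the thread.
chr1 = {'t': 67244164, 'a': 67070277, 'c': 48055043, 'g': 48111528}
mtdna = {'c': 5176, 'a': 5123, 't': 4094, 'g': 2176}

for name, counts in (('chromosome 1', chr1), ('human mtDNA', mtdna)):
    gc_skew = skew(counts['g'], counts['c'])
    at_skew = skew(counts['a'], counts['t'])
    print(f'{name:13s} GC skew={gc_skew:+.4f} AT skew={at_skew:+.4f}')
```

On these numbers the chromosome 1 skews are on the order of a tenth of a percent, while the mtDNA strand has far fewer Gs than Cs, so assuming A=T and G=C is reasonable for the nuclear genome but not for a single mtDNA strand.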
