Since Francis Collins connects both BioLogos and the Human Genome Project, and because genetics has been showing up around me quite often lately, my thoughts have been orbiting around it. I’m not a biologist; mathematical physics is the only science I know a little about, so it is impossible for me to assess the project’s legacy since it was declared complete in 2003.
A rounder date, like 2023, would probably bring much more attention to it. At the same time, I found that the “complete genome” phase ended only in May of this year (2021). Does anyone here know what this means for biology? And what have we learned from it since 2003? I think this is a good place to talk about it a little.
Great topic. Knowledge of the genome has driven profound advances in medicine. Parents and prospective parents can receive genetic counseling, genetic conditions can be diagnosed, treatment can be customized, and the risk of future disease can be assessed and acted on. While there are many specific examples, I’ll give one that directly affected a friend and affects many others. My friend had several close relatives diagnosed with breast cancer, and genetic testing showed she carried a BRCA gene mutation, placing her at very high risk for breast and ovarian cancer. She elected to have her ovaries removed and a prophylactic mastectomy (which is not for everyone but was her preference), and she now has a much better chance of living a long and healthy life.
If the HGP did say that they had a complete sequence of the human genome in 2003, then they would have been wrong. Francis Collins described it as the “first draft of the human genome,” which is probably the best description. There wasn’t a gap-free sequence of the reference genome until very recently (May 2021). I wouldn’t be surprised if there was still some quality control (QC) to be done on the May 2021 sequence, but I could be wrong.
What it means for science is that people can look for important functions in the last few percent of DNA that wasn’t part of the reference sequence. The last chunks of the genome to be sequenced were highly repetitive, so no one expected to find many new genes, but it looks like they were still able to find a few coding genes.
I haven’t read the entire paper, but I would guess that the 2,226 paralogous gene copies are mostly pseudogenes arising from gene duplication.
Will any of these newly discovered genes and sequences be important in medicine? Who knows. They could be. I’m sure many scientists are crunching that data right now. One of the first things I would do is look at old RNA-seq data to see whether any differentially expressed RNAs match these newly discovered genes. In other words, are any of these genes expressed differently in cancers, heritable diseases, infections, or other pathologies? The data is probably sitting out there right now, ready for someone to run the analysis.
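As a rough illustration of that last step, here is a minimal Python sketch. It assumes you already have a differential-expression table (for example, tumour vs. normal) and a list of IDs for the newly annotated genes. It also assumes the old reads have been re-quantified against the new gap-free reference, since counts from alignments to the older reference would simply not include the new sequence. The names `DEResult` and `flag_new_genes`, the gene IDs, and the cutoffs (adjusted p < 0.05, |log2 fold change| ≥ 1) are hypothetical choices for illustration, not anyone’s published pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DEResult:
    """One row of a hypothetical differential-expression table."""
    gene_id: str
    log2_fold_change: float
    adjusted_p: float


def flag_new_genes(results, new_gene_ids, p_cutoff=0.05, min_abs_lfc=1.0):
    """Keep rows for newly annotated genes that pass both (assumed) thresholds,
    sorted by adjusted p-value, most significant first."""
    new_ids = set(new_gene_ids)
    hits = [
        r for r in results
        if r.gene_id in new_ids
        and r.adjusted_p < p_cutoff
        and abs(r.log2_fold_change) >= min_abs_lfc
    ]
    return sorted(hits, key=lambda r: r.adjusted_p)


if __name__ == "__main__":
    # Toy values only; the IDs are placeholders, not real annotations.
    table = [
        DEResult("NEW_GENE_1", 2.3, 0.001),
        DEResult("NEW_GENE_2", 0.2, 0.40),
        DEResult("OLD_GENE_7", -3.1, 0.0001),  # ignored: not a new gene
        DEResult("NEW_GENE_3", -1.5, 0.02),
    ]
    new_genes = ["NEW_GENE_1", "NEW_GENE_2", "NEW_GENE_3"]
    for hit in flag_new_genes(table, new_genes):
        print(hit.gene_id, hit.log2_fold_change, hit.adjusted_p)
```

Here only NEW_GENE_1 and NEW_GENE_3 would be flagged. A real analysis would of course start from a proper statistical tool rather than a hand-built table, but the filtering idea is the same.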
Whoever declared it finished is a fool, as a human is not a single-celled organism. We should understand by now that what makes us human is far more than the genome that people have looked at. Genetically speaking, the genome present in a single stem cell is a genetic minority in a human.
There are at least a few public talks in which Francis Collins states that the HGP was finished “ahead of schedule and under budget” (or words to that effect). So I was quite surprised to learn that the full genome was delivered only a few months ago.
Of course, a public talk doesn’t need to be as precise and rigorous as a scientific article, so Prof. Collins or anybody else is allowed to make such simplifications. I just imagined that he meant something else by “the HGP was finished”.
They never had the goal of getting a complete sequence, so that makes sense. The very rapid development of sequencing techniques in the late 1990s is what made it possible to get the first draft sequences done under budget and ahead of schedule. They were also in a race with Celera, which spiced things up.
Scientific jargon and hedging doesn’t always translate well into common language. For example, “the initial sequence is complete” is translated to “the sequencing is complete”.
There are efforts to sequence the human microbiome, though it’s not easy to be confident about what might just be passing through and what is actually resident in some fashion. (Note that the popular claim that far more of the cells in you are bacterial than human is wrong; it was based on one rough estimate that got popularized, and a more systematic calculation suggests that the two numbers are of similar order of magnitude.) But because there are many different microbes, unevenly distributed, thorough sampling is much harder than sequencing the human genome, which can be documented from any one cell. (Yes, there are going to be some individual mutations, but the level of variation is generally small unless something’s wrong, e.g., cancer.)
No one is saying anything different. No one is claiming that the DNA sequence of the human genome holds every answer to every question. To use an analogy, we can determine how many pages are missing in a dictionary without also believing that a dictionary holds the meaning of life.
Actually, all live human cells have genomes - the set of all genes in them. Mature mammal red blood cells lose the nucleus, but still have mitochondrial genomes. Depending on your definition, platelets might also be thought of as cells that lose the nucleus, or just as pieces of cells.
As another complication, “genomic sequencing” or “genomic analysis” often refers to any technique that gets much more sequence at a time than the old one-gene-at-a-time approach. This is often only a subset, sometimes a small subset, of the total genome, but it is still a lot of data. For example, the name “genome skimming” conveys the idea that this is a relatively small and select bit of the total genome.
Is “genome” a synonym for “complete set of DNA”? I thought every cell carries a complete set of our DNA. Not that I teach any life science courses, but still, I should stop repeating that outsider’s “folk wisdom” if it isn’t true.