A new paper in Nature Genetics is worth a look for at least two reasons.
It’s a big-data paper, years in the making, that provides a map of the coding regions in the human genome that are under some kind of constraint. The authors compared genomes from more than 120,000 humans and found those areas where variation is conspicuously absent. They infer that these regions are likely to be under purifying selection, meaning that mutations there are lethal or debilitating and so get weeded out of the population. Their catalog finds some new regions of constraint in addition to “expected” regions: the ones already known to be involved in genetic disease or lethality.
The authors introduce their reasoning using a metaphor, and the result is an encouraging example of excellent scientific writing. (My tweet about the paper is going semi-viral right now as scientists ooh and aah over the authors’ introduction.) The metaphor is one commonly used to introduce the concept of ‘survival bias.’ Here are the first several lines of the introduction to the paper. The link to the paper is at the bottom, and I can point you to the free full-text link (from the authors) on request.
During World War II, Abraham Wald and the Statistical Research Group optimized the placement of scarce metal reinforcements on Allied planes based on the patterns of bullet holes observed over many sorties. Wald famously invoked the principles of survival bias to infer that armor should be placed where bullet damage was unobserved, since the observed damage came solely from planes that returned from their missions. Wald reasoned that planes that had been shot down likely took on critical damage in such locations.
Employing similar logic, we sought to identify localized, highly constrained coding regions (CCRs) in the human genome. We were motivated by the idea that the absence of genetic variation in coding regions (for example, one or more exons or portions thereof) ascertained from large human cohorts implies strong purifying selection owing to essential function or disease pathology.
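To make the logic concrete, here is a toy sketch in Python of the core idea: given the positions where variants have been observed in a stretch of coding sequence, report the stretches where none were seen. This is only an illustration of the reasoning, not the authors' pipeline, which is certainly more involved. The function name `variant_free_gaps`, the `min_length` cutoff, and the positions in the example are all made up for this post.

```python
def variant_free_gaps(variant_positions, region_start, region_end, min_length=1):
    """Return (start, end) intervals, inclusive, inside the region
    where no variant was observed.

    Hypothetical helper for illustration only; coordinates are plain integers.
    """
    gaps = []
    cursor = region_start  # first position not yet accounted for
    for pos in sorted(set(variant_positions)):
        if pos < region_start or pos > region_end:
            continue  # ignore variants outside the region of interest
        if pos - cursor >= min_length:
            gaps.append((cursor, pos - 1))
        cursor = pos + 1
    # Trailing stretch after the last observed variant
    if region_end - cursor + 1 >= min_length:
        gaps.append((cursor, region_end))
    return gaps


# Made-up variant positions in a made-up region spanning 100..160
observed = [105, 110, 150, 152]
long_gaps = variant_free_gaps(observed, 100, 160, min_length=10)
print(long_gaps)
```

In Wald's terms, the observed variants are the bullet holes on the planes that came home, and the long gaps are the places where the armor should go. A gap is only a candidate, of course: a short stretch can be free of variants by chance, which is why the size of the cohort matters so much.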