David Reich
toward the plateau, there's a three quarters probability that the mutation is real.
If it's 99% of the way to the plateau, there's a 99% probability that it's real.
So that gives us a calibrated estimate of the probability that a particular position is really under natural selection.
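The calibration described here can be sketched in a few lines. This is only an illustration under stated assumptions: the transcript does not say exactly which quantity is measured against the plateau, so `enrichment`, `baseline`, and `plateau` are hypothetical names, and the linear "fraction of the way to the plateau" rule is a direct reading of the spoken description rather than the published method.

```python
def probability_real(enrichment, baseline, plateau):
    """Estimate the probability that a signal is real as the fraction of
    the way from a baseline value to the plateau value.

    All three arguments are hypothetical inputs for illustration:
    `baseline` is the value expected with no real signal, and `plateau`
    is the value at which the curve levels off.
    """
    if plateau <= baseline:
        raise ValueError("plateau must be greater than baseline")
    fraction = (enrichment - baseline) / (plateau - baseline)
    # Clip to [0, 1] so noisy values outside the range still give a probability.
    return max(0.0, min(1.0, fraction))
```

With a made-up baseline of 1.0 and plateau of 5.0, `probability_real(4.0, 1.0, 5.0)` gives 0.75, matching the "three quarters of the way, three quarters probability" reading.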
A major concern here is that what we're seeing is not that these mutations are really under selection, but rather that both the association to a disease and our selection signal are due to some third thing that's causing both of them.
That third thing would be a type of selection that is not what we're after: not selection to adapt to new environments, but what's called background selection, which is selection against newly arising bad mutations that are removed from the population and that tend to be concentrated in genes.
Genes are also the parts of the genome that tend to be associated to traits.
And so this common process is causing both the enrichment for trait signals and is also causing the enrichment for selection signals that we're observing.
That's the concern.
We were super concerned about this.
So what we did is we repeated this enrichment analysis in slices of the DNA that all were affected to the same extent by background selection, by this rain of slightly bad mutations, and we get exactly the same pattern.
We also repeated this experiment using just mutations of the same frequencies, because there's different statistical power to detect these signals at different frequencies.
And we see the same pattern where above a value of the selection statistic of around five, we get this plateau.
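The control described above, repeating the enrichment analysis within slices that share the same level of background selection, can be sketched roughly as follows. This is a toy illustration, not the analysis from the study: the tuple layout, the stratum labels, and the definition of enrichment as a ratio of trait-association rates are all assumptions. Only the threshold of about five comes from the discussion.

```python
from collections import defaultdict

def enrichment_by_stratum(variants, threshold=5.0):
    """Compute trait-association enrichment separately within each stratum.

    `variants` is an iterable of (stratum, selection_stat, is_trait_associated)
    tuples. This layout is hypothetical. Enrichment is defined here (as an
    assumption) as the trait-association rate among variants at or above the
    threshold divided by the rate among variants below it.
    """
    # Per stratum: [high_trait, high_total, low_trait, low_total]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, stat, is_trait in variants:
        c = counts[stratum]
        if stat >= threshold:
            c[0] += int(is_trait)
            c[1] += 1
        else:
            c[2] += int(is_trait)
            c[3] += 1

    result = {}
    for stratum, (hi_trait, hi_total, lo_trait, lo_total) in counts.items():
        if hi_total == 0 or lo_total == 0 or lo_trait == 0:
            result[stratum] = None  # not enough data to form a ratio
        else:
            result[stratum] = (hi_trait / hi_total) / (lo_trait / lo_total)
    return result
```

If the enrichment were driven only by background selection, it should shrink or vanish once variants are compared within a stratum. Seeing similar values across strata is the "same pattern" described. The same function could be reused for the frequency control by passing frequency bins as the stratum label.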
So there's been a whole series of improvements.
I think one of the big ones has been the huge drop in sequencing costs, which made it possible to generate ancient DNA in the first place.
So the drop in cost has been a millionfold since the late 2000s, and another maybe one to two orders of magnitude from 2010 to today.
So that's one big change.
Another change has been in-solution enrichment.
It's a way of taking a sample that has a very small percentage of human DNA and processing it so that the great majority of the sequences one is analyzing will be useful for analysis.