Jacob Kimmel
The cost of sequencing has likewise come down.
So beyond the reagents needed to rip the cell open and turn its mRNAs into DNA that's ready for the sequencer, the sequencer itself is now cheaper.
The other piece, actually getting these genes in and then figuring out which ones are there, started out pretty bad.
So when we started with this technology, it was a beautiful proof of concept, but I don't think anyone would tell you it was 100% ready for prime time.
When you sequenced a cell, only about 50% of the time could you even tell which perturbation you'd put in.
Sometimes you just wouldn't detect the barcode and you'd have to throw the cell away, or you'd detect the wrong barcode and now you've mislabeled your data point.
So this might sound like a trivial technical detail, but imagine you're running this experiment the old-fashioned way, where you test different groups of genes in different test tubes on a bench.
Now imagine you hired someone who labels every other tube wrong.
So when you then collect data from your experiment, you basically have no idea what happened, because you've just randomized all your data labels.
You wouldn't do much science and you wouldn't get very far that way.
So early on, you had a number of processes that were pretty inefficient, and when you multiplied them all together you ended up with a very small fraction of cells you could actually sequence successfully.
Those technologies have all improved to the point where you can now actually operate at scale.
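The "multiplied together" point can be made concrete with a small sketch. Assuming each step of the workflow succeeds independently, the fraction of input cells that make it all the way through is the product of the per-step efficiencies. The stage names and numbers below are hypothetical placeholders for illustration, not figures from the conversation.

```python
from math import prod


def pipeline_yield(stage_efficiencies):
    """Fraction of input cells that survive every stage.

    Assumes stages fail independently, so the overall yield is simply the
    product of the per-stage success rates.
    """
    return prod(stage_efficiencies)


# Hypothetical per-stage efficiencies (illustrative only, not measured values).
stages = {
    "gene_delivery": 0.5,
    "cell_capture": 0.6,
    "barcode_detection": 0.5,
}

overall = pipeline_yield(stages.values())
print(f"overall yield: {overall:.3f}")
```

Because the stages multiply, no single step has to be terrible for the end-to-end yield to be small, and improving any one step raises the whole product.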
And then groups like ours have had to do a bunch of work to actually enable combinatorial perturbations, turning on more than one gene at a time, which it turns out is much, much harder for the same reason we were just alluding to.
Imagine you're having trouble figuring out which one gene you put in the cell and turned on or off.
Now imagine you have to do that five times correctly in a row.
Well, if you start out with the original performance, where you could detect roughly 50% of them, then the fraction of cells that would be correctly labeled is one over two to the N, where N is the number of genes you're trying to detect.
And very quickly, more of your data is mislabeled than labeled.
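The one-over-two-to-the-N arithmetic generalizes to any per-barcode detection rate p: if each of N barcodes is detected independently with probability p, the fraction of cells with every barcode recovered is p to the N. The sketch below assumes that independence; the 0.9 rate is a hypothetical "improved" value chosen only for comparison, and undetected cells are simply counted as not fully labeled.

```python
def fraction_fully_labeled(p_detect, n_perturbations):
    """Fraction of cells in which all n barcodes are detected.

    Assumes each barcode is detected independently with probability
    p_detect, so the joint probability is p_detect ** n_perturbations.
    """
    return p_detect ** n_perturbations


# 0.5 matches the early detection rate described above;
# 0.9 is a hypothetical improved rate for comparison.
for p in (0.5, 0.9):
    row = [f"{fraction_fully_labeled(p, n):.3f}" for n in range(1, 6)]
    print(f"p={p}: N=1..5 ->", row)
```

At p = 0.5 the fully labeled fraction halves with every extra gene, while a higher per-barcode rate decays far more slowly, which is why combinatorial experiments depend so heavily on reliable barcode detection.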
There are lots of technical issues like this that have been worked out over time.
Only now are we really able to scale up to experiments on millions of cells in a single day, even at a small company like New Limit.