Steve Quake
And, you know, a couple of thoughts on that maybe.
You know, Theo Karaletsos, our lead AI person at CZI says, machine learning is the formalism through which we understand high dimensional data.
And I think that's a very deep statement.
And, you know, biological systems are intrinsically very high dimensional.
You've got 20,000 genes in the human genome.
In these cell atlases, you're measuring all of them at the same time in each single cell.
And there's a lot of structure in the relationships of their gene expression there that is just not evident to the human eye.
And, for example, CELLxGENE, our database that aggregates all of the single-cell transcriptomic data, is now over 100 million cells.
And, as you mentioned, we're seeing ways to increase that by an order of magnitude in the near future.
The project that Jure Leskovec and I worked on together, which Charlotte referenced earlier, was a first attempt to build a foundational model on that data, to discover some of the correlations and structure that was there.
And so with a subset, I think it was 20 or 30 million cells, we built a large language model and began asking it, what do you understand about the structure of this data?
And it kind of discovered lineage relationships without us teaching it.
We trained it on a matrix of numbers, no biological information there.
And it learned a lot about the relationships between cell type and lineage.
And that emerged from that high dimensional structure, which was super pleasing to us.
And really, I mean, for me personally, it gave me the confidence to say, this stuff is going to work out.
There is a future for the virtual cell.
It's not some made up thing.
There is real substance there.
And this is worth investing an enormous amount of CZI resources in going forward and trying to rally the community around as a project.
You know, we don't know.