Charlotte Bunne
So we need to be able to simulate the effect of a perturbation on a cellular phenotype, that is, on the internal, universal representation of a cell state. We need to simulate the effect a mutation has downstream and how this would propagate through our representations.
And we need to build many different types of virtual instruments that provide the capabilities the AI virtual cell ultimately needs: to reason, to generate hypotheses, to predict the next experiment to conduct, to predict the outcome of a perturbation experiment, to design cellular and molecular states in silico, things like that. And this is why we make the separation between the internal representation and the instruments that operate on those representations.
On the one hand, of course, we benefit from and inherit all the tremendous efforts that have been made over the last decades to assemble datasets that are very standardized, right? CellxGene is, you could say, quite AI-friendly: it is a platform whose data is easy to feed into algorithms.
But at the same time, we are also seeing genuinely new building mechanisms and design principles for the AI algorithms themselves. So I think we have understood that in order to really make progress and build systems that work well, we need AI tools that are designed for biological data.
To give you an easy example: if I take a large language model trained on text, it's not going to work out of the box on DNA, because we have different reading directions, different context lengths, and many, many more differences.
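To make the "different reading directions" point concrete, here is a minimal sketch (the helper name is my own, not from the conversation): DNA is double-stranded, so a sequence and its reverse complement carry the same biological information, whereas a text model treats them as unrelated strings.

```python
# Translation table from each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the same molecule read
    from the opposite strand in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]

seq = "ATGGCC"
rc = reverse_complement(seq)  # "GGCCAT"

# To a text-style tokenizer these are two unrelated strings, yet they
# describe the same double-stranded DNA; a DNA model should respect that.
assert rc != seq
assert reverse_complement(rc) == seq  # applying it twice recovers the original
```

Architectures built for genomics often bake this symmetry in (e.g. by sharing weights across both strand orientations), which a text LLM has no reason to do.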
And if I look at standard computer vision, where we can say AI really excels, and I apply standard vision transformers to multiplex images, they are not going to work, because normal computer vision architectures always expect the same three input channels, RGB, right?
In multiplex images, I'm potentially measuring up to 150 proteins in a single experiment, and every study will measure different proteins. So I'm dealing with many different scales, larger than I'm used to, and the attention mechanisms in standard vision transformers are not going to work anymore.
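One common way around the fixed-RGB assumption, sketched minimally here (this is an illustrative design, not the speaker's specific architecture), is to treat each measured protein channel as its own token: self-attention then works over any number of channels, whether a study measured 3 proteins or 150.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(channel_tokens):
    """Scaled dot-product self-attention over channel tokens of shape
    (C, d), one token per measured protein. The projection matrices are
    random placeholders standing in for learned weights."""
    C, d = channel_tokens.shape
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = channel_tokens @ Wq, channel_tokens @ Wk, channel_tokens @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))  # (C, C): no fixed channel count baked in
    return A @ V                       # (C, d)

# The same module handles an RGB-like study and a 150-plex study.
rgb = channel_attention(np.random.default_rng(1).standard_normal((3, 16)))
multiplex = channel_attention(np.random.default_rng(2).standard_normal((150, 16)))
```

The key design choice is that nothing in the attention computation depends on C, so studies with different protein panels can share one model; a standard vision transformer, by contrast, hard-codes three input channels in its patch-embedding layer.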