David Baker
And so when I first began at the University of Washington, we studied how proteins actually fold up from their amino acid sequences into their three-dimensional structures.
And we studied that experimentally.
And as we learned more and more, we developed computer programs to mimic that process, so that we could take a sequence and predict its structure.
And after we had been doing that for some time, we realized that we could go backwards. Instead of going from the sequence to the structure and function of the protein, as happens in biology, we could start with a new structure and a new function that don't exist and work backwards to an amino acid sequence that would encode that new protein.
The difference is that in the biology case, the proteins are encoded in the genes in our genomes and in the genomes of all living things.
In the design case, it's a completely new protein.
It doesn't exist in nature, so there's no gene for it.
So we have to make a synthetic gene, a synthetic piece of DNA that encodes this new protein.
Once we have that synthetic piece of DNA, we can put it into a bacterium and it will produce the protein and we can see whether it actually does what we designed it to do.
So the first class of models we developed were traditional physical models where we sort of tried to describe all the interactions between all the atoms in the protein and how those interactions guide the protein to fold up.
And we made quite a bit of progress, some of which I briefly described in my 2019 talk.
Now, since then, we've completely switched over to developing AI-based methods for protein design.
And in these methods, we take the many, many examples of proteins whose structures have been determined by scientists really over the last 50 years.
And there are about 250,000 of these structures now.
And so by training AI models on these structures, we can develop methods that actually generate new proteins with new structures.
And we can condition this process on a specification of the function we want to create.
Our methods are very similar to image generation methods. You might ask DALL-E, or another image generation program, to generate an image of a giraffe walking on a horse, or something absurd like that, and you would get an image that represents that.
In the same way, we can tell RFdiffusion, the protein design program we have created, to design a protein that binds to this virus and blocks it, or binds to this cancer cell and stops it from dividing. The program will generate a new protein, and then we make it in the lab and see whether it actually does what we designed it to do.