Nathan Lambert
Podcast Appearances
They didn't hit the YOLO run, whatever the reason was. Maybe they weren't as skilled. You can call it luck if you want, but at the end of the day, it's skill.
I think it's even more impressive what OpenAI did in 2022. At the time, no one believed in mixture of experts models at Google, who had all the researchers. OpenAI had such little compute. And they devoted all of their compute for many months, right?
All of it, 100% for many months to GPT-4 with a brand new architecture with no belief that, hey, let me spend a couple hundred million dollars, which is all of the money I have on this model, right? That is truly YOLO, right? Now, you know, people are like, all these like training run failures that are in the media, right? It's like, okay, great.
But like, actually a huge chunk of my GPUs are doing inference. I still have a bunch doing research constantly. And yes, my biggest cluster is training on this YOLO run, but that YOLO run is much less risky than what OpenAI did in 2022, or maybe what DeepSeek did now, which was sort of like, hey, we're just going to throw everything at it.
DeepSeek is very interesting. This is where it's worth taking a second to zoom out on who they are, first of all, right? High-Flyer is a hedge fund that has historically done quantitative trading in China as well as elsewhere. And they have always had a significant number of GPUs, right?
In the past, a lot of these high-frequency trading, algorithmic quant traders used FPGAs, but it has definitely shifted to GPUs. And there's both, right? But GPUs especially. And High-Flyer is the hedge fund that owns DeepSeek, and everyone who works for DeepSeek is part of High-Flyer to some extent, right? Same parent company, same owner, same CEO.
They had all these resources and infrastructure for trading, and then they devoted a humongous portion of them to training models, both language models and otherwise, right? Because these techniques were heavily AI-influenced.