The Peterman Pod

Google DeepMind Pre-Training Lead: How To Land a Job at a Frontier Lab | Vlad Feinberg

15 Jun 2026

Transcription

Transcript generated automatically by AI and may contain errors.

Chapter 1: What is the main topic discussed in this episode?

0.031 - 6.099

Every single time you go up for a pre-training run, you're about to put more FLOPs into this run than you ever have before.

Chapter 2: What skills do frontier labs like Google DeepMind require?

6.78 - 14.209 Ryan Peterman

This is Vlad Feinberg. He's Google DeepMind's pre-training area lead, and I asked him all about how to get a job at a frontier lab.

14.59 - 32.335

That was a particular skill that I see voracious demand for across all the different labs. The research skill set is going to become increasingly important. If you do the scaling book exercises and send me a video of yourself doing them, I would love to interview you.

32.355 - 46.718 Ryan Peterman

Here's the full episode. You wrote a post titled "How to Get a Job at a Frontier Lab." What skills are in demand at frontier labs?

Chapter 3: How do AI research and engineering differ in frontier labs?

46.818 - 48.921 Ryan Peterman

Maybe we can talk about the shape of the work.

49.34 - 73.45

There's quite a range of different things that frontier labs require. At this point, LLMs are artifacts connected to research and product in ways that machine learning really hasn't been before, so the work touches on so many different things.

74.191 - 103.258

The goal of my post was to propose just a couple of tangible directions in which labs could require a certain set of skills, not to be exhaustive. The ones I dive into have to do with kernel development and low-level engineering to accelerate the runtime of these LLMs in practice. That was a particular skill that I see voracious

103.238 - 134.595

demand for across all the different labs and among different projects within the labs, so it seemed like a very sharp one to call out as an overall need. Specifically, whenever we're doing a research project that involves changing the neural net's architecture in a particular way, or rethinking how we do serving to get better KV caching or something like that,

135.453 - 141.139

again, across the stack, you just need to be able to implement these new techniques in efficient ways.
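For readers unfamiliar with the KV caching mentioned above: during autoregressive decoding, the keys and values computed for earlier tokens do not change, so a server can cache them and attend from each new token to the cache instead of reprocessing the whole prefix. The sketch below is a minimal single-head NumPy illustration under that assumption; the class and function names are hypothetical, and real serving stacks use preallocated or paged buffers rather than growing arrays with `np.vstack`.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Append-only cache of past keys and values for one attention head (illustrative)."""
    def __init__(self, head_dim):
        self.keys = np.zeros((0, head_dim))
        self.values = np.zeros((0, head_dim))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])

def decode_step(x, w_q, w_k, w_v, cache):
    """Project only the newest token, cache its K/V, and attend over all cached tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    cache.append(k, v)
    scores = cache.keys @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ cache.values

def full_causal_attention(xs, w_q, w_k, w_v):
    """Reference: recompute causal attention over the whole prefix from scratch."""
    q, k, v = xs @ w_q, xs @ w_k, xs @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)  # mask out later positions
    return softmax(np.where(future, -np.inf, scores)) @ v

rng = np.random.default_rng(0)
d_model, head_dim, seq_len = 16, 8, 5
w_q, w_k, w_v = (rng.normal(size=(d_model, head_dim)) for _ in range(3))
xs = rng.normal(size=(seq_len, d_model))

cache = KVCache(head_dim)
incremental = np.stack([decode_step(x, w_q, w_k, w_v, cache) for x in xs])
reference = full_causal_attention(xs, w_q, w_k, w_v)
print(np.allclose(incremental, reference))  # expected: True
```

The comparison at the end checks that decoding token by token with the cache reproduces full causal attention, while each step only computes projections for one new token.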

Chapter 4: What domains are crucial for success in frontier AI research?

142 - 165.568

The inner loop of all these changes is creating software artifacts that can function at large scale with high throughput and low latency. This is fundamental work tied to classical backend engineering thinking. So yeah, it seemed like a very open area for people to specialize in.

166.509 - 183.156 Ryan Peterman

My friends who work at OpenAI and Anthropic describe a distinction between an applied org and a research org. Does DeepMind have a similar distinction, and could you speak to what that difference is?

183.777 - 207.746

So we have different focus areas. For instance, within GDM, there's a team that focuses on how we can use our Gemini LLMs to better inform search results. That might be, in some sense, an applied version of the LLMs.

Chapter 5: How can engineers effectively market themselves to frontier labs?

207.806 - 221.864

But I'm hesitant to draw a very sharp distinction here, because so much actual hard research has to go into this level of product integration.

222.164 - 239.389

Specifically, for the one I mentioned, a lot of work goes into making sure these LLMs are factual and can cite sources to give precise, grounded answers, and into assessing the quality of those sources so you're not citing anything sarcastic or a joke.

240.905 - 259.525

This is, I guess, a good example of how even in product-specific, quote-unquote "applied AI" verticals, you're still doing research. That said, there are definitely what I'd call very classical LLM research teams: pre-training, post-training.

Chapter 6: What concrete steps can engineers take to prepare for roles in frontier labs?

260.707 - 280.148

These are still standalone teams inside GDM that are focused on, I would say, creating SOTA models: pure research. Again, the caveat is that the pure research we do matters only to the extent that we can realize it.

Chapter 7: What is the overview of pre-training areas in AI research?

280.569 - 304.862

So we're just as responsible for delivering these models, making sure they train stably, and acting as the SREs of sorts for the training run to keep it going smoothly, as we are for coming up with the recipes to make these LLMs. You can't separate those two roles; it's really crucial to wear both hats.

305.243 - 319.005

So yeah, I think you can draw a spectrum between research and applied, but no matter what, in today's world I think everyone needs to be fluid across that spectrum.

319.424 - 331.001 Ryan Peterman

I noticed there's also another spectrum, from software engineer to pure AI researcher. How do you think about that spectrum, software engineering versus AI researcher roles?

331.662 - 365.85

So I guess in my case specifically, for a lot of what we do and a lot of the new techniques we develop, the groundwork is laid in infrastructure investment. I can walk through what my team does in a bit more detail later, but one of the verticals is distillation. Distillation is some way of

Chapter 8: What are some memorable experiences from working on Gemini?

366.556 - 389.526

transferring the knowledge or some form of statistics about the underlying data set through a teacher model into the student model to make the student model better than if it hadn't ever seen these auxiliary statistics from the teacher. And when you're talking about statistics derived from a massive LLM applied to trillions and trillions of tokens,

389.506 - 425.014

you're talking about a FLOPs investment of millions and millions of dollars. That, in turn, means you have to think through how to make the system as efficient as possible, because every operation we perform is multiplied by such a large factor that every second counts and every byte of storage counts. Quite a bit of that work is

424.994 - 454.776

you know, good old-fashioned software engineering. In particular, the infrastructure for distillation has evolved through maybe three or four generations at this point. In each one, we've taken a step back, looked holistically at the research methods we've been applying for distillation, and thought about how to broaden what the infrastructure is capable of.
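The teacher-to-student transfer described here is commonly written as a loss that pulls the student's next-token distribution toward the teacher's. The sketch below is a minimal NumPy illustration, not GDM's method: it uses the standard temperature-softened KL divergence with the T² scaling from Hinton et al. (2015), plus a hypothetical sparse variant that keeps only the teacher's top-k token probabilities per position, one plausible shape for precomputed "teacher statistics". The function names and the top-k choice are assumptions for illustration.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean per-token KL(teacher || student) on temperature-softened distributions.

    Both inputs have shape [num_tokens, vocab_size].
    """
    t_logp = log_softmax(teacher_logits, temperature)
    s_logp = log_softmax(student_logits, temperature)
    kl = (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return float(kl.mean() * temperature ** 2)

def topk_teacher_stats(teacher_logits, k):
    """Hypothetical sparse teacher statistics: top-k token ids and renormalized probabilities."""
    logp = log_softmax(teacher_logits)
    ids = np.argsort(-logp, axis=-1)[:, :k]
    probs = np.exp(np.take_along_axis(logp, ids, axis=-1))
    return ids, probs / probs.sum(axis=-1, keepdims=True)

def sparse_distillation_loss(student_logits, ids, probs):
    """Cross-entropy of the student against the stored top-k teacher distribution."""
    s_logp = log_softmax(student_logits)
    picked = np.take_along_axis(s_logp, ids, axis=-1)
    return float(-(probs * picked).sum(axis=-1).mean())

# Toy example: 4 token positions over a 32-entry vocabulary.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 32))
student_logits = rng.normal(size=(4, 32))

dense_loss = distillation_loss(student_logits, teacher_logits, temperature=2.0)
ids, probs = topk_teacher_stats(teacher_logits, k=8)
sparse_loss = sparse_distillation_loss(student_logits, ids, probs)
print(f"dense KD loss: {dense_loss:.4f}, top-8 KD loss: {sparse_loss:.4f}")
```

The sparse variant trades fidelity for storage: only k (token id, probability) pairs per position need to be written out, instead of a full vocabulary-sized vector for every token.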

455.698 - 478.356

There are definitely a couple of discrete points where rethinking the system design of how we perform distillation enabled us to research distillation methods much more quickly. So it's this kind of investment where, okay, a four-month-or-whatever rewrite of our distillation infrastructure

478.336 - 495.096

then results in a dramatically new understanding of distillation scaling laws that translates into really strong models. So it really requires work across the stack.

495.076 - 521.742

I can't imagine that we would have gotten results like Flash 3.0 without those distillation infrastructure investments, which at the end of the day started with a good old-fashioned design doc: thinking about the right abstractions for generating these teacher statistics, coming up with the right storage system for them, and thinking through what could support reading and writing across multiple data centers at this scale.

522.717 - 524.901

These are really classical distributed systems problems.
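To make the "every byte of storage counts" point concrete, here is a back-of-the-envelope estimate of how large precomputed teacher statistics can get. Every number in it is a hypothetical assumption, not a figure from the episode: 10 trillion tokens, a 256,000-entry vocabulary, 2-byte values, 4-byte token ids, and a top-64 sparse representation per token.

```python
def teacher_stats_bytes(num_tokens, k, id_bytes=4, value_bytes=2):
    """Bytes to store k (token id, probability) pairs for every token."""
    return num_tokens * k * (id_bytes + value_bytes)

def full_logits_bytes(num_tokens, vocab_size, value_bytes=2):
    """Bytes to store the teacher's full logit vector for every token."""
    return num_tokens * vocab_size * value_bytes

def human(n):
    """Format a byte count with decimal (SI) units."""
    for unit in ["B", "KB", "MB", "GB", "TB", "PB", "EB"]:
        if n < 1000:
            return f"{n:.2f} {unit}"
        n /= 1000
    return f"{n:.2f} ZB"

NUM_TOKENS = 10 * 10**12  # hypothetical: 10T tokens passed through the teacher
VOCAB_SIZE = 256_000      # hypothetical vocabulary size
TOP_K = 64                # hypothetical number of teacher entries kept per token

sparse = teacher_stats_bytes(NUM_TOKENS, TOP_K)
dense = full_logits_bytes(NUM_TOKENS, VOCAB_SIZE)
print("top-k stats:", human(sparse))
print("full logits:", human(dense))
print(f"dense / sparse: {dense / sparse:.0f}x")
```

Under these assumptions, full logits would need exabytes while the top-k form needs a few petabytes. Even the compact form is large enough that storage layout and cross-data-center reads and writes become distributed systems problems in their own right.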

525.442 - 548.64 Ryan Peterman

Yeah, I mean, it sounds like there are a lot of software engineering, backend, infra-type problems, just given the scale of the compute at this point. It still feels, though, like at some point on that spectrum there's a crossover into new skills. If you took an arbitrary backend engineer and asked them

548.62 - 557.951 Ryan Peterman

to, I don't know, adjust the model architecture or something like that, that's a bit more of a jump than the infra work. How do you see that distinction?
