Dwarkesh Patel

👤 Speaker

15267 total appearances

Podcast Appearances

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Oh, sorry, extremely naive question.

6688.285

Why is there not a quadratic term?

6690.629

So what is the reason that there's no company which has over a million token context length?

6752.275

If this is true?

6758.608

And so there's this idea that Dario said on the podcast and others have said, which is we don't need continual learning for

6789.743

AGI; in-context learning is enough.

6796.775

And if you believe that, then you have to think that we have to get to a 100 million or a billion token context length to have an employee that is the equivalent of someone working with you for a month.

6798.638

Now, maybe that's no longer true with sparse attention or something.

6809.133

But yeah, if you think that, then something in ML inference would have to change, like the memory bandwidth, to allow for 100 million token context lengths.

6811.517

Not because of the compute cost, but because of the memory bandwidth cost.

6867.114

And why doesn't sparse attention solve it?

6884.948

Why isn't the cost to retrieve from HBM the memory bandwidth, or rather the bytes divided by memory bandwidth?

7114.001

Because if it's already in HBM, you can be doing compute while you're getting it from HBM to HBM?

7129.642

Yeah, for example.

7134.449

Okay.

7136.051

And the price difference, I think, was... I'll look it up.

7200.23

Okay, so the base input tokens is $5 per million.

7203.635

Tokens.

7212.777

Which means remap.

7213.919

Yeah, that's five.

7214.761
