Cal Newport
You're writing your Google searches in natural language, but no one's having a conversation with Google.
It's you listing the keywords as quickly as possible.
And Google's pretty good at figuring out, you know, population, Spain, 1982.
And you press enter and you get that information.
You're not like, hey, so I'm wondering what the population is of Spain in 1982.
Can you help me find that?
There's something odd about that anthropomorphized conversational interface.
I guess we saw a lot of Star Trek growing up.
And that's, you know, what we think the future is supposed to be like.
But it has all sorts of problems.
I think it's actually hard to get a language model to do that, right?
Because if you go back to the base layer of what's happening in pre-training, you're building a language model that's trying to win at the token-guessing game.
So I'm trying to guess what word, or part of a word, actually comes next in what I assume to be a real piece of text.
And then if you do that autoregressively, calling it again and again and adding each answer back to the input so it grows out a response, what you're going to get is text expansion.
You've given me a text that I'm trying to expand as if there was a real text that exists and I'm trying to match it.
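The loop being described, where each guessed token is appended to the input before the model is called again, can be sketched roughly like this. The `next_token` function here is a hypothetical toy stand-in for a real language model, not any actual API; the sketch only illustrates the autoregressive structure.

```python
# Minimal sketch of autoregressive text expansion, assuming a toy
# next-token predictor. A real model would score all possible tokens;
# this stand-in just walks through a canned continuation.

CANNED = ["The", "population", "of", "Spain", "in", "1982",
          "was", "roughly", "38", "million", "."]

def next_token(context):
    # Toy "model": returns the next token of the canned text,
    # or None when the text is finished.
    if len(context) < len(CANNED):
        return CANNED[len(context)]
    return None

def generate(prompt_tokens):
    tokens = list(prompt_tokens)
    while True:
        tok = next_token(tokens)
        if tok is None:
            break
        tokens.append(tok)  # feed the answer back into the input
    return tokens

print(" ".join(generate([])))
```

The key point is in `generate`: the model is called repeatedly, and each output token becomes part of the next call's input, which is what makes the result read like an expansion of the given text.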
You get that kind of indirectly.
So really its idiom is the type of text it's trained on, which for the most part is prose-style text.