Dario Amodei
Podcast Appearances
Now, there are other stages like post-training or there are new types of reasoning models. And in all of those cases that we've measured, we see similar types of scaling laws.
So in my previous career as a biophysicist, I did physics undergrad and then biophysics in grad school. So I think back to what I know as a physicist, which is actually much less than what some of my colleagues at Anthropic have in terms of expertise in physics.
There's this concept called 1 over f noise and 1 over x distributions, where often, you know, just like if you add up a bunch of natural processes, you get a Gaussian, if you add up a bunch of kind of differently distributed natural processes, if you, like, take a probe and hook it up to a resistor, the distribution of the thermal noise in the resistor goes as one over the frequency.
It's some kind of natural convergent distribution. And I think what it amounts to is that if you look at a lot of things that are produced by some natural process that has a lot of different scales, right? Not a Gaussian, which is kind of narrowly distributed.
But, you know, if I look at kind of like large and small fluctuations that lead to electrical noise, they have this decaying 1 over x distribution. And so now I think of, like, patterns in the physical world, right? Or in language. If I think about the patterns in language, there are some really simple patterns. Some words are much more common than others, like "the."
Then there's basic noun-verb structure. Then there's the fact that nouns and verbs have to agree, they have to coordinate. And there's the higher level sentence structure. Then there's the thematic structure of paragraphs. And so the fact that there's this regressing structure, you can imagine that as you make the networks larger,
First, they capture the really simple correlations, the really simple patterns, and there's this long tail of other patterns. And if that long tail of other patterns is really smooth, like it is with the 1 over f noise in physical processes like resistors, then you can imagine as you make the network larger, it's kind of capturing more and more of that distribution.
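The long-tail intuition can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not something the speaker describes: patterns are ranked by how common they are, pattern k gets weight 1/k^1.5 (the exponent 1.5, the pattern count, and the function name `uncaptured_fraction` are all hypothetical choices), and a "network of capacity N" is modeled as capturing exactly the N most common patterns. Under those assumptions, the share of the distribution left uncaptured falls off roughly as a power law in N.

```python
import math
from itertools import accumulate


def uncaptured_fraction(capacities, n_patterns=1_000_000, exponent=1.5):
    """Share of total pattern weight NOT captured at each capacity.

    Toy assumption: pattern k (ranked by frequency) has weight k**-exponent,
    and a model of capacity N captures exactly the N most common patterns.
    """
    weights = [k ** -exponent for k in range(1, n_patterns + 1)]
    cumulative = list(accumulate(weights))  # running sum of captured weight
    total = cumulative[-1]
    return [1.0 - cumulative[n - 1] / total for n in capacities]


capacities = [10, 100, 1_000, 10_000]
residuals = uncaptured_fraction(capacities)

# Slope of log(residual) against log(capacity) between the two end points.
# A roughly constant slope is what a power law looks like on a log-log plot.
slope = (math.log(residuals[-1]) - math.log(residuals[0])) / (
    math.log(capacities[-1]) - math.log(capacities[0])
)

for n, r in zip(capacities, residuals):
    print(f"capacity {n:>6}: uncaptured share {r:.4f}")
print(f"log-log slope: {slope:.2f}")
```

With a smooth tail like this, each tenfold increase in capacity removes a similar proportion of what was left, which is the qualitative shape of a scaling law. The log-log slope comes out close to minus (exponent minus 1); a different tail exponent would give a different slope.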