Joscha Bach
Podcast Appearances
For us, the world is only learnable because adjacent frames are related, and we can afford to discard most of that information during learning. We basically take in only the stuff that makes us more coherent, not less coherent. And our artificial neural networks are willing to look at data that does not make the network more coherent at first, only in the long run.
By doing lots and lots of statistics, patterns eventually emerge and become visible. And our mind seems to be focused on finding those patterns as early as possible.
Yes, it's a slightly different paradigm, and it leads to much faster convergence. So we only need to look at a tiny fraction of the data to become coherent. And of course, we do not have the same richness as our trained models. We will not incorporate the entirety of the text on the internet, be able to refer to it, have all this knowledge available, and be able to confabulate over it.
Instead, we have a much, much smaller part of it that is more deliberately built. And to me, it would be fascinating to think about how to build such systems. It's not obvious that they would necessarily be more efficient than us on a digital substrate, but I suspect that they might.
So I suspect that the actual AGI, the one that is going to be more interesting, will use slightly different, or sometimes massively different, algorithmic paradigms than the current generation of transformer-based learning systems.
My main issue is I think that they're quite ugly and brutalist.
Yes, they are basically brute-forcing the problem of thought: training this thing by looking at instances where people have thought, and then trying to deepfake that. And if you have enough data, the deepfake becomes indistinguishable from the actual phenomenon. In many circumstances, it's going to be identical.