In the 1960s, Ray Solomonoff formalized this idea into a theory of universal induction, which we now call Solomonoff induction.
He defined the simplicity of a hypothesis as the length of the shortest program that can describe it, a concept known as Kolmogorov complexity.
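In standard notation (mine, not the speaker's), for a fixed universal machine $U$:

$$K(x) \;=\; \min\{\, |p| : U(p) = x \,\}$$

that is, the length in bits of the shortest program that outputs $x$. By the invariance theorem, switching to a different universal machine changes $K$ by at most an additive constant, so the definition is machine-independent up to that constant.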
An ideal Bayesian learner, according to Solomonoff, should prefer hypotheses (programs) that are short over ones that are long.
This learner can, in theory, learn anything that is computable, because it searches the space of all possible programs, using simplicity as its guide to navigate the infinite search space and generalize correctly.
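To give the flavor in code, here is a minimal sketch of my own, not anything from the source: a tiny toy interpreter stands in for the universal machine, and the prior weights each self-delimiting program p by 2^(-|p|). Real Solomonoff induction uses a genuinely universal prefix machine, which is exactly what makes the true prior incomputable.

```python
# Toy sketch of a Solomonoff-style prior (illustration only).
# Caveat: the interpreter below is deliberately tiny and NOT universal;
# real Solomonoff induction uses a universal prefix machine.
from itertools import product


def run_toy_machine(program: str, max_output: int = 16) -> str:
    """Read the program two bits at a time:
    00 -> emit '0', 01 -> emit '1', 10 -> repeat last emitted bit,
    11 -> halt."""
    out = []
    for i in range(0, len(program) - 1, 2):
        op = program[i:i + 2]
        if op == "00":
            out.append("0")
        elif op == "01":
            out.append("1")
        elif op == "10" and out:
            out.append(out[-1])
        elif op == "11":
            break
        if len(out) >= max_output:
            break
    return "".join(out)


def prior_mass(x: str, max_len: int = 16) -> float:
    """Approximate M(x): sum 2^(-|p|) over self-delimiting programs p
    (those whose only halt opcode is their final pair) whose output
    starts with x, enumerating all programs up to max_len bits."""
    total = 0.0
    for n in range(2, max_len + 1, 2):
        for bits in product("01", repeat=n - 2):
            p = "".join(bits) + "11"  # force an explicit halt at the end
            ops = [p[i:i + 2] for i in range(0, len(p), 2)]
            if "11" in ops[:-1]:      # halts early: skip, keeping the program set prefix-free
                continue
            if run_toy_machine(p).startswith(x):
                total += 2.0 ** (-n)
    return total


# Repetitive strings are "simpler" here: more short programs emit them.
for x in ["0000", "0101", "0110"]:
    print(x, prior_mass(x))
```

Running this, the repetitive string 0000 accumulates more prior mass than 0101 or 0110, because more short programs generate it; the simplicity bias falls directly out of the length weighting.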
The invention of Solomonoff induction launched a rich and productive subfield of computer science, algorithmic information theory, which persists to this day.
Solomonoff induction is still widely viewed as the ideal or optimal self-supervised learning algorithm, a claim one can prove formally under some assumptions.
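The usual formal statement (my paraphrase of the standard result, not a quote from the source) is Solomonoff's completeness theorem: if the data stream is sampled from any computable measure $\mu$, the universal predictor $M$ converges to $\mu$, with total expected squared prediction error bounded by the complexity of the environment:

$$\sum_{t=1}^{\infty} \mathbb{E}_{\mu}\Big[\big(M(x_t = 1 \mid x_{<t}) - \mu(x_t = 1 \mid x_{<t})\big)^2\Big] \;\le\; \frac{\ln 2}{2}\, K(\mu)$$

Since the right-hand side is a finite constant, the per-step error must go to zero: the learner eventually predicts any computable sequence as well as its true source.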
These ideas, or extensions of them like AIXI, were influential for early deep learning thinkers like Jürgen Schmidhuber and Shane Legg, and shaped a line of work attempting to theoretically predict how smarter-than-human machine intelligence might behave, especially within AI safety.
Unfortunately, despite its mathematical beauty, Solomonoff induction is completely intractable.
Vanilla Solomonoff induction is incomputable, and even approximate versions, such as induction based on the speed prior, are exponentially slow.
Theoretical interest in it as a Platonic ideal of learning remains to this day, but practical artificial intelligence has long since moved on, assuming it to be hopelessly infeasible.
Meanwhile, neural networks were producing results that nobody had anticipated.
This was not the usual pace of scientific progress, where incremental advances accumulate and experts see breakthroughs coming.
In 2016, most Go researchers thought human-level play was decades away.
AlphaGo arrived that year.
Protein folding had resisted 50 years of careful work.
AlphaFold essentially solved it over a single competition cycle.
Large language models began writing code, solving competition math problems, and engaging in apparent reasoning: capabilities that emerged from next-token prediction without ever being explicitly specified in the loss function.
At each stage, domain experts, not just outsiders, were caught off guard.
If we understood what was happening, we would have predicted it.
We did not.