Alex Reisner
Podcast Appearances
I think potentially the most important aspect of a model is what it's trained on.
I mean, if you take a model, let's say one that generates music, and you train it on 1950s jazz, that model will be very good at generating music that sounds a lot like 1950s jazz.
If you train it on recent hip-hop, it's going to generate music that sounds like recent hip-hop.
These models have names like ChatGPT and Claude, but I think you could make an argument that the right name for a model is actually the description of the data it was trained on, because that is a description of its capabilities.
And so I think that the training data is really fundamental to the model, maybe more than the architecture to some degree.
[The AI companies] have argued that they need to keep this secret because the data that they have selected to train on is their competitive advantage, right?
Like Anthropic has done a better job at selecting data than Google and OpenAI.
And if they were to let that come out in a court case or be public in some way, they would lose their competitive advantage.
There's another pretty obvious reason, which is that they have gone about acquiring a lot of this data in ways that the people who've created the data, the authors of the books and the creators of the videos and the music, would not be happy about.
And in a lot of cases, they just don't know that their work is being used.
When they find out, they're not happy about it.
And I think it's a conversation that the AI companies have tried to just avoid having.