Casey Newton
Podcast Appearances
Part of the reason you chose the cutoff date of around 1930 is because that's when works become public domain; anything after that is copyrighted.
Are there any other reasons you chose that specific point in time?
So I've been fascinated by seeing like what people are trying with this model.
People are having it make predictions, but also asking it about its favorite authors or its opinions of, you know, major historical figures.
What have been the experiments that have been most interesting to you?
Oh, absolutely.
Okay.
People have also pointed out that the model behind Taki seems to know about some things that it probably shouldn't know about, like the rise of Hitler, the presidency of FDR, things that didn't happen until after its data cutoff.
Is that proof that there's been some kind of contamination of the training data with more recent data?
Oh, there's definitely contamination.
But, like, how is it getting that data if it's only being fed scanned OCRed books from archival sources?
I see.
I asked Taki what it knew about me, and it said Kevin O'Hara, which is not my name, was born in Dublin in 1840 and, having been educated at the School of the Christian Brothers, became a teacher in it.
He afterwards adopted the profession of journalism and was for some years connected with the staff of the Nation newspaper.
It also said I had written several popular songs, including "Molly Asthor" and "The Irish Immigrant."
Now, obviously, most of that is wrong, but it did connect me to journalism, which I found interesting and maybe like some other evidence of some data contamination.
But like, is this thing accessing the internet in some way?
Or like, how would it have known that I, or at least Kevin O'Hara, this character sort of connected to me in the model, was a journalist?