Casey Newton

👤 Speaker
1607 total appearances

Podcast Appearances

Hard Fork
OpenAI’s Big Reset + A.I. in the Doctor’s Office + Talkie, a pre-1930s LLM

Part of the reason you chose the cutoff date of around 1930 is because that's when works become public domain; anything after that is copyrighted.

Are there any other reasons you chose that specific point in time?

So I've been fascinated by seeing like what people are trying with this model.

People are having it make predictions, but also asking it about its favorite authors or its opinions of, you know, major historical figures.

What have been the experiments that have been most interesting to you?

Oh, absolutely.

Okay.

People have also pointed out that the model behind Talkie seems to know about some things that it probably shouldn't know about, like the rise of Hitler and the presidency of FDR, things that didn't happen until after its data cutoff.

Is that proof that there's been some kind of contamination of the training data with more recent data?

Oh, there's definitely contamination.

But, like, how is it getting that data if it's only being fed scanned OCRed books from archival sources?

I see.

I asked Talkie what it knew about me, and it said, Kevin O'Hara...

which is not my name, was born in Dublin in 1840, and having been educated at the School of the Christian Brothers, became a teacher in it.

He afterwards adopted the profession of journalism and was for some years connected with the staff of the Nation newspaper.

It also said I had written several popular songs, including "Molly Asthor" and "The Irish Immigrant."

Now, obviously, most of that is wrong, but it did connect me to journalism.

which I found interesting and maybe like some other evidence of some data contamination.

But like, is this thing accessing the internet in some way?

Or like, how would it have known that I, or at least Kevin O'Hara, this character sort of connected to me in the model, was a journalist?