Alex Reisner

👤 Speaker

153 total appearances

Podcast Appearances

The Vergecast

How to train your data

They probably have the strongest argument that their data could be used for other purposes.

903.721

But when you go back, they've been cited by over 10,000 papers.

907.946

I didn't read all 10,000, but I read a lot of them.

913.251

And they are mostly AI.

915.694

And it's early.

918.417

A lot of it is stuff that people wouldn't mind as much as with generative AI.

919.678

Common Crawl, I think without Common Crawl,

926.025

you know, AI translation tools might not be as good as they are.

929.675

I think it was really a huge help because they scraped web pages, the same page in multiple languages, and people were able to train translation models based on that.

933.359

So that was helpful.

942.269

But, you know, there is still what I would call a data laundering network, where the AI companies are still relying on

944.412

They'll do a collaboration with the university, and they'll have the university download, you know, millions of images to train a model, or download millions of articles to train a model. And the AI company can say, like, "Well, we didn't do it, this was like an academic thing." You know, the same goes: Common Crawl is not the only nonprofit that's doing a lot of this scraping for the AI industry. One of the data sets I reported on in the music, the article about music

958.193

training data is this organization based in Europe called LAION.

986.189

They have a data set of 12 million songs from YouTube.

990.135

So anyway, this is like, is it academic?