Mark Graham

Speaker
141 total appearances

Podcast Appearances

Today, Explained
Breaking the internet

I do, because I don't know. And one really can't know without the benefit of the long historical view. And that's not something that we have access to today. Why? Because we don't have a real time machine.

Decoder with Nilay Patel
How the Wayback Machine is fighting linkrot

The Wayback Machine is a service of the Internet Archive that is used to provide a time machine to the web. We have been archiving much of the public web for nearly three decades now, and we make those archives available through the Wayback Machine.

The Internet Archive is a nonprofit organization with a mission of universal access to all knowledge. We pursue that mission in a variety of ways, including archiving, as I said, much of the public web. We work toward acquiring and digitizing and preserving and organizing and making available a whole range of material that is kind of grouped into media types.

So one might be books, for example, and we digitize more than 4,000 books every day. Or television news. We archive television news, both from the United States and from other countries around the world. Or journal articles. We have a collection of more than 30 million publicly accessible journal articles available from scholar.archive.org.

78s, those old things on shellac, we've got hundreds of thousands of those that we have digitized. Those were donated to us by the Boston Public Library. So I could go on and on. We identify media, recording media that people have been publishing in for some period of time.

If it's digital, like born digital, then that makes life easier because we're able to then capture that material in some fashion on our hard drives and preserve it. But maybe it's analog, maybe it's paper or microfiche or microfilm or vinyl or shellac, as I said. In that case, we have to first digitize the material, in some cases using the bespoke hardware and software setups that we have developed.

And once we've digitized it, then we can preserve it and organize it and make it available. At the end of the day, this is what this is about. This is about the voices of humanity expressed in a variety of media that in many cases are being stored and made available on a series of platforms that are inherently ephemeral, that have a history of disappearing. One of the terms is link rot.

That's talking about material that may have been available at a given URL, a given address on the web, at a given point in time and is no longer there. You go to that URL and one of two things is going to be true. Well, three things, I guess. The first thing is that what you're looking for is there. Success.

But the second is that you get a page not found or some other error message, a 500 error message, maybe something like that on the server end. So you just can't get the material. It's just no longer there at that URL. Now, that material might be available via another URL. It may have been moved somewhere, but you may not necessarily know that if there's no redirect in place.

But the other thing that can happen is that at that same URL, there may be different material. That's referred to as content drift. Same URL, different material. Well, how would you even know what the prior material was or that there even was prior material at that URL? You wouldn't. Why? Because there's no version control system for the web.
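
The three outcomes described above (success, an error such as page not found, or content drift) can be sketched in a few lines of Python. This is only an illustration, not how the Wayback Machine itself works: the function name `classify_link` and its parameters are hypothetical, `status_code` is assumed to be the final HTTP status after any redirects, and `earlier_body` is assumed to be a copy of the page saved at some earlier time.

```python
import hashlib
from typing import Optional


def classify_link(status_code: int, body: bytes,
                  earlier_body: Optional[bytes] = None) -> str:
    """Classify one fetch of a URL (hypothetical helper, for illustration)."""
    # Error outcome: page not found (404), server error (500), and so on.
    if status_code >= 400:
        return "link rot"
    # With no earlier copy there is nothing to compare against, so drift
    # cannot be detected from the live web alone.
    if earlier_body is None:
        return "success (drift unknown)"
    # Same URL, different material: compare fingerprints of the two copies.
    if hashlib.sha256(body).digest() != hashlib.sha256(earlier_body).digest():
        return "content drift"
    # What you were looking for is still there.
    return "success"


print(classify_link(404, b""))
print(classify_link(200, b"new text", b"old text"))
```

An exact byte comparison is a crude test, since a real page can differ between fetches in ads or timestamps without its substance changing. The point of the sketch is the middle branch: without an earlier copy, drift is invisible.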

I go to a URL, I may get something, and then five minutes later, I go to the same URL, I may get that same thing, or I may get nothing, or I may get something different. And it just is. It is what it is at any given moment. That's what the web primarily is. There are exceptions to this, of course. There are applications on the web, like Wikipedia, for example, which is fundamentally based on a version control system. And you can go back and you can see all the various representations of what was available from a given URL. But for the web overall, it's not like that. And so that's where the Wayback Machine steps in. That's where we provide a time-based view for URLs that we have been able to access and that we've been able to archive and then organize and make available to our patrons.
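
As a rough illustration of a time-based view, Wayback Machine addresses on web.archive.org carry a timestamp followed by the original URL. The sketch below only builds such an address and makes no network request. The helper name `wayback_url` is hypothetical, and whether a capture exists near the requested time depends on what was archived.

```python
from datetime import datetime


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine address for `url` near the time `when`.

    Hypothetical helper: it formats the address and does not check
    that a capture actually exists.
    """
    # The timestamp is written as YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{url}"


print(wayback_url("http://example.com/", datetime(2010, 1, 1)))
# https://web.archive.org/web/20100101000000/http://example.com/
```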

Where to start? I mean, those are some big questions. A very general statement that I can make is that about a third of the old web, measured over, say, 10 or 15 years or something like that, is gone. So about a third. In some cases, it's less, and in some cases, it's more. And certainly for an individual website that may have had millions of pages, like GeoCities, for example, it's 100% gone, right?

So it's just not there on the live web. But it turns out that in more than two-thirds of the cases that we've looked at where a given URL is no longer available, it is available through the Wayback Machine. So one way of looking at that is that instead of saying maybe a third of the old web is gone, maybe a ninth of the old web is gone (and once again, these are very broad generalizations), because much of that material was backed up and can be accessed through web.archive.org, from the Wayback Machine. But you asked a different question.
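
The step from "a third" to "a ninth" is simple arithmetic, worked through below. It treats the rough figures from the quote as exact fractions, which is an assumption: "about a third" becomes 1/3 and "more than two-thirds" becomes 2/3. The variable names are illustrative only.

```python
from fractions import Fraction

gone_from_live_web = Fraction(1, 3)  # "about a third of the old web ... is gone"
archived_share = Fraction(2, 3)      # "more than two-thirds", taken here as exactly 2/3

# Gone from the live web AND not available through the Wayback Machine.
unrecoverable = gone_from_live_web * (1 - archived_share)
print(unrecoverable)  # 1/9
```

Because the archived share is described as more than two-thirds, one ninth is an upper estimate under these assumptions.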

I don't know. They're getting different, right? So things are changing.

So first of all, let's look a little bit at why things go away. There are very benign reasons why things go away. Maybe a company has simply gone out of business or a government has changed, and so there's a new administration. And so you would expect, if a company goes out of business, what entity would want to keep that company's website alive, for example, or a publication, right?