April 13, 2021

Back to the Wayback Machine

A little over a year ago, I pointed out that, as far as I could tell, the Internet Archive’s Wayback Machine appeared to be somewhat moribund. At the time, I couldn’t find any major web site whose archive had been updated within the previous year and a half.

Well, as my friend Tracy Seneca pointed out a while back, the Wayback Machine appears to have received an update to both data and interface. She called it a “spiffy” new interface, and I’d have to agree. It seems intuitive, informative, and useful. For example, take a look at the result for http://whitehouse.gov/. Although there is clearly a lag in data of about six months, that is typical and shouldn’t be much of a problem for what is supposed to be an historical record rather than an up-to-the-minute one.

That’s the good news. The bad news is that web sites in the backwaters of the Interwebs may not be crawled as much or perhaps even at all. Take my own web site as an example. It hasn’t been crawled since June 2009, which by my reckoning is close to 2 years ago. Not that I blame them, mind you, as who is all that interested in the minutiae of my life? But that means that any claims to be “archiving the web” should be taken with a grain of salt. Maybe say “archiving the parts of the web that matter” or “ignoring what doesn’t matter so much”. You get the drift.

But in the end this is yet another mea culpa moment. I’m happy to have been wrong in suggesting that the Internet Archive was not maintaining the Wayback Machine, and I apologize for casting aspersions on their ability to keep the service alive. It’s there and being updated, even if coverage is spotty in places.

About Roy Tennant

Roy Tennant is a Senior Program Officer for OCLC Research. He is the owner of the Web4Lib and XML4Lib electronic discussions, and the creator and editor of Current Cites, a current awareness newsletter published every month since 1990. His books include "Technology in Libraries: Essays in Honor of Anne Grodzins Lipow" (2008), "Managing the Digital Library" (2004), "XML in Libraries" (2002), "Practical HTML: A Self-Paced Tutorial" (1996), and "Crossing the Internet Threshold: An Instructional Handbook" (1993). Roy wrote a monthly column on digital libraries for Library Journal for a decade and has written numerous articles in other professional journals. In 2003, he received the American Library Association's LITA/Library Hi Tech Award for Excellence in Communication for Continuing Education. Follow him on Twitter @rtennant.

Comments

  1. I’m seeing either May 2009 or June 2009 as the latest update for all three of my sites (Cites & Insights, Walt at Random and my personal site). I draw no conclusions.

  2. You never know, Roy: “who is all that interested in the minutiae of my life?”… :-)

  3. And my site had a dip in coverage in 2009 but was picked up three times in 2010. I can’t draw any conclusions either.

  4. Karen Coyle says:

    “Crawled” and “loaded into the Wayback Machine” are two different things. There is a lag between crawling and loading, which, of course, means you don’t know if you’ve been crawled recently. Here’s the FAQ on that:

    http://www.archive.org/about/faqs.php#103