December 4, 2025

GIGO

A couple of things happened recently that reminded me of the old chestnut “garbage in, garbage out”.

One was that after I spent hours trying to debug why my Perl program was not producing the output I expected, it finally occurred to me (after sleeping on it) that I should check the input file. Sure enough, it wasn’t as I expected it to be. I had run a utility to clean up the difference in line endings after transferring it from a Windows machine to the Mac, but it had also introduced additional blank lines that screwed me up. Garbage in, and even worse garbage out.
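In hindsight, a quick sanity check on the input would have saved me those hours. Here is a minimal sketch of one, in Python rather than Perl. The function name `summarize_lines` is just something made up for illustration, and it assumes the file is small enough to read into memory in one go. It counts which line endings are present and how many lines are blank, which is roughly what I should have looked at first.

```python
from collections import Counter


def summarize_lines(data: bytes) -> dict:
    """Count line-ending styles and blank lines in raw file bytes.

    Hypothetical helper for illustration; not the utility from the post.
    """
    endings = Counter()
    blank = 0
    total = 0
    # bytes.splitlines splits on \n, \r, and \r\n; keepends=True keeps
    # the terminator so we can see which style each line used.
    for line in data.splitlines(keepends=True):
        total += 1
        if line.endswith(b"\r\n"):
            endings["CRLF"] += 1
            body = line[:-2]
        elif line.endswith(b"\n"):
            endings["LF"] += 1
            body = line[:-1]
        elif line.endswith(b"\r"):
            endings["CR"] += 1
            body = line[:-1]
        else:
            # Last line of a file that has no trailing newline.
            endings["none"] += 1
            body = line
        # A line holding only whitespace counts as blank.
        if not body.strip():
            blank += 1
    return {"total": total, "blank": blank, "endings": dict(endings)}
```

To use it, read the file in binary mode (for example `open(path, "rb").read()`) and pass the bytes in. A mix of ending styles, or far more blank lines than expected, is a hint that the conversion step mangled the file before the program ever saw it.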

Another incident was someone pointing out the wonderful “LinkSailor” web application that Ian Davis put together to demonstrate the power of linked data. Give it a URI and it will fetch the data, parse the links, and allow you to surf those links to other data stores. It’s certainly the best illustration I’ve seen of the potential of linked data.

So…excited by the possibilities, and having noticed that many of the examples used DBpedia links, which are basically just linked data versions of Wikipedia URIs, I grabbed the Wikipedia URI for San Francisco and plugged it in. Looking good, looking good... wait! What does New Jersey have to do with San Francisco?

Yes, that’s right: data about San Francisco and New Jersey mashed up on the same page. I still haven’t worked out where the links went wrong, but it is another example of garbage in, garbage out. Systems that use data, and most assuredly our linked data, will only be as good as the data itself and the linkages made.

Since we’re still very much in the early days of linked data, we would be wise to consider lessons such as these, and create our links with thought and care.

About Roy Tennant

Roy Tennant is a Senior Program Officer for OCLC Research. He is the owner of the Web4Lib and XML4Lib electronic discussions, and the creator and editor of Current Cites, a current awareness newsletter published every month since 1990. His books include "Technology in Libraries: Essays in Honor of Anne Grodzins Lipow" (2008), "Managing the Digital Library" (2004), "XML in Libraries" (2002), "Practical HTML: A Self-Paced Tutorial" (1996), and "Crossing the Internet Threshold: An Instructional Handbook" (1993). Roy wrote a monthly column on digital libraries for Library Journal for a decade and has written numerous articles in other professional journals. In 2003, he received the American Library Association's LITA/Library Hi Tech Award for Excellence in Communication for Continuing Education. Follow him on Twitter @rtennant.