Digital libraries will increasingly need to deal with massive amounts of information if they are to serve their clienteles well. Some libraries are already delving into data curation, and once you go down that road the sky's the limit. Thankfully, Google and others have pioneered new ways of dealing with massive amounts of data.
One such approach is the growing ecology of software tools that fall under the "Hadoop" umbrella. An excellent overview article by O'Reilly and Associates names and briefly explains these tools (see graphic from the article).
At first, the sheer number of named tools seems daunting. However, some simply work behind the scenes, and you will mostly use them without even realizing it. Others you may never need.
We are adopting these technologies at my place of employment (OCLC), and I expect to learn more about them and how to use them in the coming weeks and months. I'll share what I learn here or over on hangingtogether.org, the OCLC blog where I also write.
Meanwhile, think about getting down and dirty with Pig. I was born an Indiana farm boy, and I can’t believe my adult self, as a high-falutin’ digital librarian, found a way to write that sentence with a straight face.
My thanks to my colleague Lorcan Dempsey for bringing this post to my attention.