Hadoop for Large Digital Libraries

By Roy Tennant on February 15, 2012

Digital libraries will increasingly need to deal with massive amounts of information if they are to serve their clienteles well. Some libraries are already delving into data curation, and once you go down that road the sky’s the limit. Thankfully, Google and others have pioneered new ways of dealing with massive amounts of data.

One such way is the growing ecology of software tools that all fall under the “Hadoop” umbrella. In an excellent overview article by O’Reilly and Associates, these various software tools are named and briefly explained (see graphic from the article).

At first, the sheer number of named tools seems daunting, but some simply work in the background, and you will mostly use them without even realizing it. Some you may never need.

We are adopting these technologies at my place of employment (OCLC) and I expect to be learning more about them and how to use them in the coming weeks and months. I’ll share what I learn here, or over at hangingtogether.org, the OCLC blog where I also write.

Meanwhile, think about getting down and dirty with Pig. I was born an Indiana farm boy, and I can’t believe my adult self, as a high-falutin’ digital librarian, found a way to write that sentence with a straight face.

My thanks to my colleague Lorcan Dempsey for bringing this post to my attention.
