June 23, 2018

About Roy Tennant

Roy Tennant is a Senior Program Officer for OCLC Research. He is the owner of the Web4Lib and XML4Lib electronic discussions, and the creator and editor of Current Cites, a current awareness newsletter published every month since 1990. His books include "Technology in Libraries: Essays in Honor of Anne Grodzins Lipow" (2008), "Managing the Digital Library" (2004), "XML in Libraries" (2002), "Practical HTML: A Self-Paced Tutorial" (1996), and "Crossing the Internet Threshold: An Instructional Handbook" (1993). Roy wrote a monthly column on digital libraries for Library Journal for a decade and has written numerous articles in other professional journals. In 2003, he received the American Library Association's LITA/Library Hi Tech Award for Excellence in Communication for Continuing Education. Follow him on Twitter @rtennant.


  1. Kelly Brannock says:

    The State Library of NC is using volunteers to transcribe digitized documents. Details here: http://statelibrary.ncdcr.gov/digital/ncfamilyrecords/verticalfiles.html
    It’s an easy way to volunteer, and volunteer efforts are making a difference in increasing access to local information.

  2. Claire Stewart says:

    University of Iowa Libraries are also using this for their Civil War Diaries project: http://diyhistory.lib.uiowa.edu/

    Iowa and the California Digital Newspaper Collection both did sessions on this at the most recent DLF Forum (CDNC abstract: http://www.diglib.org/forums/2012forum/no-tempest-in-my-teapot-analysis-of-crowdsourced-data-and-user-experiences-at-the-california-digital-newspaper-collection/)

  3. Roy, you know I love you, but…

    All the above projects (including those in the comments) counted on considerable dedicated technical staff to design and implement the UI and get the data where the data needed to be. There’s your “why not” answer, right there.

    Now, tools are starting to exist that make this not quite such a slog. Scripto (http://scripto.org/) and T-PEN (http://t-pen.org/TPEN/) are two that I have my eye on; there’s also Islandora, which has had a TEI markup/correction tool for some time, if TEI is your thing.

    This is all to the good. Erasing the technical labor that’s still necessary to bring crowdsourcing projects to fruition? Not so good. Please don’t do it.

  4. Dorothea, No worries, you make an excellent point. It is no small thing to put this together technically, and I was wrong to not note that while wondering why more organizations don’t do it. Mea culpa, and kudos to those organizations who both have the technical chops and use them to these ends.

  5. I’m not sure whether this is a quibble with Dorothea or a violent agreement. While tools like Scripto and T-PEN make it possible to avoid the ongoing effort of systems like UIowa’s original design (which required a staff member to cut-and-paste each transcript from an email account to their CMS) or the development effort of building a transcription tool from scratch, there are still costs.

    What I tell people investigating the hosted version of my own FromThePage is that even though they don’t have to develop or install software, there is still a substantial amount of set-up work required for each set of documents they want to host. Their scanned images need to be scaled down from archival-quality TIFFs to something more reasonable, sometimes separated into recto/verso images, and uploaded or FTPd to the transcription tool. (And if they’re importing from a CMS like the Internet Archive, that work still needs to be done to get the images into that system.) The appropriate metadata needs to be entered, which usually includes document-specific text like desired transcription conventions which staff may not have experience with.

    More important than that, though, is the ongoing work of recruiting and motivating volunteers. If volunteers report problems or ask questions about the images, the handwriting, or the software, someone from the institution needs to reply quickly, or else those volunteers will find a more responsive project to work on. This kind of community management work needs to be part of someone’s job on a near-daily basis, and so long as volunteers are still contributing–so long as the crowdsourcing project is successful–it’s going to take work.

  6. I think it’s violent agreement, Ben. My answer tends more to the “why aren’t libraries/archives trying this?” whereas yours is closer to an answer to “why aren’t more crowdsourcing projects successful?”

    The answer to both questions is “labor,” but the exact nature of the labor differs in each case.