One Format to rule them all, One Format to find them; One Format to bring them all and in the darkness bind them. – with apologies to J.R.R. Tolkien
It is now over 12 years since I wrote “MARC Must Die” in Library Journal. At the time, I think I imagined a much redesigned metadata format expressed in XML. But it didn’t take long for me to realize the error of my ways. Not that we didn’t need to do something, but I was wrong to think that MARC required replacing. What it really required, I soon realized, was that we stop relying upon it exclusively. And that is a point that I feel has become lost in our discussions about our bibliographic future.
Here is how I put it in a follow-up piece in Library Hi Tech titled “A Bibliographic Infrastructure for the Twenty-First Century”:
What I am suggesting [in this article] is different in scope and structure than is implied by my “MARC Must Die” column in Library Journal, although I alluded to it in the follow-up “MARC Exit Strategies” column. What must die is not MARC and AACR2 specifically, despite their clear problems, but our exclusive reliance upon those components as the only requirements for library metadata. If for no other reason than easy migration, we must create an infrastructure that can deal with MARC (although the MARC elements may be encoded in XML rather than MARC codes) with equal facility as it deals with many other metadata standards. We must, in other words, assimilate MARC into a broader, richer, more diverse set of tools, standards, and protocols. The purpose of this article is to advance the discussion of such a possibility.
I went on to explain a number of characteristics that I felt our bibliographic infrastructure should support, as well as a fairly specific proposal for implementation. However, despite being named the best article to appear in Library Hi Tech that year, that salvo basically fell on deaf ears.
And now we are here.
“Here” being, of course, that the Library of Congress is developing a new format.
I parse a lot of data. I even fancy myself a Data Geek. After all the data processing I’ve done, I’ve come to realize that there are really only three things I care about in metadata: parseability, granularity, and consistency. Pretty much everything else can be dealt with. You call your author field “creator”? Fine and dandy. You record dates as MM/DD/YYYY? I can deal with that. So long as your metadata is:
- Parseable. Separate fields must be delimited in some way. It doesn’t need to be XML, it can be a JSON array or a pipe symbol (“|”) or even, in many cases, a tab (I process a lot of tab-delimited text files that are saved out of Excel, for example). But there must be some way of determining via software what has been kept separate.
- Granular. If I need first names separate from family names, I want them in separate fields. Trying to break apart elements that need to be separate can be difficult, especially if the data is inconsistent. Oh, and by the way, punctuation (even ISBD punctuation) doesn’t count.
- Consistent. When processing data, inconsistency can cause a lot of problems. Even if a mistake is made, it’s best to make it consistently so the person processing it can treat all records the same. What is difficult is having to accommodate a wide variety of edge cases.
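To make the three properties concrete, here is a minimal sketch of processing a pipe-delimited record like the ones described above. The field names and sample record are invented for illustration, not drawn from any real library format: the delimiter makes the record parseable, the separate name fields make it granular, and a simple field-count check enforces consistency.

```python
# Hypothetical pipe-delimited record: parseable (fields are delimited)
# and granular (family name and given name are kept separate).
record = "Tennant|Roy|MARC Must Die|2002"

# Invented field layout for this sketch.
FIELDS = ["family_name", "given_name", "title", "date"]

def parse(line):
    values = line.split("|")
    # Consistency check: every record must carry the same number of fields,
    # so the processing code can treat all records the same.
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

parsed = parse(record)
print(parsed["family_name"])
```

If the family and given names had instead been lumped into one field as “Tennant, Roy.” with trailing ISBD-style punctuation, recovering them reliably would mean guessing at commas and periods across thousands of inconsistent records, which is exactly the difficulty the list above describes.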
That’s really all I care about, since it is very unlikely that every library will create their own format. No, we are herd animals, so we will gather around a very small number of formats, and perhaps only one. After all, that is all we have ever known.
Photo by David Fulmer, Creative Commons License Attribution 2.0 Generic.