I work with large masses of MARC data on a regular basis (at my day job at OCLC Research), so you’d better believe I see a lot of inconsistencies in it. This happens for a variety of reasons. One huge issue is that quite a few MARC elements are free-text fields. This means you can put just about anything you want in them, and people do.
But there are also fields that do specify what to enter, and still inconsistencies creep in. There are many reasons for this as well; here is a beginning list:
- Rules that are inexact or difficult to understand.
- An unclear understanding, or an imperfect use (whether deliberate or inadvertent), of those rules.
- Typographical errors.
- Data acceptance systems (either single record or batch) that fail to validate appropriate elements.
- Violation of rules for local purposes (for example, putting data in a different element so it will display in a particular system; or adding HTML markup to elements for local display purposes).
- Arguments over interpretation of rules: for example, whether a record describes an item or a version of an item. What’s a “version”?
I’d like to assert that these problems are in our past, but I clearly cannot. Let’s take the 264 field for example. Recently created, these fields are now pouring into WorldCat (in Jan. we found 56,706 such fields and in April we found 158,019 — nearly three times as many). Meanwhile, the rules seem fairly specific about what one should do if the place of publication is not apparent: put “[Place of publication not identified] :” in the $a. Not any of these:
- [Place of publication not identified :
- [place of publication not identified] :
- Place of publication not identified :
- [Place of publication unknown] :
- [Place of publication not given] :
- Unknown place of publication :
- [place of publication not indicated] :
- [Place of publication not known] :
- Unknow place of publication :
- No place of publication :
- Place of publication unknown :
All of these (and more) already occur, and still more will appear as records continue to pour in. Now, as someone pointed out when I posted the above to the BIBFRAME list, this element did not immediately have a standardized string specified for this information. But that just makes me wonder why it hadn’t been specified to begin with. I’m also fairly certain that many of the above errors, and others, are coming in well after the standardized string was defined. We will know more as I continue to report on this throughout the year at “MARC Usage in WorldCat”.
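To make the problem concrete, here is a minimal sketch of the kind of check a data acceptance system could run on 264 $a values. It is only an illustration: the function names (`classify_place`, `tally`) and the keyword heuristic for spotting near-misses are my own assumptions, not anything WorldCat or any cataloging tool actually does, and it works on plain strings rather than parsed MARC records.

```python
import re
from collections import Counter

# The exact string the rules call for when the place of publication is not apparent.
PRESCRIBED = "[Place of publication not identified] :"

# Assumed heuristic (hypothetical): a value is a likely variant if it mentions
# "place of publication" together with a word that signals absence.
# "unknown?" also catches the misspelling "unknow".
_ABSENCE = re.compile(r"\b(not|no|unknown?)\b")


def classify_place(value: str) -> str:
    """Label a 264 $a value as 'prescribed', 'variant', or 'other'."""
    text = value.strip()
    if text == PRESCRIBED:
        return "prescribed"
    lowered = text.lower()
    if "place of publication" in lowered and _ABSENCE.search(lowered):
        return "variant"
    return "other"


def tally(values):
    """Count how many values fall under each label."""
    return Counter(classify_place(v) for v in values)


if __name__ == "__main__":
    sample = [
        "[Place of publication not identified] :",
        "[place of publication not identified] :",
        "Unknow place of publication :",
        "New York :",
    ]
    for label, count in sorted(tally(sample).items()):
        print(label, count)
```

An exact comparison against the prescribed string is deliberate: differences in case, brackets, or wording are exactly the inconsistencies worth surfacing, so normalizing them away before comparing would hide the problem. The looser keyword match only decides whether a non-matching value looks like an attempt at the same statement and should be flagged for correction.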
So I guess my point is this: we all need to own this problem and work against the forces of inconsistency outlined above, along with any others that may occur to you. Doing so will take a wide variety of techniques that must encompass the entire library metadata ecosystem, from the individual cataloger to the massive aggregators like my employer.
Photo courtesy Quinn Dombrowski, Creative Commons Attribution Share-Alike 2.0 Generic.