As someone who works with large masses of MARC data on a regular basis (at my day job at OCLC Research), I see a lot of inconsistencies in MARC data, you'd better believe it. This happens for a variety of reasons. One huge issue is that quite a few MARC elements are free-text fields. This means you can put just about anything you want in them, and people do.
But there are also fields that do specify what to enter, and still inconsistencies creep in. There are many reasons for this as well; here is a starting list:
- Rules that are inexact or difficult to understand.
- An unclear understanding, or an imperfect use (whether deliberate or inadvertent), of those rules.
- Typographical errors.
- Data acceptance systems (either single record or batch) that fail to validate appropriate elements.
- Violation of rules for local purposes (for example, putting data in a different element so it will display in a particular system; or adding HTML markup to elements for local display purposes).
- Arguments over interpretation of rules. For example, does a record describe an item, or a version of an item? And what's a "version"?
- etc.
I’d like to assert that these problems are in our past, but I clearly cannot. Let’s take the 264 field for example[1]. Recently created, these fields are now pouring into WorldCat (in Jan. we found 56,706 such fields and in April we found 158,019 — nearly three times as many). Meanwhile, the rules seem fairly specific about what one should do if the place of publication is not apparent[2]: put “[Place of publication not identified] :” in the $a. Not any of these:
- [Place of publication not identified :
- [place of publication not identified] :
- Place of publication not identified :
- [Place of publication unknown] :
- [Place of publication not given] :
- Unknown place of publication :
- [place of publication not indicated] :
- [Place of publication not known] :
- Unknow place of publication :
- No place of publication :
- Place of publication unknown :
All of which (and more) already occur[3], and more still as they continue to pour in. Now, as someone pointed out when I posted the above to the BIBFRAME list, this element did not immediately have a standardized string specified for this information. But that just makes me wonder why it hadn’t been specified to begin with, and I’m also fairly certain that many of the above errors and others are coming in well after the standardized string was defined. We will know more as I continue to report on this throughout the year at “MARC Usage in WorldCat”.
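To make the problem concrete, here is a minimal sketch of how a data acceptance system might flag these variants. It is only an illustration under stated assumptions: the function name `classify_place` is hypothetical, and the "mentions place of publication" test is my own heuristic, not anything defined in the rules or used in WorldCat.

```python
import re

# The standardized string the rules call for in 264 $a.
CANONICAL = "[Place of publication not identified]"

# Heuristic (an assumption, not part of any standard): a value that mentions
# "place of publication" is probably an attempt at the placeholder.
_PLACEHOLDER_HINT = re.compile(r"place\s+of\s+publication", re.IGNORECASE)


def classify_place(subfield_a: str) -> str:
    """Classify a 264 $a value as 'canonical', 'variant', or 'other'."""
    # Drop the trailing " :" punctuation and surrounding whitespace.
    value = subfield_a.strip().rstrip(":").strip()
    if value == CANONICAL:
        return "canonical"
    if _PLACEHOLDER_HINT.search(value):
        return "variant"
    # Anything else is presumably a real place name (or a different problem).
    return "other"


if __name__ == "__main__":
    samples = [
        "[Place of publication not identified] :",
        "[place of publication not identified] :",
        "Unknow place of publication :",
        "New York :",
    ]
    for s in samples:
        print(f"{classify_place(s):10} {s}")
```

A check like this only catches variants that still contain the phrase "place of publication"; older conventions or entirely different wordings would need their own rules.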
So I guess my point is this: we all need to own this problem and work against the forces of inconsistency outlined above and others that may occur to you. Doing so will take a wide variety of techniques that must encompass the entire library metadata ecosystem, from the individual cataloger to the massive aggregators like my employer.
Photo courtesy Quinn Dombrowski, Creative Commons Attribution Share-Alike 2.0 Generic.
I’ll admit to being guilty of #5: “Violation of rules for local purposes (for example, putting data in a different element so it will display in a particular system)”. But I was very young at the time. :-)
Bernie: Put your hand out — no, like this. *SLAP!*
Like I said, I was very young (and ignorant) at the time. *SLAP!* accepted penitentially!
Bernie: Although it was hard to tell from the comment, it was just a friendly tap!
OK…even though my transgression might have muddied millions of MARC records? ;-)
Bernie: AS IF. ;-)
As long as MARC (or any other) cataloging rules are suggestions rather than constraints, one will always get such errors. Either every record is checked and rejected (!) or one cannot expect valid data.
It would seem sensible (and efficient) for fields like this that have clearly defined defaults, that the *software* supply it as an option to be filled in (e.g., with the use of a pull down menu, right click menu, etc.). This would not only save time entering the data, but also cut down on input errors.
Roy, if you need more examples let me know. As I continue to work with this data I am continually amazed at the creativity expressed in this field (both the 260 and catalogers). What was wrong with [s.n.], [s.d.], or [s.l.]? Besides being what I was taught as a cataloger in the dark ages of the ‘upgrade’ to AACR2.
Hi Roy, I was just at an OCLC presentation at the CO Association of Libraries, and several people attending were wondering what the current state of BIBFRAME is. Do you have an update?
Dodie, please see http://bibframe.org/ and also the BIBFRAME mailing list for the latest info: http://www.lsoft.com/scripts/wl.exe?SL1=BIBFRAME&H=LISTSERV.LOC.GOV
You mention “… we all need to own this problem and work against the forces of inconsistency …” but it seems to me that the current trends in cataloging don’t care much about consistency. Titles can now be in ALL CAPS if someone wants; additional authors can be traced or not; you can add a subtitle or not. Adding the relator codes is optional. Often, thousands of records are made into a semi-MARC format and dumped into the catalog with almost no controls at all. So, it is tough to say that we need consistency in something like [Place of publication not identified] when users probably won’t even notice it or care much about it if they do.
I am a #1 advocate in favor of consistency, but it seems that information such as [Place of publication not identified] is a perfect candidate for allowing a measure of inconsistency, especially when compared to the other points I mentioned above.
If we want to turn place of publication into an area for searching or limiting, that may be another factor, but there is such wide variation in the spelling of place names in the place of publication area that getting half-way decent results for place of publication would make the inconsistency found in [Place of publication not identified] look like child’s play.