Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Building a New Bibliographic Infrastructure
01/15/2004
More than a year ago I called for the death of MARC (see LJ 10/15/02, p. 26ff.). That column sparked a lively discussion among librarians--especially catalogers. As I thought about it and discussed the issue with others, I decided I had convicted the wrong suspect. Let MARC die of old age rather than homicide.

I thought that MARC (the MARC record syntax, MARC elements, and AACR2) was too limiting for modern library needs and opportunities. I now realize that with a robust bibliographic infrastructure we could profitably use any bibliographic metadata standard that we could imagine, including MARC.

The point is we need to craft standards, software tools, and systems that can accept, manipulate, store, output, search, and display metadata from a wide variety of bibliographic or related standards. Our systems should be able to accept an ONIX record from a publisher, which contains basic bibliographic fields and elements like cover art, and use it as a prototype cataloging record or to enrich an existing record. Our systems should be able to output a record easily in Dublin Core for harvesting through the Open Archives Initiative's Protocol for Metadata Harvesting.

In other words, we need a new bibliographic infrastructure that allows for the easy and effective sharing of various types of records. This requires a transfer format or schema that can encapsulate everything you need to associate with a particular intellectual object. Luckily, such a schema is being developed: the Metadata Encoding and Transfer Syntax (METS). Some libraries are already using this schema to embed multiple bibliographic records from different schemata into one package.

Supporting all records

This is our future; we can no longer rely on only one record structure. We must be able to accept many different kinds of bibliographic record structures, from ONIX to Dublin Core to whatever else comes along that contains useful information.
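To make the idea of one package carrying several records more concrete, here is a minimal sketch in Python. It assumes the usual METS pattern of one descriptive metadata section (dmdSec) per record, each wrapping its record in mdWrap/xmlData. The function names, section IDs, sample values, and the toy publisher record are all made up for illustration, and the publisher record is not valid ONIX. As far as I know, ONIX is not among the named MDTYPE values, so the sketch labels it OTHER; treat that as an assumption to check against the schema. A complete METS document would also need a structural map, which is left out here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use readable prefixes when the tree is serialized.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def add_record(mets_root, section_id, mdtype, elements, other_type=None):
    """Wrap one descriptive record in its own METS dmdSec (hypothetical helper)."""
    dmd = ET.SubElement(mets_root, f"{{{METS_NS}}}dmdSec", {"ID": section_id})
    attrs = {"MDTYPE": mdtype}
    if other_type is not None:
        # Assumption: formats without a named MDTYPE value are labeled this way.
        attrs["OTHERMDTYPE"] = other_type
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", attrs)
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    xml_data.extend(elements)
    return dmd


def build_package():
    """Build a toy package holding two records that describe the same object."""
    root = ET.Element(f"{{{METS_NS}}}mets")

    # Record 1: a minimal Dublin Core description (values are made up).
    title = ET.Element(f"{{{DC_NS}}}title")
    title.text = "An Example Title"
    creator = ET.Element(f"{{{DC_NS}}}creator")
    creator.text = "Author, Example"
    add_record(root, "DMD_DC", "DC", [title, creator])

    # Record 2: a toy stand-in for a publisher record (NOT valid ONIX).
    publisher_record = ET.Element("PublisherRecord")
    ET.SubElement(publisher_record, "CoverArt").text = "http://example.org/cover.jpg"
    add_record(root, "DMD_ONIX", "OTHER", [publisher_record], other_type="ONIX")

    return root


if __name__ == "__main__":
    print(ET.tostring(build_package(), encoding="unicode"))
```

Each record keeps its original structure inside its own section, so a system reading the package can pick the format it understands and ignore the rest.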
To use these record formats we will need rules and guidelines to follow in their application. We need both general rules and schema-specific rules, similar to the way we have used AACR2 to define what information we capture in MARC. Beyond rules and guidelines, best practices as developed by libraries working with this new infrastructure will help guide other libraries. "Crosswalks" or standard translations from one bibliographic schema to another will also be required, although in many cases it will be better to retain records in their original format and map elements into common fields upon indexing.

No more homogenization

An undertaking of this type has many challenges. One of the first and most significant will be moving from a relatively homogeneous bibliographic environment (the MARC21/AACR2 hegemony) to a diverse one. We will need to find useful ways of ingesting records in a variety of formats, both by crosswalking and by indexing the original formats and virtually merging the diverse records.

Moving to a new infrastructure would, at minimum, require upgrades to our existing integrated library systems. At worst, it could require migrating to entirely new systems. Neither of these solutions is without problems, as anyone who has switched systems will confirm. But migrating is unlikely to be as difficult as it will be to change us. Those of us who have only known a MARC world may find it difficult to learn how to build and use a diverse bibliographic environment effectively.

Despite these challenges, such changes are both necessary and achievable. They are necessary to exploit new metadata opportunities and technologies like XML and the Internet. Our choice is to remake our bibliographic infrastructure and achieve new levels of service, or to maintain the status quo and risk becoming increasingly (and deservedly) marginalized.

The challenge to systems, utilities

Will our integrated library systems be up to the task?
Although a number of vendors clearly see a future based on XML, it will be some time before their systems can easily accommodate records from a variety of formats. Our major bibliographic utilities, RLIN and WorldCat, are moving in the right direction. OCLC has remade WorldCat from the bottom up, employing XML and an in-house XML schema dubbed "XWC" that accommodates Dublin Core, MARC, and other formats. This is just the beginning of a rich bibliographic infrastructure that could employ metadata in just about any XML-encoded form.

If you are intrigued by this vision for a bibliographic infrastructure, watch for my article "A Bibliographic Metadata Infrastructure for the 21st Century" in an upcoming issue of Library Hi Tech. Meanwhile, consider where you think our bibliographic systems should be and let me know your thoughts.

LINK LIST
METS www.loc.gov/standards/mets
MODS www.loc.gov/standards/mods
ONIX www.editeur.org/onix.html