:: Digital Libraries Columns


Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.

21st-Century Cataloging


   Cataloging has basically remained unchanged for decades. Despite the
   development of Machine-Readable Cataloging (MARC) and the
   Anglo-American Cataloging Rules, 2d Edition (AACR2), what is recorded
   about a library item is the same as it was when we used handwritten
   catalog cards. Today, many library catalogs simply duplicate the
   catalog card on a computer screen.

   Now, the game has changed. In the digital library, we no longer deal
   with the typical printed book or serial. We may need to describe a
   collection of digitized photographs, or a series of pages that must
   somehow be navigable as a logical whole like the printed book from
   which they are derived. And we must keep track of such things as how
   the digital representation was captured and manipulated. Librarians
   have historically called this "cataloging." We in digital library work
   call it "metadata."

   Metadata, simply put, is structured information about information. The
   key is "structured." In metadata as in cataloging, a free-text
   description usually won't suffice. Rather, in order to limit a search
   to a particular field, the information must be structured, often highly
   so. That is why MARC has tag and subfield markers, which allow software
   to understand exactly how to treat each descriptive element.
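
   To make the role of tags and subfields concrete, the Python sketch
   below models a MARC field as a three-digit tag plus subfield
   code/value pairs, then runs a search limited to a single field. The
   MarcField class and search_field function are hypothetical
   simplifications: indicators and the actual MARC exchange format
   (ISO 2709) are left out, and the sample record is invented.

```python
# A simplified, hypothetical in-memory model of MARC fields: a three-digit
# tag plus (subfield code, value) pairs. Indicators and the ISO 2709
# record layout are deliberately omitted.
from dataclasses import dataclass, field


@dataclass
class MarcField:
    tag: str  # e.g. "245" = title statement
    subfields: list[tuple[str, str]] = field(default_factory=list)

    def values(self, code: str) -> list[str]:
        """Return every value stored under one subfield code."""
        return [value for c, value in self.subfields if c == code]


def search_field(record: list[MarcField], tag: str, code: str, term: str) -> bool:
    """Field-limited search: look for `term` only in tag/subfield `code`."""
    term = term.lower()
    return any(
        term in value.lower()
        for f in record if f.tag == tag
        for value in f.values(code)
    )


record = [
    MarcField("100", [("a", "Twain, Mark,"), ("d", "1835-1910.")]),      # personal name
    MarcField("245", [("a", "Adventures of Huckleberry Finn /"),          # title
                      ("c", "Mark Twain.")]),                              # responsibility
    MarcField("651", [("a", "Mississippi River"), ("v", "Fiction.")]),     # geographic subject
]

print(search_field(record, "245", "a", "huckleberry"))  # True: word is in the title
print(search_field(record, "245", "a", "twain"))        # False: not in 245 $a
```

   Because every value carries its tag and subfield code, the same word
   ("Twain") can be found when searching names and ignored when searching
   titles, which is exactly what a free-text description cannot offer.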

   Does MARC translate?

   So why don't digital library projects use MARC? Some do, particularly
   when records for digital objects are loaded into a library catalog
   alongside the records for print books. The State Library of Victoria
   Multimedia Catalogue, for example, has over 120,000 records for
   digital objects.

   However, for many purposes, MARC is a poor fit. In some cases it is too
   complex, requiring highly trained staff and specialized input systems;
   in others, it is too focused on print material and can't be extended
   for digital collections. Digital librarians have identified three
   categories of metadata about digital resources: descriptive (also
   called intellectual), structural, and administrative. Of these, MARC
   deals well only with descriptive metadata.

   Descriptive metadata includes the creator of the resource, its title,
   appropriate subject headings -- basically the kinds of elements that
   will be used to search for and locate the item.

   Structural metadata describes how the item is structured. In a book,
   pages follow one another. But when a book is digitized and each page is
   scanned as an image, metadata must "bind" hundreds of separate computer
   files together into a logical whole and provide ways to navigate the
   digital object.
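
   A minimal sketch of what such binding metadata might record, assuming
   a hypothetical in-memory structure rather than any particular
   standard: each scanned page keeps its own image file, while an ordered
   map holds the reading sequence, the printed page labels, and where each
   section begins. That is enough to offer "next page" and "go to
   chapter" navigation. The class names and file names are illustrative
   only.

```python
# Hypothetical structural metadata for a scanned book: each page image is a
# separate file, and this map binds them into one ordered, navigable object.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    order: int     # position in the logical reading sequence
    label: str     # page number as printed, e.g. "ii" or "12"
    filename: str  # the separate image file produced by scanning


@dataclass
class StructureMap:
    title: str
    pages: list[Page]
    sections: dict[str, int]  # section heading -> order of its first page

    def __post_init__(self) -> None:
        # Keep pages in logical order, independent of how files were listed.
        self.pages.sort(key=lambda p: p.order)

    def page_at(self, order: int) -> Page:
        return next(p for p in self.pages if p.order == order)

    def next_page(self, current: Page) -> Optional[Page]:
        later = [p for p in self.pages if p.order > current.order]
        return later[0] if later else None

    def goto_section(self, heading: str) -> Page:
        return self.page_at(self.sections[heading])


book = StructureMap(
    title="Example scanned book",
    pages=[
        Page(3, "1", "img0003.tif"),
        Page(1, "i", "img0001.tif"),
        Page(2, "ii", "img0002.tif"),
        Page(4, "2", "img0004.tif"),
    ],
    sections={"Preface": 1, "Chapter 1": 3},
)

start = book.goto_section("Chapter 1")
print(start.label, start.filename)     # 1 img0003.tif
print(book.next_page(start).filename)  # img0004.tif
```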

   Administrative metadata may include such things as how the digital file
   was produced and its ownership.

   All of this potential metadata needs containers, yet most of the
   metadata described above has no standard container waiting to receive
   it, as MARC receives the information specified by AACR2. Some emerging
   standards, however, may become to digital libraries what MARC was to
   print-based libraries.

   Dublin Core emerges

   The best general-purpose metadata draft standard is the Dublin
   Core. The Dublin Core represents a multiyear (and ongoing) effort by
   librarians, computer scientists, museum professionals, and others to
   devise a simple yet extensible standard that could be used to describe
   a wide variety of objects within a wide variety of subject disciplines
   and systems. The Dublin Core consists of 15 elements, such as title,
   creator, and subject. The element names and basic purposes are fixed,
   but most details regarding them remain unresolved.
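
   As an illustration, the sketch below builds a Dublin Core record in
   Python as a mapping from element names to lists of values, since every
   element is optional and repeatable. It assumes the lowercase element
   names later published as Dublin Core version 1.1; the make_dc_record
   helper and the sample photograph record are hypothetical.

```python
# The 15 Dublin Core element names. Every element is optional and
# repeatable, so a record maps each element name to a list of values.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def make_dc_record(**elements: list[str]) -> dict[str, list[str]]:
    """Build a simple Dublin Core record, rejecting unknown element names."""
    unknown = set(elements) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {name: list(values) for name, values in elements.items()}


# A hypothetical record for a digitized photograph.
photo = make_dc_record(
    title=["Example photograph of a railway station"],
    creator=["Unknown photographer"],
    subject=["Railroad stations", "Photography"],
    type=["image"],
    format=["image/jpeg"],
    identifier=["http://example.org/photos/1234"],  # placeholder URL
)
print(sorted(photo))  # ['creator', 'format', 'identifier', 'subject', 'title', 'type']
```

   Passing a name outside the set, such as author=[...], raises
   ValueError; Dublin Core uses creator for that role.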

   Meanwhile, dozens of projects around the world are now using it.

   The Nordic Metadata Project. A consortium of Nordic countries
   working to create a metadata production and use system has created a
   utility for translating Dublin Core records into MARC and vice versa.
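
   A crosswalk of this kind is essentially a lookup table between two
   element sets. The sketch below is a deliberately small, hypothetical
   mapping loosely modeled on published Dublin Core to MARC crosswalks; it
   is not the Nordic project's utility, and the tag/subfield choices are
   assumptions for illustration.

```python
# A small, hypothetical crosswalk between Dublin Core elements and MARC
# (tag, subfield) pairs. Real crosswalks cover more elements and handle
# indicators and punctuation; this only shows the idea in both directions.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),
}
# Reverse table for MARC -> Dublin Core.
MARC_TO_DC = {pair: element for element, pair in DC_TO_MARC.items()}


def dc_to_marc(record):
    """Return (tag, subfield, value) triples; unmapped elements are skipped."""
    return [
        (*DC_TO_MARC[element], value)
        for element, values in record.items() if element in DC_TO_MARC
        for value in values
    ]


def marc_to_dc(triples):
    """Rebuild a Dublin Core record from (tag, subfield, value) triples."""
    record = {}
    for tag, code, value in triples:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:
            record.setdefault(element, []).append(value)
    return record


dc = {"title": ["Annual report"], "creator": ["Smith, Jane"], "rights": ["Public domain"]}
marc = dc_to_marc(dc)
print(marc)              # [('245', 'a', 'Annual report'), ('720', 'a', 'Smith, Jane')]
print(marc_to_dc(marc))  # {'title': ['Annual report'], 'creator': ['Smith, Jane']}
```

   Because rights has no entry in this toy table, it is dropped on the way
   to MARC, which shows how a round trip through a crosswalk can lose
   information.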

   DSTC Resource Discovery Unit. This Australian organization uses
   the Dublin Core in a variety of projects.

   UK Office of Library Networking (UKOLN). UKOLN provides a wealth
   of software for metadata production and utilization, focusing on the
   Dublin Core.

   Outside the Core

   While the Dublin Core specifies certain elements to describe an item,
   it does not specify a transfer syntax or any other equivalent of MARC.
   For now, it appears that the emerging Resource Description Framework
   (RDF), produced by the World Wide Web Consortium (W3C), will provide
   one of the best methods for encoding this information in a
   machine-parsable form.
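
   For a sense of what such an encoding looks like, this sketch uses
   Python's standard xml.etree.ElementTree module to write a Dublin Core
   record as RDF/XML. It assumes the RDF and Dublin Core namespace URIs
   that were settled after this column was written; the dc_to_rdf
   function is hypothetical and the resource URL is a placeholder.

```python
# Serialize a simple Dublin Core record as RDF/XML using only the
# standard library. Namespace URIs are the later standardized ones.
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)  # emit rdf: prefix instead of ns0:
ET.register_namespace("dc", DC_NS)    # emit dc: prefix instead of ns1:


def dc_to_rdf(about: str, record: dict[str, list[str]]) -> str:
    """Wrap a Dublin Core record in an rdf:Description about one resource."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": about})
    for element, values in record.items():
        for value in values:  # repeatable elements become repeated tags
            ET.SubElement(desc, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_to_rdf("http://example.org/photos/1234",
                {"title": ["Example photograph"], "creator": ["Unknown"]}))
```

   The output is an rdf:Description element whose dc:title and dc:creator
   children carry the values, so any XML parser can recover which value
   belongs to which element.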

   RDF is itself based on Extensible Markup Language (XML), which is an
   emerging standard that will likely have a great impact not only on
   resource description but on the web itself. XML provides users with a
   structured way in which to encode just about anything, from web pages
   to database entries.

   XML represents an advance over current HTML, which offers very little
   structural information embedded in a document. For now, searching on
   the web is scattershot. XML will allow users to search for words in
   section headings or in an author field, so we will be able to search
   web documents the way we now search library catalogs. It is likely that
   the upcoming 5.0 versions of both Netscape Navigator and Microsoft
   Internet Explorer will offer some level of native support for XML. This
   would allow users to add more powerful and flexible services to a web
   server while still providing other information in HTML.
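
   The difference between scattershot searching and field-limited
   searching can be sketched with Python's standard XML parser. The
   element names (report, author, heading) and the sample document are
   hypothetical, because XML lets each community define its own tags; the
   point is that a search can be confined to one element, much as a
   catalog search is confined to an author field.

```python
# Contrast an HTML-style full-text search with a field-limited search
# over XML. Element names here are illustrative, not from any standard.
import xml.etree.ElementTree as ET

DOC = """
<report>
  <author>Jane Smith</author>
  <section>
    <heading>Metadata for digital images</heading>
    <para>Smith cites earlier work on image capture.</para>
  </section>
</report>
"""

root = ET.fromstring(DOC)


def full_text_search(element: ET.Element, word: str) -> bool:
    """Scattershot search: any occurrence anywhere in the text counts."""
    return word.lower() in "".join(element.itertext()).lower()


def field_search(element: ET.Element, path: str, word: str) -> bool:
    """Catalog-style search: look only inside elements matching `path`."""
    return any(word.lower() in (e.text or "").lower()
               for e in element.iterfind(path))


print(full_text_search(root, "smith"))            # True
print(field_search(root, "author", "smith"))      # True
print(field_search(root, ".//heading", "smith"))  # False
print(field_search(root, ".//heading", "images")) # True
```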

   But while XML 1.0 is now stable, related standards, including RDF, are
   still being developed. There seems to be a groundswell of industry
   opinion, however, that XML is the future of the web. Keep your eye on
   the XML pages at the W3C site.

   While the Dublin Core is useful for describing individual objects,
   there is another draft standard that is useful for describing
   collections of objects, specifically archival materials. The Encoded
   Archival Description (EAD) is the emerging standard for creating
   machine-readable archival finding aids. EAD is an example of a Document
   Type Definition (DTD), which specifies how archival finding aids should
   be tagged using the Standard Generalized Markup Language (SGML).
   Although the standards effort began at UC-Berkeley, it is now managed
   by the Library of Congress. See examples at EAD Sites on the Web.
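
   A rough picture of how EAD nests an archival collection into series
   and file-level components is sketched below. It uses a handful of real
   EAD element names (eadheader, archdesc, did, unittitle, dsc, c01, c02)
   but is heavily trimmed and would not validate against the EAD DTD; the
   collection is invented and the outline function is hypothetical. The
   function walks the numbered components to print an indented outline
   of the finding aid.

```python
# A heavily trimmed, hypothetical finding aid using a few EAD element
# names. NOT valid against the EAD DTD (required elements are omitted);
# it only shows how EAD nests collection > series > file.
import xml.etree.ElementTree as ET

FINDING_AID = """
<ead>
  <eadheader>
    <eadid>example-001</eadid>
    <filedesc><titlestmt>
      <titleproper>Guide to the Example Family Papers</titleproper>
    </titlestmt></filedesc>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle><unitdate>1880-1950</unitdate></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file"><did><container type="box">1</container>
          <unittitle>Letters, 1880-1899</unittitle></did></c02>
        <c02 level="file"><did><container type="box">2</container>
          <unittitle>Letters, 1900-1950</unittitle></did></c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def outline(component: ET.Element, depth: int = 0) -> list[str]:
    """Walk numbered components (c01, c02, ...) into an indented outline."""
    lines = []
    for child in component:
        if child.tag.startswith("c0"):  # skip <did> and other non-components
            title = child.findtext("did/unittitle", default="")
            lines.append("  " * depth + f"{child.get('level')}: {title}")
            lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(FINDING_AID)
print(root.findtext("archdesc/did/unittitle"))  # Example Family Papers
for line in outline(root.find("archdesc/dsc")):
    print(line)
```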

   These emerging standards all attempt to provide a highly structured way
   to describe various digital objects and make them easy to locate and
   use. That, after all, is what cataloging is all about.

   LINK LIST
   DSTC Resource Discovery Unit
   Dublin Core
   EAD Sites on the Web
   Encoded Archival Description (EAD)
   Metadata Information
   Nordic Metadata Project
   Resource Description Framework (RDF)
   UK Office of Library Networking (UKOLN)
   State Library of Victoria Multimedia Catalogue
   XML at the W3C