Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
The Consequences of Cataloging

01/15/2002
   As our usage statistics decline (on average), owing to the perceived
   promise of the Internet, we must think imaginatively. More and more, I
   believe that means establishing ever-wider cooperative relationships
   with other libraries.

   After all, there is no cheaper way to expand your collections than to
   make it possible (and easy) to request and receive materials from other
   institutions. The Westchester Library System, Ardsley, NY (see
   'Technology and Teamwork,' Link List) is a great example. The 38
   cooperating libraries made it easy for patrons to receive any book
   within the system by using the integrated interlibrary loan (ILL)
   software of their Dynix system and backing it up with a robust delivery
   service. In a challenge to conventional wisdom, older, long-ignored
   books began flying off the shelves.

   However, underlying this increased cooperation and its benefits are
   some niggling details that may prove to be significant stumbling
   blocks. Look no further than the icon and foundation of libraries--the
   catalog. Though it may be painstakingly constructed using respected
   standards such as MARC and AACR2, the catalog may be less standard and
   therefore less interoperable than we think.

   Too many records

   This became dramatically apparent as I prepared a talk for some ILL
   librarians. I decided to search the region's union catalog system for
   one of my books. The numerous records returned in part reflected
   multiple editions and printings. I decided to winnow down the results
   to only those records that seemed to describe the exact same book. I
   reduced the number to seven--seven independent records for the same
   book, in one medium-sized state.

   It appeared that two or three base records had been embellished or
   altered in various, mostly trivial ways. One misspelled the place of
   publication and added 'maps' to the physical description. Another
   quibbled with the copyright date (1993, but it was published at the end
   of 1992) and measured the book one centimeter smaller than the other
   records. One record said 'leaves' instead of 'p.' for the pagination
   notation.

   For subject headings, the records grouped around two main clusters. The
   differences seemed to revolve around plain mistakes of various kinds
   (misspellings mostly), added information, and disagreements. Except for
   the differences in subject headings, all the differences were
   completely and utterly inconsequential to the user.

   Since these variations were so trivial, why hadn't these records been
   merged? Because the system being searched is a 'virtual' union catalog.
   The records don't come from the same system but are merged on the fly
   after searching separate catalog systems.
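
   To make the difficulty concrete, here is a minimal sketch of merging
   on the fly. It is an illustration under stated assumptions, not the
   method used by any real union catalog: the record fields, the sample
   records, and the function names (normalize, page_count, match_key,
   merge_on_the_fly) are all invented for this example. The match key
   deliberately ignores place, date, size, and illustration notes, the
   kinds of trivial variation described above.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def page_count(extent):
    """Return the first number in an extent such as '134 p.' or '134 leaves'."""
    match = re.search(r"\d+", extent)
    return match.group() if match else ""


def match_key(record):
    """Build a key from title, author, and page count only (an assumption)."""
    return (
        normalize(record["title"]),
        normalize(record["author"]),
        page_count(record["extent"]),
    )


def merge_on_the_fly(result_sets):
    """Group records from separate catalogs that share a match key."""
    merged = defaultdict(list)
    for source, records in result_sets.items():
        for record in records:
            merged[match_key(record)].append((source, record))
    return merged


# Hypothetical result sets, as if returned by three separate catalogs.
results = {
    "catalog_a": [{"title": "An Example Book: A Guide", "author": "Doe, Jane",
                   "place": "Berkeley", "date": "1993",
                   "extent": "134 p. ; 28 cm"}],
    "catalog_b": [{"title": "An example book : a guide", "author": "Doe, Jane.",
                   "place": "Berkley", "date": "c1992",
                   "extent": "134 leaves : maps ; 27 cm"}],
    "catalog_c": [{"title": "Another Example Book", "author": "Roe, Richard",
                   "place": "Boston", "date": "1995",
                   "extent": "210 p. ; 24 cm"}],
}

merged = merge_on_the_fly(results)
for key, hits in merged.items():
    print(key, [source for source, _ in hits])
```

   A key this crude would also merge two genuinely different editions
   that happen to share a page count, which is part of why merging large
   result sets on the fly is risky.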

   Karen Coyle, in 'The Virtual Union Catalog,' cautions that, with
   systems that retrieve large sets of results, merging records on the fly
   will be extremely difficult. With 'real' union catalogs, where records
   are contributed to one central database, there is more opportunity to
   merge duplicate records successfully, as well as to iron out trivial
   differences over time.

   There is no question that merging such records is vital to effective
   user services in a cooperative environment. It's not clear how we
   should handle records that vary, however slightly. For example, do most
   users care whether they get a hardback or a paperback? Some may, some
   may not. We must make it easy for them to select the correct title, and
   then the appropriate copy, without inundating them.

   How to merge records

   We consider it more important to know that we have a specific item in
   our collections than to know that several printings of a work hold the
   same content. Jeremy Hylton, in 'Identifying and Merging Related
   Bibliographic Records,' advocates Michael Buckland's idea of an
   'information dossier' approach to merging related records. This goes
   beyond the standard library practice of duplicate record detection and
   merging (see 'Record Matching: An Expert Algorithm') to merge
   records that describe different physical items but are algorithmically
   perceived to be the same intellectual object.
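
   The difference between item-level and work-level merging can be
   sketched roughly as follows. This is not Hylton's algorithm; it is a
   much simpler stand-in that keys on author and title alone, and the
   field names, sample records, and function names (work_key,
   build_dossiers, brief_display, full_display) are assumptions made up
   for illustration.

```python
import re
from collections import defaultdict


def squash(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def work_key(record):
    """Key on author and title only, so editions and bindings share a key."""
    return (squash(record["author"]), squash(record["title"]))


def build_dossiers(records):
    """Cluster item-level records under one entry per intellectual work."""
    dossiers = defaultdict(list)
    for record in records:
        dossiers[work_key(record)].append(record)
    return dossiers


def brief_display(dossiers):
    """One line per work; item-level detail stays hidden."""
    return [
        f"{items[0]['title']} / {items[0]['author']} ({len(items)} versions)"
        for items in dossiers.values()
    ]


def full_display(dossiers, key):
    """Item-level detail, shown only after the user picks a work."""
    return [
        f"{r['date']} {r['binding']} held by {r['library']}"
        for r in dossiers[key]
    ]


# Hypothetical records describing different physical items.
records = [
    {"title": "An Example Book", "author": "Doe, Jane", "date": "1992",
     "binding": "hardback", "library": "Library A"},
    {"title": "An Example Book", "author": "Doe, Jane", "date": "1993",
     "binding": "paperback", "library": "Library B"},
    {"title": "An example book.", "author": "Doe, Jane", "date": "1993",
     "binding": "paperback", "library": "Library C"},
    {"title": "Another Example Book", "author": "Roe, Richard", "date": "1995",
     "binding": "hardback", "library": "Library A"},
]

dossiers = build_dossiers(records)
brief = brief_display(dossiers)
detail = full_display(dossiers, work_key(records[0]))
print("\n".join(brief))
print("\n".join(detail))
```

   The brief display shows one line per work, and the hardback and
   paperback copies appear only once a work has been selected.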

   Although Hylton's algorithm may be inadequate when faced with some of
   the ambiguous records that can be found in large library catalog
   systems, it nonetheless highlights an important issue: Do users
   initially want to see that there are many different physical copies of
   a book, or do they want that hidden until they select a
   specific book to retrieve? How should our catalog systems mask
   information in displays where it isn't important but still make it
   displayable when users want to see it?

   As we move toward providing access to ever-larger pools of library
   content, these are questions we will need to answer, and answer well.
   It is clear that our cataloging practices can have unintended--and
   detrimental--consequences.
     __________________________________________________________________

LINK LIST

   Identifying and Merging Related Bibliographic Records
   ltt-www.lcs.mit.edu/ltt-www/People/jeremy/thesis/main.html

   Record Matching: An Expert Algorithm
   ASIS Proceedings, 22 (1985), 77-80

   'Technology and Teamwork'
   Library Journal 9/1/00, p. 160-163

   The Virtual Union Catalog
   www.dlib.org/dlib/march00/coyle/03coyle.html