Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Interoperability: The Holy Grail

07/01/1998
   As fast as digital libraries are being built, they still remain islands
   of order in a sea of chaos. Locating them by using web search engines
   or subject directories is just the first step in a long process. Users
   must then go to each one, searching or browsing it before moving on to
   the next. This laborious method for locating digital library objects
   (from full-length books to individual photographs) is obviously
   anachronistic.

   What digital librarians envision instead is an infrastructure that
   supports simultaneous searching of multiple and geographically distant
   collections. Anyone who discovers any individual digital library should
   be able to search easily (and perhaps transparently) across a wide
   variety of collections from other libraries worldwide. That, at least,
   is the vision.

   So what will make this vision a reality? There are different models for
   achieving this level of interoperability between libraries in support
   of resource discovery and retrieval. In the end, it probably isn't so
   important what model we use as long as we get the result we seek.

   Whereas I recently focused on metadata standards and draft standards
   for cataloging digital objects ("21st-Century Cataloging"), here I look
   at how to provide cross-collection searching of these records. Because
   digital libraries are still largely in the experimentation and research
   stages, diversity is more prevalent than standardization.

   The union catalog model
   One way to achieve seamless access to a variety of physically distant
   collections is to contribute bibliographic records or access aids to a
   central database. Librarians are, of course, experienced at this,
   having built OCLC, the largest union catalog of bibliographic records
   in the world. For digital library objects, however, we do not have an
   equivalent union catalog.

   Probably the best example of this model is presented by the Library of
   Congress (LC). As part of its National Digital Library Competition
   (jointly sponsored with Ameritech), LC has proposed serving as a
   central repository for "coherent access aids" (e.g., MARC records,
   Dublin Core records, or archival finding aids encoded in SGML), while
   the actual digital objects themselves would remain at their individual
   host institutions. Caroline Arms's 1997 paper "Access Aids and
   Interoperability" describes this model.

   Another example is the University of New Brunswick Library's
   metadata project. Records for digital objects hosted at several
   institutions were automatically "crawled" or gathered by a software
   program on a regular basis. They were then processed into a common
   format (in SGML) and indexed using Open Text software. This project
   proved the viability of the concept of a union catalog built by
   gathering records from distributed collections on a regular basis and
   indexing them centrally. Once the appropriate routines or programs are
   in place, records can be regularly produced without human intervention.
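
   The gather, convert, and index routine described above can be sketched
   roughly as follows. This is a minimal illustration, not the New
   Brunswick project's software: the site names, field names, sample
   records, and every function here are invented, and a plain Python
   dictionary stands in for the SGML common format and the Open Text
   index.

```python
from collections import defaultdict

# Hypothetical records as they might arrive from two host institutions,
# each using its own local field names (all sample data is invented).
SITE_A = [{"title": "Harbor at dawn", "creator": "Smith, J.",
           "url": "http://a.example/1"}]
SITE_B = [{"ti": "Harbor charts, 1850-1900", "au": "Jones, M.",
           "loc": "http://b.example/7"}]

# Per-site mapping from local field names to the common format.
FIELD_MAPS = {
    "site_a": {"title": "title", "creator": "creator", "url": "location"},
    "site_b": {"ti": "title", "au": "creator", "loc": "location"},
}


def normalize(site, record):
    """Convert one local record into the common format."""
    mapping = FIELD_MAPS[site]
    common = {mapping[k]: v for k, v in record.items() if k in mapping}
    common["source"] = site  # remember which institution hosts the object
    return common


def harvest(sources):
    """Gather every record from every site into one normalized list."""
    return [normalize(site, rec)
            for site, recs in sources.items()
            for rec in recs]


def build_index(records):
    """Map each lowercased title word to the positions of matching records."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        for word in rec["title"].lower().replace(",", " ").split():
            index[word].add(pos)
    return index


def search(index, records, word):
    """Look up one word in the central index."""
    return [records[i] for i in sorted(index.get(word.lower(), ()))]


catalog = harvest({"site_a": SITE_A, "site_b": SITE_B})
index = build_index(catalog)
hits = search(index, catalog, "harbor")
for hit in hits:
    print(hit["source"], hit["title"], hit["location"])
```

   Each hit still points to its host institution through the location
   field, so only the descriptions are centralized while the objects stay
   where they are.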

   The distributed searching model
   A quite different way to approach interoperability is to establish
   standards to which all digital libraries would adhere and then provide
   an interface to search all the collections simultaneously. This exists
   to some degree now in the Networked Computer Science Technical Reports
   Library (NCSTRL).

   NCSTRL provides one-stop shopping for CS tech reports from hundreds of
   institutions around the world by requiring that each site install the
   same software package (Dienst) and create bibliographic records using
   the same format (RFC 1807). A search entered at any NCSTRL site is
   sent simultaneously to all other sites. Each of those sites searches
   its local index and returns its results, which the initiating site
   collates and displays to the user. When a particular record or report
   is requested, the remote server that has the report responds to the
   request.
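
   The send, search, and collate cycle can be sketched as below. This is
   an illustration only and does not use Dienst or its protocol: the
   sites, their records, the artificial delays, and the timeout value are
   all assumptions, and local function calls in threads stand in for
   network requests. The timeout shows how one slow site can be left out
   of a result set instead of stalling the whole search.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical sites; "delay" simulates server and network response time.
SITES = {
    "site_a": {"delay": 0.0,
               "records": ["Parallel file systems", "Cache design"]},
    "site_b": {"delay": 0.0,
               "records": ["Parallel sorting on meshes"]},
    "site_c": {"delay": 1.0,  # a slow site
               "records": ["Parallel logic simulation"]},
}


def search_site(name, query):
    """Search one site's local index (simulated) and return its matches."""
    site = SITES[name]
    time.sleep(site["delay"])
    return [(name, title) for title in site["records"]
            if query.lower() in title.lower()]


def federated_search(query, timeout=0.3):
    """Send the query to every site at once and collate what comes back.

    Returns the merged results plus the names of sites that had not
    answered when the timeout expired.
    """
    pool = ThreadPoolExecutor(max_workers=len(SITES))
    futures = {pool.submit(search_site, name, query): name for name in SITES}
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on the slow sites

    results = []
    for future in done:
        results.extend(future.result())
    missing = sorted(futures[future] for future in not_done)
    return sorted(results), missing


results, missing = federated_search("parallel")
print(results)
print("no answer from:", missing)
```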

   That, at least, is the model. However, due to poor response times, the
   bibliographic records are gathered from NCSTRL sites and indexed
   centrally at two or three index servers. In a sense, this model has
   retreated to that of a union catalog.

   The "intelligent agent" model
   Yet another method may be to create an "intelligent agent" (a
   special kind of software program) that can roam the network searching
   digital libraries for objects of interest. The agent would report back
   periodically with any results.

   Requirements for success include, at minimum, that the agent know where
   to find digital libraries, have the capacity to query these libraries
   appropriately, and possess methods to process search results into a
   common format for merging and browsing. One benefit: as long as the
   agent knows how to perform queries, the underlying architecture of each
   digital library can be different.
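
   These three requirements can be sketched in a few lines. The example
   below is hypothetical throughout (as noted, no working agent is known
   to exist): the two libraries, their query interfaces, the sample
   records, and the Result format are invented to show how a registry of
   libraries and one small adapter per library let the agent merge
   results from systems that have nothing in common underneath.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """The common format into which every library's answers are converted."""
    title: str
    library: str
    location: str


# Two hypothetical libraries with unrelated query interfaces.
def query_photo_archive(keyword):
    """Takes a bare keyword and returns (caption, identifier) tuples."""
    rows = [("Bridge under construction", "ph-0041"),
            ("Street market", "ph-0102")]
    return [row for row in rows if keyword.lower() in row[0].lower()]


def query_text_library(params):
    """Takes a parameter dictionary and returns a list of dictionaries."""
    docs = [{"name": "Bridge engineering handbook", "href": "/books/77"}]
    return [doc for doc in docs if params["q"].lower() in doc["name"].lower()]


# One adapter per library: it knows how to phrase the query and how to
# translate the answer into the common format.
def photo_adapter(keyword):
    return [Result(caption, "photo_archive", ident)
            for caption, ident in query_photo_archive(keyword)]


def text_adapter(keyword):
    return [Result(doc["name"], "text_library", doc["href"])
            for doc in query_text_library({"q": keyword})]


# The registry is how the agent "knows where to find digital libraries".
REGISTRY = {"photo_archive": photo_adapter, "text_library": text_adapter}


def run_agent(keyword, registry=REGISTRY):
    """Query every registered library and merge the normalized results."""
    merged = []
    for adapter in registry.values():
        merged.extend(adapter(keyword))
    return sorted(merged, key=lambda result: result.title)


found = run_agent("bridge")
for result in found:
    print(result.library, result.title, result.location)
```

   Supporting another library means writing one more adapter and adding
   it to the registry; the library itself does not have to change.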

   Intelligent agents are unlikely to do well with an uncategorized,
   all-inclusive database like the web. But digital library catalogs are,
   if anything, the exact opposite. They are organized collections of
   selected objects of a similar nature. They usually support highly
   specific queries and will frequently return useful results. These
   factors make intelligent agents a real possibility for providing an
   appearance of interoperability when none may exist by design. However
   intriguing the idea, I don't know of a working example of such an agent.

   Whither interoperability?
   Of these models, only the union catalog model is fully functional with
   present technology. Although the distributed searching model is
   interesting, slow server and network response times make it
   impractical at present. The lack of prototype systems makes it
   difficult to assess the intelligent agent model.

   Differing levels of bibliographic description create a barrier to
   interoperability with all of these models. Some items are described
   only at the collection level (in the case of archival finding aids),
   while others are described at the item level (MARC and Dublin Core
   records). Thus, a user may be required to search different systems or
   else navigate results that mix individual items with collection
   descriptions. This watershed divide in how digital objects are
   described probably presents the biggest barrier to seamless
   interoperability.
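
   One modest way to soften the problem, sketched here under assumptions,
   is to label every record with its level of description and present the
   two levels separately. The "level" field and the sample records are
   invented for illustration; none of the record formats mentioned above
   is being quoted.

```python
from itertools import groupby

# A hypothetical merged result set that mixes levels of description.
results = [
    {"title": "Letter, 12 May 1862", "level": "item"},
    {"title": "Harris family papers", "level": "collection"},
    {"title": "Portrait of a surveyor", "level": "item"},
]


def group_by_level(records):
    """Split results into collection-level and item-level titles."""
    ordered = sorted(records, key=lambda rec: rec["level"])
    return {level: [rec["title"] for rec in group]
            for level, group in groupby(ordered, key=lambda rec: rec["level"])}


grouped = group_by_level(results)
for level, titles in grouped.items():
    print(level, titles)
```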

   We seem at least to be on the right path. Most digital library projects
   describe their objects using some type of standard or developing
   standard, thus making it possible to migrate their records to whichever
   becomes the clear winner. A number of cooperative projects are
   underway, in which libraries work together to provide easy access to
   their combined collections. And organizations like the Digital Library
   Federation and LC work toward the goal of interoperability. So,
   although we are still in the early stages of achieving the kind of
   vision that many digital librarians have of easy access to digital
   collections around the world, we are close enough to have gained some
   experience along the way.


   LINK LIST
   "Access Aids and Interoperability"
      http://memory.loc.gov/ammem/award/docs/interop.html
   Digital Library Federation
      http://lcweb.loc.gov/loc/ndlf/
   Dublin Core
      http://purl.org/metadata/dublin_core
   Encoded Archival Description (EAD)
      http://lcweb.loc.gov/ead/
   Intelligent Agents
      http://www.cs.umbc.edu/agents/
   Library of Congress
      http://www.loc.gov/
   National Digital Library Competition
      http://memory.loc.gov/ammem/award/
   NCSTRL
      http://www.ncstrl.org/
   Request for Comments (RFC) 1807
      ftp://ftp.isi.edu/in-notes/rfc1807.txt
   University of New Brunswick Library's Metadata Project
      http://www.lib.unb.ca/metadata/