Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Different Paths to Interoperability
02/15/2001
In a previous column, I discussed the importance of interoperability among digital library projects ('Interoperability: The Holy Grail,' LJ 7/98). Users should be able to discover through one search what digital objects are freely available from a variety of collections, rather than having to search each collection individually. In a more recent column, I highlighted a project that is achieving interoperability among preprint (or, as they are now commonly referred to, e-print) servers ('Open Archives: A Key Convergence,' LJ 2/15/00). For digitized library materials, there are at least two good examples of projects that are achieving the same goal through similar but intriguingly different means.
The LC model
For three years (1996-99) the Library of Congress (LC) and Ameritech teamed up to offer digitization grants (of up to $75,000 each) to libraries in the United States. LC required successful grantees to provide suitable access aids for the items digitized with award money. These access aids could be in one or more formats: 1) U.S. MARC records, 2) Dublin Core records (following LC guidelines for usage), 3) structured headers (encoded in Text Encoding Initiative format) for searchable text reproductions, and/or 4) Encoded Archival Description finding aids.
Awardees were required to supply LC with the records for the items digitized. These records were added to the LC American Memory collection, thereby providing one place to search the digital collections of LC as well as those of all libraries receiving LC/Ameritech awards. The digitized items themselves remain at the individual institutions, as do copies of the item records.
This highly centralized model for creating a union catalog was possible because LC and Ameritech controlled the funding and could establish record creation guidelines before digitization occurred, therefore providing for a high level of interoperability with records among different institutions.
This model also requires a high level of commitment and precoordination among participating institutions and a willingness from all participants to follow set guidelines. These collections have been incorporated so seamlessly into the existing American Memory collections that users can easily be unaware that they are searching non-LC collections.
Taking this work a step further, LC is developing a 'core set of metadata elements to be used in the development, testing, and implementation of multiple repositories.' This work should be particularly helpful for digital library projects that are looking to contribute records to a union catalog -- either now or in the future.
The Picture Australia model
In contrast to the LC model, the Picture Australia project came about after a good deal of library content -- nearly 500,000 items -- had been digitized and cataloged. Picture Australia aims to bring together access to digitized images relating to Australia from several institutions (currently seven, including libraries, the National Archives, and the Australian War Memorial).
The particular challenges dictated a more flexible solution than that chosen by LC. Since records had already been created for digitized materials, Picture Australia needed a method to collect the records, massage them into a common record format, index them, and make them available for web searching.
Rather than requiring participating institutions to ship data periodically to a central location (the National Library of Australia serves as the lead institution), project developers decided to collect the records monthly by using a software spider. This allows institutions simply to put their records in a specific location on their servers, to be collected automatically. The collected records must then be translated into a common record format (fields are based on the Dublin Core and the storage format is XML) and indexed (using Blue Angel's Metastar Enterprise).
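
To make the translation step concrete, here is a minimal sketch of how contributed records with different local field names might be mapped onto Dublin Core elements and stored as XML. It is an illustration only, not Picture Australia's actual software: the institution names, the local field names, the FIELD_MAPS table, and the to_common_record and titles helpers are all hypothetical. Only the Dublin Core namespace and Python's standard xml.etree.ElementTree module are real.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical per-institution maps: local field name -> Dublin Core element.
FIELD_MAPS = {
    "library_a": {"caption": "title", "photographer": "creator", "taken": "date"},
    "archive_b": {"item_title": "title", "maker": "creator", "year": "date"},
}


def to_common_record(source, record):
    """Translate one contributed record (a dict) into a common XML record."""
    mapping = FIELD_MAPS[source]
    root = ET.Element("record", {"source": source})
    for local_name, value in record.items():
        dc_name = mapping.get(local_name)
        if dc_name is None or not value:
            continue  # unmapped or empty fields are simply dropped in this sketch
        element = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
        element.text = str(value)
    return root


def titles(common_records):
    """Collect every dc:title so records from all sources can be indexed together."""
    tag = f"{{{DC_NS}}}title"
    return [el.text for rec in common_records for el in rec.iter(tag)]


# Hypothetical harvested records, as a spider might have collected them.
harvested = [
    ("library_a", {"caption": "Sydney Harbour Bridge", "photographer": "Unknown", "shelf": "B12"}),
    ("archive_b", {"item_title": "Harbour view", "year": 1932}),
]
common = [to_common_record(src, rec) for src, rec in harvested]
for rec in common:
    print(ET.tostring(rec, encoding="unicode"))
```

The point of the per-institution mapping table is that contributors keep their own field names; all of the reconciliation happens centrally, after collection. Note that the sketch silently discards fields it cannot map (the "shelf" value above), which is one way information gets lost in translation.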
Most of the issues remaining for Picture Australia relate to this translation of heterogeneous metadata into a common set of elements. One problem is the loss of context. As Debbie Campbell, the Picture Australia project manager, puts it, 'A collection of images may have a collective title such as "Images of Paul Revere." But the image title may be reduced to "On a horse." So the loss of context becomes a discovery issue.'
Mounting challenges
There is also the problem of differing subject vocabularies, particularly between libraries and museums. The use of geographic names without qualification (such as the name of the state in which a place is found) can be problematic as well for those not familiar with Australian geography.
The cataloging problems can go deeper, depending on how the participating institutions have cataloged their materials. A key issue is granularity. Whereas one institution may keep track of first and last names, for example, another may not. Differing formats can be another issue. One library may keep track of dates as MM/DD/YY, while another spells out the month and year. These are issues that must be rectified when translating contributed records into a common format. For examples that illustrate some of these record variations, see the Picture Australia Metadata Guidelines.
Despite these challenges, Picture Australia is clearly successful in its effort to bring together access to a wide range of pictorial material in one easy-to-use location. This success rests on several factors. According to project manager Campbell, one factor was a forgiving timeframe. Although each project task was estimated and delivered according to a schedule, there was no overall deadline for release. This allowed some flexibility in reacting to unforeseen problems. Another factor was the low threshold for participation. Institutions contributing records were required to do very little to make their records available to Picture Australia.
'Picture Australia is quickly able to repurpose the investment already made in digitization and description,' Campbell said.
The Picture Australia model has another advantage. It has its own brand identity, independent of any single institution. This encourages contributors to participate more equally than is possible when assigning records to a single institution, as with the LC model.
Pick a model, any model
Union catalogs are a good thing. They make accessible from one location what was formerly only accessible by visiting multiple locations and often by learning different search interfaces. Our users need more union catalogs.
There is no 'best' model. You use what is appropriate. If you are beginning a project that provides you with the opportunity to lay out guidelines ahead of time, by all means do so -- it will save time and trouble later. But many great chances for creating union catalogs will come after records have been created. The best thing about Picture Australia is that the project has proved not only that union catalogs can be created after the fact, but that they can be done well.
LINK LIST
American Memory http://memory.loc.gov
Blue Angel Technologies http://www.blueangeltech.com
Dublin Core http://purl.org/dc
LC/Ameritech Collections Online http://memory.loc.gov/ammem/award/online.html
LC Core Metadata Elements http://lcweb.loc.gov/standards/metadata.html
Picture Australia http://www.pictureaustralia.org
Picture Australia Metadata Guidelines http://www.pictureaustralia.org/metadata.html