roytennant.com :: Digital Libraries Columns

 

Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.

The Open Content Alliance


12/15/2005

   About a year ago, Google announced a project to digitize large numbers
   of books from five research libraries. Dubbed "the Google Five," the
   University of Michigan, Harvard, Stanford, Oxford, and the New York
   Public Library signed an agreement with Google to provide portions (or,
   in the case of Michigan, all) of their collections to Google to be
   digitized. A year later we still don't know much more about their
   procedures, but now Google is being sued for digitizing material under
   copyright while out-of-copyright books are beginning to appear on the
   Google Print web site.

   By contrast, a similar initiative was recently announced about which we
   already know much more. Maybe that's why it's called the Open Content
   Alliance (OCA), put forward by the Internet Archive, Yahoo!, and a
   number of large libraries, including my employer, the California
   Digital Library. Microsoft shortly thereafter announced support as
   well, and additional libraries likely will join. Yahoo!, Microsoft, and
   the libraries themselves are paying the Internet Archive to digitize
   materials at 10 cents a page--an excellent price for nondestructive
   scanning. The resulting files will be made available at the Internet
   Archive web site and likely at other locations.

   Open and accessible

   Since the OCA is focusing on out-of-copyright material, it is dodging
   the legal fight that Google is taking head-on. This means that all OCA
   content will be viewable in its entirety online. But the project goes
   further. The digitized files and their associated metadata will be
   available for complete downloading, thereby allowing anyone to create
   their own presentations of this material. Some books are already
   available for downloading and printing.

   The importance of this becomes clearer by visiting the Open Library
   site, where the Internet Archive has mounted a few dozen of the books
   already digitized. The method closely resembles paging through a
   physical book. Although this presentation may seem compelling, some
   potential drawbacks soon become apparent. It's difficult to jump to a
   particular chapter, for example, and other features such as searching
   and the all-important ability to magnify the page don't work yet.

   Still, if you do not like this presentation, you can create your own.
   Clicking on "Details" while viewing an Open Library book pulls up a
   small window giving some core metadata about the title and a link to
   the Internet Archive site that allows anyone to download a PDF or DjVu
   format of the book, or even the entire package of digital files from
   which these presentations were created. These books, in other words,
   are as open and accessible as possible.

   Beyond the books themselves, the process is also open. Only days
   after the initiative was announced, the University of California
   partnership agreement with the Internet Archive was made available to
   the library press. By contrast, months after the Google Print
   initiative was made known, the University of Michigan, after some
   pressure, released its agreement with Google. No other library of the
   Google Five has so far released its agreement.

   Principles and collaboration

   The OCA effort, unlike that of Google, is based on respect for
   collections and the principles behind mass digitization of library
   materials. Research libraries, writes Dan Greenstein of the California
   Digital Library in a draft principles document, must "clearly and
   unambiguously begin articulating what public goods are served by
   massive digitization of their holdings," plus "articulate and agree to
   adhere to a set of principles" to ensure that the resulting products
   "support and promote these public goods."

   It's unclear whether the OCA project will rival the Google Library
   project in size. Since it is easier for organizations to participate,
   the OCA will easily have more participants, but the Google project may
   lead in the number of digitized volumes if it fulfills its promise.
   Only time will tell. In any case, more digitized content is likely a
   better thing overall.

   The agreement between the University of California and the Internet
   Archive emphasizes that the initiative is collaborative, as both
   parties must agree to a protocol that will set up procedures for, among
   other things, moving the books to and from the Internet Archive
   digitization shop, identifying and attaching appropriate metadata to
   the scanned files, and assessing the scanned files against appropriate
   standards.

   Collaborations among participating libraries are also likely, if for no
   other reason than to minimize duplication. There are other
   opportunities for collaboration and not just among OCA libraries but
   with the "Google Five" and many other institutions involved with
   digitizing content. Open digitized content, after all, is a growing
   boon to all of our libraries and the users we serve.

   For more on the wired library, see the netConnect supplement
   mailed with the January, April 15, July, and October 15 issues of LJ.
     __________________________________________________________________

                                    Link List
   California Digital Library
   www.cdlib.org
   Google Print
   print.google.com
   Internet Archive
   www.archive.org
   Open Content Alliance
   www.opencontentalliance.org
   Open Library
   www.openlibrary.org
   The Open Library Vision
   www.openlibrary.org/details/openlibrary