Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind that some of this content will be out of date.
Google Out of Print
02/15/2005
Since Google announced its initiative to digitize all, or major portions, of the book collections in select research libraries, I've struggled to figure out what to think of it (see "[145]Google Is Adding Major Libraries to Its Database"). This is so difficult, in part, because we have so little information. The news articles were long on hype and short on specifics. Library commentators (see "[146]Google Plan Mostly Hailed by Librarians") could respond with precious few facts at hand. Even now, a month after the initial announcement, the very libraries involved appear to be mostly in the dark.

Industrial strength?

A recent blog entry by Elizabeth Edwards, a Stanford University Libraries staff member, is particularly enlightening (see "[147]The Google Deal"). According to Edwards, who was briefed on the Stanford-Google plan along with other staffers at a January meeting, "the company has not yet been forthcoming as to how the process of digitization will be implemented in detail; however, Google's process is characterized as 'industrial-strength digitization.'" Characterized by whom, and on what evidence, goes unstated. Edwards further states that "Google is being 'coy' about standards and specs; minimums have been given but little to no fixed specs." It is difficult to judge the potential effectiveness of a project that provides no details. Information posted by the Stanford University Library says even less (see "[148]Stanford University and the Google Mass Digitization Project").

The public's domain

For argument's sake, let's assume that Google knows what it's doing regarding digitization (a potentially disastrous assumption, I admit). Even projecting success, we soon see that far from solving all our problems, our problems have only just begun. Barring evidence to the contrary, we must assume that Google does not wish to get sued out of existence for violating U.S. copyright law. Therefore, it will be able to display only tiny snippets of books under copyright. According to Edwards, "Google will be responsible for determining what's in copyright and what's not if there are any questionable materials, and copyright will drive what will be fully displayed." The problem is that determining what is in the public domain can be difficult. Anything published in 1923 or after could still be protected under copyright (see "[149]Copyright Term and the Public Domain in the U.S."), but discovering whether any given work falls into this category is likely to be time-consuming and expensive (see "[150]How To Investigate the Copyright Status of a Work"). Without that expensive research, the only works that can be displayed in full are those published before 1923.

A distorted landscape

The only thing we know for sure is that the public will have access to all digitized pre-1923 imprints and an unknown number of post-1923 books. Unfortunately, I can think of few situations where having access to only pre-1923 literature is a good thing. In fact, for many situations it would be disastrous. As has been noted here (see [151]LJ 12/01, p. 39) and elsewhere, people use what is convenient. The typical user who finds a pre-1923 source available for free via Google is unlikely to sashay down to the local library for something more recent. That's just life. Google hype to the contrary, blind, wholesale digitization is no more a good thing than buying books based on color. Large research libraries that never weed their collections as a matter of policy end up with lots of outdated, useless material.
Join this with blind, wholesale digitization, and it's clear we will soon find ourselves in a world where incorrect, dated information trumps current, accurate information simply through circumstance.

Plan, then scan

What is to be done? The libraries participating in this project (Stanford, New York Public, Harvard, Oxford, and Michigan) should choose items judiciously. The potential impact of surfacing large portions of pre-1923 materials in their entirety, while leaving newer materials behind the copyright wall, should be carefully considered. What does Google want out of all this? Will it be satisfied with context-sensitive ad placements next to displayed books, with ads for antidepression medication shown next to Hamlet's soliloquy? I wonder, too, how Google plans to compensate its many shareholders impatiently waiting for a killing on their investment. As with many things about this project, Google isn't saying.

Link List

Copyright Term and the Public Domain in the U.S.
[152]www.copyright.cornell.edu/training/Hirtle_Public_Domain.htm

The Google Deal (Down on the Farm)
[153]edwards.orcas.net/~misseli/blog/archives/cat_digital_issues.html

Google Is Adding Major Libraries to Its Database
[154]www.nytimes.com/2004/12/14/technology/14cnd-goog.html

Google Plan Mostly Hailed by Librarians
[155]Library Journal, 12/20/2004

Google to Digitize 15 Million Books
[156]Library Journal, 1/15/2005

How To Investigate the Copyright Status of a Work
[157]www.copyright.gov/circs/circ22.pdf

Stanford University and the Google Mass Digitization Project
[158]www-sul.stanford.edu/about_sulair/special_projects/google_sulair_project.html