Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind that some of this content will be out of date.
Google Out of Print
02/15/2005
Since Google announced its initiative to digitize all, or major portions, of the book collections in select research libraries, I've struggled to figure out what to think of it (see "[145]Google Is Adding Major Libraries to Its Database"). This is so difficult, in part, because we have so little information. The news articles were long on hype and short on specifics. Library commentators (see "[146]Google Plan Mostly Hailed by Librarians") could respond with precious few facts at hand. Even now, a month after the initial announcement, the very libraries involved appear to be mostly in the dark.

Industrial strength?

A recent blog entry by Elizabeth Edwards, a Stanford University Libraries staff member, is particularly enlightening (see "[147]The Google Deal"). According to Edwards, who was briefed on the Stanford-Google plan along with other staffers at a January meeting, "the company has not yet been forthcoming as to how the process of digitization will be implemented in detail; however, Google's process is characterized as 'industrial-strength digitization.'" Characterized by whom, and on what evidence, goes unstated. Edwards further states that "Google is being 'coy' about standards and specs; minimums have been given but little to no fixed specs." It is difficult to judge the potential effectiveness of a project that provides no details. Information posted by the Stanford University Library says even less (see "[148]Stanford University and the Google Mass Digitization Project").

The public's domain

For argument's sake, let's assume that Google knows what it's doing regarding digitization (a potentially disastrous assumption, I admit). Even projecting success, we soon see that far from solving all our problems, our problems have only just begun. Barring evidence to the contrary, we must assume that Google does not wish to get sued out of existence for violating U.S. copyright law. Therefore, it will be able to display only tiny snippets of books under copyright. According to Edwards, "Google will be responsible for determining what's in copyright and what's not if there are any questionable materials, and copyright will drive what will be fully displayed." The problem is that determining what is in the public domain can be difficult. Anything published in 1923 or after could still be protected under copyright (see "[149]Copyright Term and the Public Domain in the U.S."), but discovering whether any given work falls into this category is likely to be time-consuming and expensive (see "[150]How To Investigate the Copyright Status of a Work"). Without that expensive research, the only works that can be displayed in full are those published before 1923.

A distorted landscape

The only thing we know for sure is that the public will have access to all digitized pre-1923 imprints and an unknown number of post-1923 books. Unfortunately, I can think of few situations where having access to only pre-1923 literature is a good thing. In fact, for many situations it would be disastrous. As has been noted here (see [151]LJ 12/01, p. 39) and elsewhere, people use what is convenient. The typical user who finds a pre-1923 source available for free via Google is unlikely to sashay down to the local library for something more recent. That's just life. Google hype to the contrary, blind, wholesale digitization is no more a good thing than buying books based on color. Large research libraries that never weed their collections as a matter of policy end up with lots of outdated, useless material.
Join this with blind, wholesale digitization, and it's clear we will soon find ourselves in a world where incorrect, dated information trumps current, accurate information simply through circumstance.

Plan, then scan

What is to be done? The libraries participating in this project (Stanford, New York Public, Harvard, Oxford, and Michigan) should choose items judiciously. The potential impact of surfacing large portions of pre-1923 materials in their entirety, while leaving newer materials behind the copyright wall, should be carefully considered. What does Google want out of all this? Will it be satisfied with context-sensitive ad placements next to displayed books, with ads for antidepression medication shown next to Hamlet's soliloquy? I wonder, too, how Google plans to compensate its many shareholders impatiently waiting for a killing on their investment. As with many things about this project, Google isn't saying.

Link List

Copyright Term and the Public Domain in the U.S.
[152]www.copyright.cornell.edu/training/Hirtle_Public_Domain.htm

The Google Deal (Down on the Farm)
[153]edwards.orcas.net/~misseli/blog/archives/cat_digital_issues.html

Google Is Adding Major Libraries to Its Database
[154]www.nytimes.com/2004/12/14/technology/14cnd-goog.html

Google Plan Mostly Hailed by Librarians
[155]Library Journal, 12/20/2004

Google to Digitize 15 Million Books
[156]Library Journal, 1/15/2005

How To Investigate the Copyright Status of a Work
[157]www.copyright.gov/circs/circ22.pdf

Stanford University and the Google Mass Digitization Project
[158]www-sul.stanford.edu/about_sulair/special_projects/google_sulair_project.html