Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Google, the Naked Emperor
08/15/2005
Google rules. Wherever you turn you hear about a new Google initiative. Clearly, Google has the money to do some interesting things. But with all the hype and hullabaloo, it can be all too easy to overlook some serious flaws in Google's services. As librarians, we should not be giving Google a "pass" that we would not afford other vendors. By being clear about Google's strengths and weaknesses, we can make effective decisions about when and how to use Google's services and advise our users appropriately.
Google Search
Google's flagship service is, of course, its web index. Google became nearly everyone's favorite search engine by crawling more of the web than anyone else and making it searchable through a dead-easy interface that responded with amazing alacrity. But it should be acknowledged that it is really only good at some very specific things and is completely ineffective for other purposes.
For example, sometimes I want to find brand-new web pages. But based on the PageRank algorithm (see Link List), these pages would naturally fall to the bottom of the search results. Does Google provide any method to reverse-sort the results, to view results based on date added, or to sort results based on the last change date of the page itself? No. So what are we left with? Trying to get to the "end" of the search results, wherever that may be.
The problem is that you can't even get to the end. As a Google spokesperson put it, "Google provides only the 1000 most relevant search results for a query, even when there are more than 1000 matches. (Due to variations in our estimates of results, we may occasionally display slightly fewer than 1000)." There is no option to go beyond that wall.
Google Scholar
The Scholar search service was announced at the end of last year to wide acclaim. What it attempts to do is to crawl (using the standard Google infrastructure) and index content from academic and scholarly publishers.
Although Google has agreements with many publishers, it has no agreement with some significant ones, including Elsevier. Scholar's crawl of content, however, can lag months behind its appearance on the original site. When users receive results, if the content is free, they can click through to it, but if it is not, they are taken to the publisher's web site, where they can often purchase access. Also, Google should be congratulated for working closely with libraries to enable OpenURL linking so that our clientele can click through to content under our licenses when they can be identified as valid users.
Scholar ranks the results based at least partly on the number of times an article was cited by another source. Given the lack of options on changing this display, for some disciplines this can be disastrous. For example, most scientific researchers are more interested in timely access to the latest content, and Scholar fails on both counts.
If you are in the humanities, Scholar doesn't fare much better. In a search on "hamlet," the results are swamped with scientific papers written by various persons named "Hamlet." Limiting the search word to the title of articles is better, but not much. What you get is a jumbled mess of scientific articles (e.g., HAMLET as an acronym for a substance or procedure), books, journal articles, and cryptic "citations" parsed from full-text articles.
Search results that are marked as "[CITATION]" have been extracted from the full text of crawled sources and therefore are often very incomplete. Many individual results are, in fact, almost indecipherable. To find out more, the user must either click the supplied link to do a "Web Search," which usually fails to find the article online, or click on the "Cited by" link to go to the source that cited it to find enough information to locate the article.
Scholar is, of course, in its early days, and it is quite possible that these problems will be addressed.
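To make the ranking complaint above concrete, here is a minimal sketch in Python of why an ordering driven by citation counts pushes the newest work to the bottom, and what a date sort would do instead. It is an illustration only, not Scholar's actual method: the `Article` record, the sample entries, and both ranking functions are hypothetical, and Scholar's real ranking is described here only as being based "at least partly" on citations.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    year: int
    cited_by: int


# Hypothetical records, invented purely for illustration.
results = [
    Article("Mid-period review", 2002, 310),
    Article("Brand-new finding", 2005, 0),
    Article("Older landmark study", 1998, 950),
]


def rank_by_citations(articles):
    """Most-cited first: a stand-in for a citation-weighted ranking."""
    return sorted(articles, key=lambda a: a.cited_by, reverse=True)


def rank_by_date(articles):
    """Newest first: the sort option the column says is missing."""
    return sorted(articles, key=lambda a: a.year, reverse=True)


by_citations = [a.title for a in rank_by_citations(results)]
by_date = [a.title for a in rank_by_date(results)]

print("By citations:", by_citations)
print("By date:     ", by_date)
```

A newly published article has had no time to be cited, so under the first ordering it lands last no matter how relevant it is. Under the second ordering it comes first.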
But when considering whether Scholar is a sufficient replacement for commercial indexing services, we should use the very same criteria for evaluation, such as the "Database Quality Criteria" from SCOUG. At the moment, such a comparison leaves Scholar wanting in some very significant ways.
Keeping our heads
Collaboration with Google will likely provide some clear wins but also some significant trade-offs and even dire pitfalls. "It's important to remember," says Gary Price of ResourceShelf.com, "that Google is not in the information business in the same way as companies such as Factiva or Dialog are." Our clientele deserve no less than the same clear-eyed appraisal that we would use with any library vendor. It should not require an innocent child to detect when the emperor is without clothes.
Link List
Database Quality Criteria: bubl.ac.uk/archive/lis/org/ciqm/databa1.htm
Google Scholar: scholar.google.com
PageRank: en.wikipedia.org/wiki/PageRank
Google Search: google.com