Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Google, the Naked Emperor

08/15/2005
   Google rules. Wherever you turn you hear about a new Google initiative.
   Clearly, Google has the money to do some interesting things. But with
   all the hype and hullabaloo, it can be all too easy to overlook some
   serious flaws in Google's services.

   As librarians, we should not be giving Google a "pass" that we would
   not afford other vendors. By being clear about Google's strengths and
   weaknesses, we can make effective decisions about when and how to use
   Google's services and advise our users appropriately.

   Google Search

   Google's flagship service is, of course, its web index. Google became
   nearly everyone's favorite search engine by crawling more of the web
   than anyone else and making it searchable through a dead-easy interface
   that responded with amazing alacrity. But it should be acknowledged
   that it is really only good at some very specific things and is
   completely ineffective for other purposes.

   For example, sometimes I want to find brand-new web pages. But based on
   the PageRank algorithm (see Link List), these pages would naturally
   fall to the bottom of the search results. Does Google provide any
   method to reverse-sort the results, to view results based on date
   added, or to sort results based on the last change date of the page
   itself? No. So what are we left with? Trying to get to the "end" of the
   search results, wherever that may be.

   The problem is that you can't even get to the end. As a Google
   spokesperson put it, "Google provides only the 1000 most relevant
   search results for a query, even when there are more than 1000 matches.
   (Due to variations in our estimates of results, we may occasionally
   display slightly fewer than 1000)." There is no option to go beyond
   that wall.

   Google Scholar

   The Scholar search service was announced at the end of last year to
   wide acclaim. What it attempts to do is to crawl (using the standard
   Google infrastructure) and index content from academic and scholarly
   publishers. Although Google has agreements with many publishers, it has
   no agreement with some significant ones, including Elsevier. Even for
   the content it does cover, Scholar's crawl can lag months behind the
   content's appearance on the original site.

   When users receive results, if the content is free, they can click
   through to it, but if it is not, they are taken to the publisher's web
   site, where they can often purchase access. Also, Google should be
   congratulated for working closely with libraries to enable OpenURL
   linking so that our clientele can click through to content under our
   licenses when they can be identified as valid users.

   Scholar ranks the results based at least partly on the number of times
   an article was cited by another source. Given the lack of options for
   changing this ranking, for some disciplines this can be disastrous. For
   example, most scientific researchers want timely access to the latest
   content, and Scholar fails on both counts: its crawl lags behind
   publication, and citation-based ranking pushes recent, little-cited
   articles down the list.

   If you are in the humanities, Scholar doesn't fare much better. In a
   search on "hamlet," the results are swamped with scientific papers
   written by various persons named "Hamlet." Limiting the search term to
   article titles helps, but not much. What you get is a
   jumbled mess of scientific articles (e.g., HAMLET as an acronym for a
   substance or procedure), books, journal articles, and cryptic
   "citations" parsed from full-text articles.

   Search results that are marked as "[CITATION]" have been extracted from
   the full text of crawled sources and therefore are often very
   incomplete. Many individual results are, in fact, almost
   indecipherable. To find out more, the user must either click the
   supplied link to do a "Web Search," which usually fails to find the
   article online, or click on the "Cited by" link to go to the source
   that cited it to find enough information to locate the article.

   Scholar is, of course, in its early days, and it is quite possible that
   these problems will be addressed. But when considering whether Scholar
   is a sufficient replacement for commercial indexing services, we should
   use the very same criteria for evaluation, such as the "Database
   Quality Criteria" from SCOUG. At the moment, such a comparison leaves
   Scholar wanting in some very significant ways.

   Keeping our heads

   Collaboration with Google will likely provide some clear wins but also
   some significant trade-offs and even dire pitfalls. "It's important to
   remember," says Gary Price of ResourceShelf.com, "that Google is not in
   the information business in the same way as companies such as Factiva
   or Dialog are." Our clientele deserve no less than the same clear-eyed
   appraisal that we would use with any library vendor. It should not
   require an innocent child to detect when the emperor is without
   clothes.
     __________________________________________________________________

   Link List

   Database Quality Criteria
   bubl.ac.uk/archive/lis/org/ciqm/databa1.htm

   Google Scholar
   scholar.google.com

   PageRank
   en.wikipedia.org/wiki/PageRank

   Google Search
   google.com