Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Is Metasearching Dead?
07/15/2005
The best thing about Google Scholar, the beta Google service for searching scholarly information, is Anurag Acharya. Acharya, the architect of Google Scholar (Scholar.google.com), is approachable, bright, and focused on building a usable interface for those seeking scholarly information. And, mostly, he has been successful.
Scholar successes
Google Scholar crawls scholarly, web-based content--predominantly by targeting specific publishers with which Google has contractual arrangements. Open access material is crawled as well. These full-text materials, abstracts, and citations are being indexed, and citations from these papers to other sources are also extracted and indexed. So besides returned links to full-text articles, search results can also include citations to nonscholarly materials and scholarly materials that Google has not crawled. Google Scholar claims to have agreements to crawl its full-text content with all major publishers except Elsevier and the American Chemical Society, but at least one other publisher is missing--the American Psychological Association.
Search results are returned in rank order, using an algorithm that includes such criteria as where the search words appear (e.g., search words in an article title provide a higher rank than words in the body of the document) and how often an article has been cited. Searching Scholar, however, demonstrates that the ranking weight afforded to highly cited articles is in most cases the most compelling factor. The most highly cited articles consistently float to the top. From one perspective, this is useful since it tends to highlight the most historically important articles.
A clear success of Google Scholar from the library perspective is that its staff have cooperated with libraries to implement a mechanism for library OpenURL links to appear in the search results.
Libraries need to register for this service (see scholar.google.com/scholar/libraries.html) and provide holdings information to Scholar (usually done by a configuration setting in link resolver software).
But there are problems, which is why Acharya is so good. He is up to the challenges.
Scholar challenges
One challenge is to serve multiple purposes and audiences effectively with a service that is Google-simple and therefore not very flexible. As noted, highly cited articles often appear at the top of the results. But for users like scientific researchers familiar with their field, these are the articles they don't want to see. They want the newest (as yet uncited) research, which sifts lower in the results. The Scholar interface provides no way to sort results by publication date.
The service is also plagued with timeliness issues. It can take months for articles that have appeared in PubMed to appear in Scholar. Scientific researchers will find such a lag time unacceptable. Acharya is aware of this issue, and the company is working on it.
There are also searching anomalies that prevent articles from being found with standard techniques. If you search the entire title of an article as a phrase, the very length of the search string can cause the search to fail. Meanwhile, using selected words out of the title often returns the article far down in the results list, since full-text searching will often retrieve other articles more heavily cited.
Library metasearching: RIP?
Will Google Scholar replace the need for library-based metasearch services? Some of my colleagues believe so, but I don't, no matter how good Scholar gets (and it will get better). Unlike Acharya, who thinks ranking renders selection unimportant, I believe what you don't search can be as important as what you do. Search "Hamlet" on Google Scholar and you will be inundated with scientific articles by various Hamlets.
Even limiting to words in the title (the most specific search one can do) results in many scientific articles interspersed among the literary. I believe in creating search interfaces crafted for a specific audience or purpose, and Scholar's one-stop shopping can be a less-than-compelling generic solution to some rather specific problems.
Even if Google Scholar eventually gains access to a reasonably large collection of the scholarly record, librarians will still need to unify searching of two or more sources on behalf of their clientele. There will still be a need for metasearch services.
In the end, Scholar is a tremendous advance for those who have little or no access to the licensed databases and content repositories that libraries provide. But for those who are served by large research libraries, it is very much an open question whether the generic Google Scholar can serve their needs better than services tailored specifically for them.
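Illustration: citation-weighted ranking versus a date sort
The ranking behavior discussed in this column (title matches counting for more than body matches, with citation counts carrying the most weight) can be illustrated with a toy model. The sketch below is not Google Scholar's algorithm, which is not public. The `Article` fields, the three weights, the logarithmic citation boost, and the sample records are all assumptions invented for illustration. It only shows how a strong citation term can push a new, closely matching article below older, heavily cited ones, and how a plain sort by year would surface it.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    body: str
    year: int
    citations: int


# Hypothetical weights chosen only for illustration; not Google's values.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0
CITATION_WEIGHT = 2.0


def score(article: Article, query: str) -> float:
    """Toy score: where the query words appear, plus a citation boost."""
    terms = query.lower().split()
    title_words = article.title.lower().split()
    body_words = article.body.lower().split()
    title_hits = sum(1 for t in terms if t in title_words)
    body_hits = sum(1 for t in terms if t in body_words)
    # log1p damps the raw citation count, yet the boost can still dominate.
    citation_boost = CITATION_WEIGHT * math.log1p(article.citations)
    return TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits + citation_boost


def rank(articles, query):
    """Order articles by descending toy score."""
    return sorted(articles, key=lambda a: score(a, query), reverse=True)


def newest_first(articles):
    """Order articles by publication year, most recent first."""
    return sorted(articles, key=lambda a: a.year, reverse=True)


# Made-up sample records, not real articles.
articles = [
    Article("Protein folding pathways", "a survey of folding", 1995, 1200),
    Article("New protein folding results", "protein folding simulation", 2005, 0),
    Article("Enzyme kinetics", "mentions protein folding briefly", 1990, 3000),
]

for a in rank(articles, "protein folding"):
    print(a.year, a.citations, a.title)
```

With these made-up weights, the uncited 2005 article ranks last even though it matches the query best, while `newest_first(articles)` puts it first. Lowering `CITATION_WEIGHT` changes the balance, which is the kind of tuning a service tailored to a specific audience could make.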