roytennant.com :: Digital Libraries Columns

 

Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.

The Benefits of Grid Networks--Digital Libraries


03/15/2005

   Some recent events have made me think about grid networks. By grid
   networks I mean both networks of computers and networks of humans
   connected together in a grid topology. One event was a posting by
   Lorcan Dempsey, OCLC director of research, to his blog, "WorldCat in
   Your Pocket." (See the Link List.) He describes a computer cluster
   (or "grid") that OCLC recently acquired to speed up processing of a
   test version of WorldCat, the 56 million-record bibliographic database
   that is OCLC's playground.

   Computer grids

   A computer grid comprises a set of interconnected "nodes," each
   consisting of one or more CPUs (the brains of the computer) and very
   fast but volatile memory (RAM). Software then parcels out a computing
   task to this grid. Amazing heights of processing power can be reached
   by having all the nodes work on a problem simultaneously.
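
   As a rough illustration of how software might parcel out a task, here
   is a minimal Python sketch of the split-and-gather pattern. It is not
   OCLC's software: threads on one machine stand in for the nodes of a
   real cluster, and the function names (partition, search_partition,
   grid_grep) and the sample records are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import re


def search_partition(records, pattern):
    """Return the records in one partition that contain the pattern."""
    regex = re.compile(pattern)
    return [rec for rec in records if regex.search(rec)]


def partition(records, node_count):
    """Split the records into node_count roughly equal slices."""
    return [records[i::node_count] for i in range(node_count)]


def grid_grep(records, pattern, node_count=4):
    """Parcel a search out to several workers and merge their matches."""
    slices = partition(records, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        # Each worker scans only its own slice of the records.
        results = list(pool.map(search_partition, slices,
                                [pattern] * node_count))
    # Gather step: flatten the per-worker lists into one sorted list.
    return sorted(match for chunk in results for match in chunk)


if __name__ == "__main__":
    # Invented sample records, not real WorldCat data.
    sample = [
        "Grid computing for libraries",
        "Cataloging basics",
        "Building a Beowulf cluster",
        "Introduction to grid topology",
    ]
    print(grid_grep(sample, r"[Gg]rid", node_count=2))
```

   On a real cluster each slice would live in the memory of a separate
   machine, so the slices are searched at the same time rather than one
   after another.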

   A Beowulf cluster, which is the type of grid system OCLC has, uses
   off-the-shelf PCs with dual processors and expanded RAM. Since
   commodity PCs are relatively inexpensive, a cluster of this type is
   often much less costly than a mainframe computer while delivering
   similar or even greater computing power.

   Computer grids are not just for research scientists. Google relies on
   this kind of technology to deliver search results from billions of web
   pages in seconds. It uses grids with enough RAM to avoid reading data
   from disk, a notoriously slow (relatively speaking) operation.

   On his blog, Dempsey wrote that using the Beowulf cluster for
   processing "means that what might have taken a minute now takes two
   seconds, what might have taken an hour takes two minutes, what might
   have taken a month takes a day. For jobs that will fit entirely in
   memory (e.g., a 'grep' of WorldCat), avoiding disk input/output gives
   another factor of about 20, reducing one-hour jobs down to six
   seconds." Grep is a UNIX utility that searches text for a given
   pattern, so it can find specific text anywhere in a record.
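
   The figures in that quotation can be checked with a little arithmetic.
   The sketch below is only a worked calculation; the 30-day month and
   the helper names (speedup, in_memory_time) are assumptions made for
   this example.

```python
# Durations in seconds.
MINUTE, HOUR, DAY = 60, 3600, 86400
MONTH = 30 * DAY  # assumption: a 30-day month


def speedup(before_seconds, after_seconds):
    """How many times faster the job runs after the change."""
    return before_seconds / after_seconds


# The three before/after pairs from the quotation.
cluster_factors = {
    "minute -> 2 seconds": speedup(MINUTE, 2),
    "hour -> 2 minutes": speedup(HOUR, 2 * MINUTE),
    "month -> day": speedup(MONTH, DAY),
}

IN_MEMORY_FACTOR = 20  # "another factor of about 20" in the quotation


def in_memory_time(before_seconds, cluster_factor,
                   memory_factor=IN_MEMORY_FACTOR):
    """Run time once both the cluster and in-memory speedups apply."""
    return before_seconds / cluster_factor / memory_factor


if __name__ == "__main__":
    for label, factor in cluster_factors.items():
        print(f"{label}: about {factor:.0f}x faster")
    seconds = in_memory_time(HOUR, cluster_factors["hour -> 2 minutes"])
    print(f"one-hour job held in memory: {seconds:.0f} seconds")
```

   Each pair works out to roughly a thirtyfold speedup, and a further
   factor of 20 brings a one-hour job down to the six seconds quoted.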

   Therefore, with inexpensive, off-the-shelf hardware components,
   libraries can do what was once difficult, overly expensive, or
   impossible. At the time that OCLC purchased its Beowulf cluster, it
   cost around $100,000-$120,000. A comparable cluster would now cost less.
   Storage is already ubiquitous and massive (see "Bigger, Cheaper,
   Everywhere," LJ 10/15/04, p. 26). Through grid networking technologies,
   processing power is becoming that way as well.

   Social grids

   Meanwhile, I've been working with new forms of professional
   communication and discussing their impacts with colleagues. I call this
   phenomenon "social grid networking." By distributing a problem among a
   group of people, you're likely to get it solved faster, and probably
   better, just as with computer grids.

   The channels of communication are many and varied. There are many blogs
   by librarians, and as with any publication, you can quickly discover if
   they are useful to you. Chat can be either one on one or group. Lately
   I've begun hanging out on the code4lib chat room, and it's remarkable
   how much I learn while also fostering stronger connections with
   colleagues.

   Link sharing is another form of communication. Seeing what others of
   similar interest bookmark can be a useful form of current awareness.
   The unalog link sharing community keeps me current with information and
   technologies useful to digital library developers.

   Grid benefits

   Podcasting is a new communication method, although it is a broadcasting
   technology of one to many rather than many to many. A podcast is a
   recorded message in MP3 format, suitable for downloading to an iPod
   (thus the term) or other MP3 player. Podcasters typically record a
   broadcast on a regular basis, similar to a radio broadcast or newspaper
   column, and users can then download the MP3 file to their player and
   listen whenever it is convenient.

   What both computer and social grid networks offer librarians is
   faster, more effective ways either to solve problems or to exploit
   opportunities. These networks mean that our users are increasingly able
   to take advantage of whatever methods of communication they wish. They
   also mean that libraries are being challenged to deliver information in
   whatever form(s) our users choose--whether new book lists via RSS or a
   podcast on how to research a topic. As the premier information
   profession, we should at least be familiar with all the various methods
   by which information can be communicated, if not be fully equipped to
   use whichever form is best for a given purpose or audience.
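
   To make the RSS example concrete, here is a minimal sketch of how a
   new book list might be turned into an RSS 2.0 feed using Python's
   standard library. The function name build_new_books_feed, the library
   name, the example.org addresses, and the titles are all placeholders
   invented for illustration; a real feed would draw on the library's
   own catalog data.

```python
import xml.etree.ElementTree as ET


def build_new_books_feed(library_name, feed_url, books):
    """Build an RSS 2.0 document listing new titles.

    books is a list of (title, link) pairs.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link, and description on the channel.
    ET.SubElement(channel, "title").text = f"New books at {library_name}"
    ET.SubElement(channel, "link").text = feed_url
    ET.SubElement(channel, "description").text = "Recently added titles"
    for title, link in books:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Placeholder data, not a real catalog.
    print(build_new_books_feed(
        "Example Library",
        "https://library.example.org/new-books",
        [("Example Title A", "https://library.example.org/books/a"),
         ("Example Title B", "https://library.example.org/books/b")],
    ))
```

   Users who subscribe to such a feed in an RSS reader would see each new
   title as it is added, without having to visit the library's site.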

   Link List

   Beowulf Clusters
   www.beowulf.org

   unalog
   unalog.com

   WorldCat in Your Pocket
   orweblog.oclc.org/archives/000544.html