Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Institutional Repositories

09/15/2002
   Faculty and researchers at universities worldwide gather and interpret
   data, advocate new ideas, and extend human knowledge. This work is
   sometimes shared with other scholars and researchers as working papers,
   technical reports, and other forms of prepublication work. Some of
   this scholarship eventually appears in a peer-reviewed journal or
   book, but some never does. This preprint culture is strongest in the
   scientific and technical disciplines, but social scientists share
   similar works. This 'grey literature' is often difficult to find and
   even more difficult for librarians to collect systematically, manage,
   and preserve (see 'What Is Grey Literature?' in the link list).
   But the web and other digital technologies are changing all that.

   A variety of web-based systems are becoming available for accepting
   deposits of papers. These systems make the research output of
   institutions easier to discover as well as manage and preserve. They
   also make it possible to share information globally through compliance
   with a standard metadata harvesting protocol.

   For an institution wishing to implement a repository, there are now
   implementation models to consider and software decisions to make.
   Although you will need to know more to set up a repository, here is a
   beginning road map. If you wish further information, your next move
   should be to read the recently released 'The Case for Institutional
   Repositories: A SPARC Position Paper.'

   Software
   Some systems are open source, while others are commercial. Foremost
   among the free variety of software is ePrints, an open-source project
   from the UK's University of Southampton. The ePrints solution is
   squarely focused on the faculty working paper (also called preprint or
   e-print). The ePrints model assumes that faculty will directly upload
   their own prepublication scholarship for open access via an
   institutional or subject-based repository. A number of institutions are
   now using this software, including Caltech and the Digital Library of
   the Commons at Indiana University.

   Another package slated to become open source is DSpace, developed
   through a partnership between the MIT Libraries and Hewlett-Packard.
   DSpace is designed to be a more flexible solution than ePrints. It
   makes fewer assumptions regarding what type of object is being
   uploaded. Since the programmer who developed ePrints is now a key
   developer with the DSpace project, DSpace has roots in ePrints but has
   no doubt surpassed it. MIT is currently the only user, but once the
   software is released as open source, other institutions may choose to
   implement it.

   The Berkeley Electronic Press (bepress) offers a commercial solution.
   Bepress already provided a sophisticated system for peer-reviewed
   journals when the University of California entered into a
   codevelopment agreement with the press to add key features for
   institutional repository support. Now the bepress software is both
   compliant with key standards and simpler to use for those who do not
   need the peer review capabilities.

   Implementation models
   The software platform is but one essential step to creating an
   institutional repository. Perhaps more important is identifying an
   appropriate implementation model. There are nearly as many models as
   there are institutional repositories, but focusing on a few examples
   may highlight some important differences.

   MIT uses a distributed model, championed by Southampton's Stevan Harnad
   and others as 'self-archiving,' whereby individual faculty upload and
   manage their own scholarly output. DSpace has the widest focus of any
   repository described here; it explicitly welcomes any scholarly object.
   'Educational material in digital formats (e.g., online lecture notes,
   visualizations, simulations, original graphics) are some of the most
   valuable assets produced by colleges and universities today and are
   extremely important to the faculty that create them,' says Mackenzie
   Smith, DSpace's project director. 'Much of this material is really like
   a new kind of publication and clearly needs to be captured, managed,
   and often preserved... what better place to take responsibility for
   this than the library?'

   The University of California's eScholarship uses a semidistributed
   model that assigns management responsibility to organizational units
   (research units, departments) that then assist faculty with uploading
   their papers. Caltech uses a semicentralized model, wherein repository
   sites can be set up for any university unit, but the library uploads
   the papers on the faculty's behalf. Its digital collections range from
   computer science technical reports to theses and dissertations.

   It is too early to tell what benefits will accrue to each model, but it
   is highly unlikely that any single model will work for all
   institutions. Each institution should consider alternative models in
   light of its particular circumstances.

   Federation for free
   Any institution implementing a repository using one of the software
   solutions described above will automatically expose its metadata to
   harvesting through the Open Archives Initiative Protocol for Metadata
   Harvesting (OAI-PMH). This protocol establishes a standard way for
   metadata about digital objects to be harvested (retrieved by
   software) from any repository that complies with the protocol. This
   harvested metadata can then be indexed along with metadata from other
   repositories to provide one-stop searching for papers on a particular
   topic.

   For example, just days after the eScholarship Repository opened in
   April 2002, records for papers in that repository were showing up in
   locations such as the EconPapers site. Meanwhile, a project of the
   University of Michigan called OAIster (say 'oyster') has harvested over
   half a million records for digital resources using the Open Archives
   protocol. A significant number of these records come from institutional
   repositories.
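
   To make the harvesting idea concrete, here is a minimal Python sketch
   of what a harvester might do: build a ListRecords request, pull
   identifiers, titles, and creators out of the Dublin Core response,
   and add them to a simple word index. The endpoint
   repository.example.edu, the sample record, and all function names are
   assumptions made for illustration; they do not describe OAIster,
   EconPapers, or any repository named in this column.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response from a hypothetical repository, so no network is needed.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:1001</identifier>
        <datestamp>2002-04-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Hypothetical Working Paper</dc:title>
          <dc:creator>Author, Example</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Return the URL for a ListRecords request against an OAI-PMH endpoint."""
    if resumption_token:
        # A resumption token continues an earlier, partially delivered list;
        # it is sent alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return ([(identifier, title, creators), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        creators = [el.text for el in record.iterfind(".//dc:creator", NS)]
        records.append((identifier, title, creators))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


def build_index(records):
    """Map each lower-cased title word to the identifiers of matching records."""
    index = {}
    for identifier, title, _creators in records:
        for word in title.lower().split():
            index.setdefault(word, set()).add(identifier)
    return index
```

   A real harvester would fetch each URL over HTTP and keep requesting
   with the returned resumption token until none is supplied, merging
   records from many repositories into the same index. This sketch
   parses only the embedded sample so that it runs offline.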

   Economic models
   All of the repositories highlighted here began with support from their
   libraries. How each of these institutional repositories will be
   sustained over time may vary as much as the implementation models, but
   in all cases the long-term economic model is unclear. Will each
   academic institution decide to fund the repository as part of the basic
   infrastructure? Or will it require the library to charge participating
   departments for their use of the infrastructure? Although many active
   in the field expect institutions to fund these services as part of the
   underlying support for the academic enterprise, it is not yet clear
   that university administrations will agree.

   Subject terminologies
   One of the thorniest issues is the lack of a single controlled
   vocabulary for fields of scholarly pursuit. For example, 'medicine' may
   be a perfectly legitimate subject heading for one university, while it
   would be ridiculously broad for a medical school. When searching or
   browsing a specific repository, this may not be much of a problem. But
   as access to institutional repositories becomes federated in central
   portals, it becomes more problematic. How can a user profitably browse
   papers from a variety of repositories that use very different subject
   terminologies?
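
   One way to picture the problem is as a crosswalk that a federated
   portal would have to maintain, mapping each repository's local
   headings onto a shared browsing vocabulary. The Python sketch below
   is purely illustrative: the repository names, headings, and function
   name are invented, and it says nothing about how any existing portal
   handles subjects.

```python
from collections import defaultdict

# Hypothetical crosswalk: (repository, local heading) -> shared portal heading.
# Neither the repository names nor the headings come from a real vocabulary.
CROSSWALK = {
    ("general-university", "medicine"): "Medicine",
    ("medical-school", "cardiology"): "Medicine",
    ("medical-school", "oncology"): "Medicine",
    ("general-university", "economics"): "Economics",
}


def browse_by_shared_subject(records, crosswalk=CROSSWALK):
    """Group harvested records under shared headings.

    Each record is a (repository, local_heading, title) tuple. Records whose
    local heading has no crosswalk entry are grouped under "Unmapped" so
    that they are not silently dropped.
    """
    groups = defaultdict(list)
    for repository, local_heading, title in records:
        key = (repository, local_heading.strip().lower())
        groups[crosswalk.get(key, "Unmapped")].append(title)
    return dict(groups)
```

   The "Unmapped" bucket shows where the difficulty lies: every local
   heading that nobody has mapped falls out of the shared browse view,
   and keeping such a table current across many repositories takes
   ongoing effort.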

   Publication and removal
   Since at least some of what is being deposited in institutional
   repositories is 'prepublication,' at least a few of these papers will
   later be published in a journal. In some cases, faculty may request
   that their papers be removed from the institutional repository.
   eScholarship allows removal of papers, although a citation must always
   remain. Caltech is more conservative in that it disallows removal. If
   the journal publisher does not require the removal of the
   prepublication version, it may still be useful for a reader to
   discover that a preprint was subsequently published by a journal.
   Putting such information into the record in the institutional
   repository is typically the responsibility of whoever deposited the
   paper originally.

   From grey to black and white
   Although the software and implementation model that an institution
   chooses to employ are still anyone's guess, the likelihood that
   universities and research institutions will implement something is
   increasing. Institutional repositories fill an important void and are
   likely to remain a part of our information landscape. They provide much
   better access to a literature than has ever previously been possible
   and should be a no-brainer for most academic institutions.
     __________________________________________________________________

Link List

   bepress
   bepress.com

   Caltech Digital Collections
   library.caltech.edu/digital

   'The Case for Institutional Repositories'
   www.arl.org/sparc/IR/ir.html

   Digital Library of the Commons
   dlc.dlib.indiana.edu

   DSpace
   web.mit.edu/dspace/live

   EconPapers
   econpapers.hhs.se

   ePrints
   www.eprints.org

   eScholarship
   escholarship.cdlib.org

   eScholarship Repository
   repositories.cdlib.org/escholarship

   OAIster
   oaister.umdl.umich.edu

   Open Archives Initiative
   www.openarchives.org

   What Is Grey Literature?
   www.nyam.org/library/greylit/whatis.shtml