The CEP software system provides facilities to archive groups of potentially very large websites, such as those of state government agencies. Spiders perform the web harvesting, and a complete version history is kept for every website processed. The system also produces a variety of statistical products useful for managing both websites and web archives. It is released under an open-source license and is built using other open-source systems.
This software has been developed by the Graduate School of Library and Information Science (GSLIS) at the University of Illinois, Urbana-Champaign (UIUC), with funding from Institute of Museum and Library Services (IMLS) National Leadership Grants and from the Illinois State Library (ISL).
This guide reflects CEP software version 1.1 of 6 March 2007. A companion document, the CEP Installation Guide, details the installation of the associated open-source modules and the initial installation of the CEP software itself. This document picks up from there and describes the configuration of CEP and the configuration and operation of all the archiving spiders.
The Electronic Archives Project (EAP) homepage at GSLIS is http://www.isrl.uiuc.edu/pep/. It includes materials on the associated Illinois Government Information search engine and on the Illinois Electronic Documents Initiative, a permanent digital library for "born digital" Illinois State Government publications. Larry S. Jackson is the EAP Principal Investigator at GSLIS.
