1.0 Introduction
The CEP Project (a.k.a. PEP, Preserving Electronic Publications) is a web site archiving system developed with Open Source software for Unix/Linux. CEP makes it possible for organizations to periodically download and retain archival copies of their evolving web site(s). CEP uses a web spider, wget, to traverse and download a target web site's pages, and CVS to archive the pages and their subsequent changes. CEP also uses a variety of software packages to create and maintain historical data and to provide summary statistics about the web site's content.
This document addresses only the installation of the constituent software units onto the CEP host computer. A companion document, the CEP Operations Guide, addresses ongoing operator actions in control of the CEP system, and how its harvested materials are managed and utilized.
The packages used to create CEP include Fedora, Apache, CVS, Perl, GD graphic tools, TreeTagger, CVS ChangeLogBuilder, XMLFile OAI-PMH Data Provider, and wget. Many of these packages are included in a Red Hat/Fedora installation, a few must be downloaded from their individual sites, and many are also available from http://rpmfind.net/.
CEP integrates these stand-alone packages into a single system through the use of CGI, Perl and Java processes. The CEP system uses the wget web spider to retrieve web pages from a target web site defined by an XML configuration file, which is created through a CEP-provided web page. After a site is retrieved, other processes generate metadata and statistical data, and the web site data is then presented to the operator for manual or automatic check-in to the CVS archive. An overview of the data flow is represented in Figure 1.
Figure 1 - CEP Data Flow
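The retrieve-then-archive flow described above can be sketched roughly as follows. This is an illustration only, not CEP's actual code: CEP itself is built from CGI, Perl and Java processes, and the XML layout used here (the `site`, `url`, `depth` and `workdir` elements), the sample values, and the function name `build_harvest_commands` are all assumptions made for the example. The sketch only builds the wget and CVS command lines; it does not run them.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout. The real CEP XML schema is not
# shown in this guide, so these element names are assumptions.
SAMPLE_CONFIG = """
<site>
  <url>http://www.example.org/</url>
  <depth>3</depth>
  <workdir>/tmp/cep/example</workdir>
</site>
"""


def build_harvest_commands(config_xml):
    """Return (wget_cmd, cvs_cmd) argument lists for one target site."""
    site = ET.fromstring(config_xml.strip())
    url = site.findtext("url").strip()
    depth = site.findtext("depth").strip()
    workdir = site.findtext("workdir").strip()

    # Step 1: the spider traverses the site and downloads its pages.
    wget_cmd = [
        "wget",
        "--recursive",                    # follow links within the site
        "--level=" + depth,               # limit the recursion depth
        "--no-parent",                    # never ascend above the start URL
        "--directory-prefix=" + workdir,  # save pages under the work area
        url,
    ]

    # Step 2: the downloaded pages are checked in to the CVS archive.
    # This must be run from inside a checked-out CVS working copy, and
    # newly appearing files would first need "cvs add".
    cvs_cmd = ["cvs", "commit", "-m", "CEP harvest of " + url]

    return wget_cmd, cvs_cmd


if __name__ == "__main__":
    for cmd in build_harvest_commands(SAMPLE_CONFIG):
        print(" ".join(cmd))
```

Returning the commands as argument lists rather than executing them keeps the sketch safe to run on a machine where wget and CVS are not yet installed; on a configured host each list could be passed to `subprocess.run`.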
CEP was developed by Larry S. Jackson from the Graduate School of Library and Information Science at the University of Illinois, Urbana-Champaign (UIUC) with funding from Institute of Museum and Library Services (IMLS) National Leadership Grants, and from the Illinois State Library (ISL). We encourage you to visit the CEP home page for all of the details.
This guide provides information about CEP installation and configuration, and answers frequently asked questions. It walks you through a CEP installation on a default Fedora 3 system. Deviations from the default install will require the user to adjust paths and configuration files to match the local system.
