How to Mirror the ht://Dig Project

ht://Dig Copyright © 1995-2002 The ht://Dig Group
Please see the file COPYING for license information.


As ht://Dig gains popularity, it is good for the project to be mirrored. Mirroring is vital for improved availability and response time and, of course, for saving bandwidth on the main server. This document describes how to mirror all or part of the ht://Dig web and FTP sites. Make sure you read it all.

There are four sites you can mirror:

  1. The ht://Dig web site (main and development)
  2. The ht://Dig files web site (formerly reachable by FTP)
  3. The ht://Dig patch FTP site
  4. The source code development trees (also known as CVS repository)
Note: Since November 2001, SourceForge has stopped hosting project FTP services.

Some Words about CVS and Wget

Developing source code with a possibly large number of contributors spread around the world is a tedious task that requires good coordination. This coordination is provided by a piece of software known as the Concurrent Versions System (CVS), which is why we use CVS for software development. The web site is frequently updated by developers too: text is added, changed or deleted, new pages are created, and so on. For the same reasons, we placed the web site in a CVS repository. More information on CVS can be found in the CVS online documentation. Note: CVS has a lot of options that are not explained here. This document is merely a short howto on setting up a local mirror of the ht://Dig project. You should use CVS version 1.10 or higher.

Wget is a software package that enables you to create a mirror (an exact copy) of an FTP or web site. It is published under the GNU General Public License and should run on virtually every platform. If you don't like Wget, you can try Mirror, which is a very good alternative for mirroring FTP sites. To learn more about Wget, see the Wget web site. You should use version 1.7 or higher.

Alas, SourceForge does not provide an rsync service and, as far as we know, does not intend to.


Some Notes Before We Begin (in no specific order)


Setting up a Copy of the ht://Dig Web Site
Step 1
You need to have cvs(1) installed on your system; nothing below will work without it. Check with:
man cvs
If you don't have it on your box, ask the administrator to install it for you.
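
If you prefer to check from a script rather than by reading the manual page, the following is a minimal sketch, assuming a POSIX shell. The helper name have_tool is hypothetical and not part of cvs(1) or wget(1).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named program can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the two tools this document relies on.
for tool in cvs wget; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found, ask your administrator to install it"
    fi
done
```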

Step 2
You'll need to gain anonymous access to the CVS repository.
cvs -d:pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig login
When asked for a password, leave it blank (i.e. press the enter key). You only need to do this once since cvs(1) will create a CVS password file .cvspass in your home directory that will be used in future invocations.

Step 3
Create a directory where you will place your local copy. You can pick any legal name you want, but for the sake of simplicity we'll name the directory htdig. Note that the cvs(1) command later on will create another directory under the just created htdig directory.
cd /home/htdigmirror/www/
mkdir htdig

Step 4
Change to that directory.
cd htdig

Step 5
Check out the maindocs module. In newbie-speak: create a local copy of the ht://Dig web site.
cvs -z6 -d:pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig \
co -d maindocs maindocs
(Note the backslash at the end of the first line of the command. In Un*x shells this is the line-continuation character, which means that the two lines should be read as one.) So what does the command mean? We are accessing the ht://Dig repository at cvs.htdig.sourceforge.net in directory /cvsroot/htdig via the password server (pserver), checking out (co) the module maindocs using gzip(1) compression (-z6), and placing it in a local directory called maindocs. We could have left out the -d maindocs, as this is the default. You will see some output on your terminal. When it finishes, the cvs(1) command has created a subdirectory named maindocs.
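
If typing the long -d argument gets tiresome, cvs(1) also reads the repository location from the CVSROOT environment variable when no -d option is given. The sketch below only sets and prints the variable; it does not contact the server.

```shell
#!/bin/sh
# Same repository location as in the checkout command above.
CVSROOT=':pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig'
export CVSROOT

# With CVSROOT exported, the checkout in Step 5 could be shortened to:
#   cvs -z6 co -d maindocs maindocs
echo "CVSROOT is $CVSROOT"
```

Once a module is checked out, cvs(1) records the repository location inside the working directory (in the file CVS/Root), so later commands run there need neither -d nor CVSROOT.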

Step 6
Now you need to adapt your web server configuration file. You may need superuser privileges to do that. Don't forget to turn on server side includes (SSI). At the end of this document there's an example configuration file for use with Apache virtual hosts. Start your favorite browser and surf to the web page. It should be something like http://www.htdigmirror.org/htdig/maindocs/. You did restart your web server, didn't you?

Step 7
Ready. You've set up a mirror! Please inform the developers at <htdig-dev@lists.sourceforge.net> about your mirror.

Updating Your Local Copy of the ht://Dig Web Site
Step 1
You already have cvs(1); otherwise you couldn't have created the initial copy in the first place.

Step 2
You already have anonymous CVS access.

Step 3
Change to the directory that holds the copy of the ht://Dig web site.
cd /home/htdigmirror/www/htdig/maindocs
Note that you have to cd(1) to the directory created by cvs(1)!

Step 4
Start the update by executing
cvs -z6 -q update -Pd
You will see rows with updated files. If there's nothing to update, you will see a new command prompt only; there is no output.

Step 5
So you've updated your local copy of the ht://Dig web site, but you don't want to do that by hand every day. Solution: set up a crontab(1) entry to update your mirror daily. Example of an entry:
40 2 * * * cd /home/htdigmirror/www/htdig/maindocs && /usr/bin/cvs -z6 -q update -Pd
This will run the command every day at 2:40 AM. Note that a crontab(1) entry must be written on a single line. Depending on your version of cron(8), you will get an e-mail message containing the output of the command. If you do not want any output, you could use
40 2 * * * (cd /home/htdigmirror/www/htdig/maindocs && /usr/bin/cvs -z6 -q update -Pd) >/dev/null 2>&1
The redirection should work in sh(1)-oriented shells.
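
If the crontab line grows unwieldy, the update can be moved into a small script that cron(8) calls instead. This is only a sketch: the function name update_mirror and the CVS_BIN variable are hypothetical, and the check for a CVS subdirectory is a convenience added here, not something cvs(1) requires.

```shell
#!/bin/sh
# Hypothetical wrapper that cron could call instead of the one-liner.
# CVS_BIN defaults to the path used in the crontab entries above.
CVS_BIN=${CVS_BIN:-/usr/bin/cvs}

update_mirror() {
    dir=$1
    # A checked-out tree always contains a CVS administrative directory.
    if [ ! -d "$dir/CVS" ]; then
        echo "update_mirror: $dir is not a CVS working directory" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$dir" && "$CVS_BIN" -z6 -q update -Pd)
}

# Example call (commented out so that running this file has no side effects):
# update_mirror /home/htdigmirror/www/htdig/maindocs
```

The function returns a non-zero status when the directory is missing or cvs(1) fails, which lets the calling script decide whether to send a warning.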

Step 6
Ready. Please inform the developers at <htdig-dev@lists.sourceforge.net> about your mirror.

Setting up a Mirror of the ht://Dig Files Web Site

In the early days (hmm, not so early for that matter), there was an ht://Dig FTP site that housed release tarballs, binaries, snapshots and contributed work. The FTP service has been abandoned, but fortunately you can still access these files via the web, in the /files directory. Since this is web access, wget(1) is used for retrieval. Note that you cannot copy the files directory via cvs(1), and that there is no real files directory on the ht://Dig web site.

Step 1
Make sure you have installed wget(1) and read the documentation. We're going to place the copy of the files into the anonymous FTP area. That way, you can have people access the files by FTP and by web.

Step 2
Change to the public directory of the anonymous FTP area and create a subdirectory for holding the files.
cd /home/ftp/pub
mkdir ftp.htdig.org
cd ftp.htdig.org
The files will be placed in the files directory (as you will see shortly).

Step 3
Copy the files.
wget -nv -m -np -nH -p http://www.htdig.org/files
The -nv option turns off verbose output, but wget(1) will still not be entirely quiet. The -m option tells wget(1) to turn on options suitable for mirroring (as in -r -N -l inf -nr). The -np (no-parent) option prevents ascending to the parent directory. The -nH option disables generation of host-prefixed directories, so you will not get a directory called www.htdig.org. Finally, -p causes wget(1) to download all the files that are necessary to properly display a given HTML page.
This combination of options will result in a directory called files that holds a copy of http://www.htdig.org/files/.

Step 4
There is one drawback: since the files directory at the ht://Dig web site holds no index.html file, the web server over there creates one on the fly. Moreover, this generated file contains links to itself that display the files in various sort orders, such as by name, last modified, size and description. (These links look like ?N=D, ?M=A etc.) We have to remove the saved copies, as they contain links calculated for the ht://Dig web site that will probably not match your copy. Also, the site's robots.txt file is copied. We don't need it either. So we do
rm -f robots.txt
cd files
find . -name index.html -print -exec rm -f {} \;
find . -name "*=*" -print -exec rm -f {} \;
(Note the double quotes around *=*!) This will traverse the files directory and delete any index.html or ?N=D-like files. As a bonus, it will print the names of the files deleted.
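
To see what the cleanup does before running it on a real mirror, you can try it on a throwaway tree first. The sketch below builds one with mktemp(1). The file names in it, including the exact form index.html?N=D and the tarball name, are assumptions made up for the demonstration.

```shell
#!/bin/sh
# Build a scratch tree resembling what wget(1) might leave behind.
# All file names below are made up for this demonstration.
scratch=$(mktemp -d)
mkdir -p "$scratch/files/snapshots"
touch "$scratch/robots.txt" \
      "$scratch/files/index.html" \
      "$scratch/files/index.html?N=D" \
      "$scratch/files/snapshots/index.html?M=A" \
      "$scratch/files/snapshots/htdig-snapshot.tar.gz"

# The same cleanup as in Step 4, run against the scratch tree.
cd "$scratch" || exit 1
rm -f robots.txt
cd files || exit 1
find . -name index.html -print -exec rm -f {} \;
find . -name "*=*" -print -exec rm -f {} \;

# List what survived the cleanup, then throw the scratch tree away.
remaining=$(find . -type f | sort)
echo "remaining: $remaining"
cd / && rm -rf "$scratch"
```

Only the tarball should survive, since it is the one name that neither pattern matches.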

Step 5
Create an Alias in your web server configuration file so that this mirror can be accessed by web. We'll give an Alias line for Apache:
Alias /htdig/maindocs/files "/home/ftp/pub/ftp.htdig.org/files"
Files should now be accessible via http://www.htdigmirror.org/htdig/maindocs/files/. Don't forget to turn on directory indexing!

Step 6
Ready. Please inform the developers at <htdig-dev@lists.sourceforge.net> about your mirror.

Updating Your Mirror of the ht://Dig Files Web Site

This is very simple. Repeat steps 2, 3 and 4 of "Setting up a Mirror of the ht://Dig Files Web Site" (the directory from step 2 already exists, so you can skip the mkdir). But there is one drawback (again). As wget(1) uses the links from the generated index.html files as pointers to other files to be fetched, it leaves local files that are no longer listed in that index.html untouched. As a result, within a few weeks you will have a lot of outdated snapshot files on your mirror, and you'll need to remove them by hand.
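
One way to find the leftovers, sketched here under the assumption that you are willing to fetch a fresh copy into a separate scratch directory, is to compare the two file lists. The function name list_stale and the /tmp/fresh path are hypothetical; nothing is deleted, the stale names are only printed.

```shell
#!/bin/sh
# Hypothetical helper: print files that exist under OLD but not under NEW.
list_stale() {
    old=$1
    new=$2
    tmp_old=$(mktemp) && tmp_new=$(mktemp) || return 1
    # Sorted lists of relative paths, one per tree.
    (cd "$old" && find . -type f | sort) >"$tmp_old"
    (cd "$new" && find . -type f | sort) >"$tmp_new"
    # comm -23 keeps the lines that appear only in the first (old) list.
    comm -23 "$tmp_old" "$tmp_new"
    rm -f "$tmp_old" "$tmp_new"
}

# Example call (paths are placeholders):
# list_stale /home/ftp/pub/ftp.htdig.org/files /tmp/fresh/files
```

Review the printed list before removing anything; a file may be missing from the fresh copy simply because the download was interrupted.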


Setting up a Mirror of the ht://Dig Patch Site

Making a mirror of the patch site involves the use of wget(1) and is similar to creating a copy of the files web site.

Step 1
Create a suitable directory in the anonymous FTP area.
cd /home/ftp/pub
mkdir ftp.ccsf.org
cd ftp.ccsf.org
Here we will place the files.

Step 2
Copy the files.
wget -nv -m -np -nH -p ftp://ftp.ccsf.org/htdig-patches
This combination of options will result in a directory called htdig-patches that holds a copy of ftp://ftp.ccsf.org/htdig-patches.

Step 3
Because the -m option implies -nr (do not remove listings), wget(1) leaves .listing files behind in your copy of an FTP site. Although they don't do any harm, it's nice to get rid of them.
find . -name .listing -print -exec rm -f {} \;
This will traverse the htdig-patches directory and delete any .listing files. It will also delete the one in the top directory.

Step 4
Create an Alias in your web server configuration file so that this mirror can be accessed by web. We'll give an Alias line for Apache:
Alias /htdig/maindocs/htdig-patches "/home/ftp/pub/ftp.ccsf.org/htdig-patches"
Files should now be accessible via http://www.htdigmirror.org/htdig/maindocs/htdig-patches/. Don't forget to turn on directory indexing!

Step 5
Ready. Please inform the developers at <htdig-dev@lists.sourceforge.net> about your mirror.

Updating Your Mirror of the ht://Dig Patch Site

This is very simple. Repeat steps 1, 2 and 3 of "Setting up a Mirror of the ht://Dig Patch Site".


Setting up a Copy of the Source Code Development Trees (CVS Repository)

There are two trees you can check out: the 3.1 branch and the 3.2beta branch. Because you know the procedure by now, we'll just give the commands.

cd /home/htdigmirror/www/htdig
cvs -z6 -d:pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig \
co -d htdig-3-1-x -r htdig-3-1-x htdig
cvs -z6 -d:pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig \
co -d htdig-3-2-x -r htdig-3-2-x htdig

(The -r option checks out a specific revision or tag; here it selects a branch tag.) You'll get two subdirectories named htdig-3-1-x and htdig-3-2-x containing the CVS trees. The trees can be accessed through the web via http://www.htdigmirror.org/htdig/htdig-3-1-x/ and http://www.htdigmirror.org/htdig/htdig-3-2-x/ respectively. You'll need to adapt your web server configuration file so that it shows directory indexes. See the examples at the end.

Note: If you leave out the -r option, you will check out the main branch of the htdig source tree, but this branch has been largely untouched since February 2000. You must use the -r option.
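
The two checkouts above differ only in the branch tag, so they can be generated in a loop. The sketch below is a dry run: the hypothetical function checkout_branches only prints the commands, so nothing is fetched until you remove the leading echo.

```shell
#!/bin/sh
# Same repository location as in the commands above.
CVSROOT=':pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig'

# Print (rather than run) one checkout command per branch tag.
# Remove the leading "echo" to execute the commands for real.
checkout_branches() {
    for tag in "$@"; do
        echo cvs -z6 -d"$CVSROOT" co -d "$tag" -r "$tag" htdig
    done
}

checkout_branches htdig-3-1-x htdig-3-2-x
```

Using the tag name for both -d and -r keeps the local directory name identical to the branch name, as in the commands given earlier.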

Note: There is currently no link from the ht://Dig web site to the CVS trees, so visitors will not find them on your mirror unless you tell them where to look.


Updating the Copy of the Source Code Development Trees

Well, this is just like updating the ht://Dig web site. You'll need to go to the right directory and issue the cvs(1) update command.
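
With two trees to keep current, a small loop saves some typing. This is a sketch under assumptions: the names update_trees, BASE and CVS_BIN are hypothetical, and BASE defaults to the directory used in the checkout commands.

```shell
#!/bin/sh
# Base directory from the checkout commands; adjust to your layout.
BASE=${BASE:-/home/htdigmirror/www/htdig}

# Update every named tree that exists; warn about the ones that do not.
update_trees() {
    rc=0
    for tree in "$@"; do
        if [ -d "$BASE/$tree/CVS" ]; then
            # Subshell keeps the caller's working directory unchanged.
            (cd "$BASE/$tree" && ${CVS_BIN:-cvs} -z6 -q update -Pd) || rc=1
        else
            echo "skipping $tree: no CVS working directory in $BASE" >&2
            rc=1
        fi
    done
    return $rc
}

# Example call (commented out so that running this file has no side effects):
# update_trees htdig-3-1-x htdig-3-2-x
```

No -r option is needed here: a tree checked out with -r carries a sticky tag, so a plain update stays on the same branch.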


E-mail addresses

Further Considerations

One can use wget(1) to copy the ht://Dig main web site, although cvs(1) is surely more efficient and faster. If you decide to copy the ht://Dig main web site with wget(1), note that you will automatically make a copy of the /files directory as well. The /files directory is currently about 70 MB.

Example Configuration Files

Apache 1.3.x example configuration file (part) for ht://Dig mirror sites. For use with virtual hosts:

# Host www.htdigmirror.org
<VirtualHost 1.2.3.4>
        ServerAdmin webmaster@htdigmirror.org
        ServerName www.htdigmirror.org
        DocumentRoot /home/htdigmirror/www
        ErrorLog /home/htdigmirror/etc/error_log
        TransferLog /home/htdigmirror/etc/access_log
        
        # Aliasing files directory to web site and activate fancy directory indexing
        Alias /htdig/maindocs/files "/home/ftp/pub/ftp.htdig.org/files"
        <Directory /home/ftp/pub/ftp.htdig.org/files>
                Options Indexes
        </Directory>

        # Aliasing patch directory to web site and activate fancy directory indexing
        Alias /htdig/maindocs/htdig-patches "/home/ftp/pub/ftp.ccsf.org/htdig-patches"
        <Directory /home/ftp/pub/ftp.ccsf.org/htdig-patches>
                Options Indexes
        </Directory>
        
        # Activate Server Side Includes w/o Execute
        <Directory /home/htdigmirror/www/htdig/maindocs>
                Options IncludesNOEXEC
        </Directory>
        
        # Activate fancy directory indexing for browsing CVS tree
        <Directory /home/htdigmirror/www/htdig/htdig-3-1-x>
                Options Indexes
        </Directory>
        <Directory /home/htdigmirror/www/htdig/htdig-3-2-x>
                Options Indexes
        </Directory>
</VirtualHost>

Last modified: $Date: 2002/01/31 23:37:04 $