These classnotes are deprecated. As of 2005, I no longer teach the classes. Notes will remain online for legacy purposes.

UNIX03/Set Up Detection For Defaced Web Pages


The book (page 661) discusses ways to automatically detect when web pages have been defaced. We do not actually need to use the book's code, however, as there is an alternative.

Introduction to noink2

noink was a Free-Software project I began in 1998. It was a single, large web-application written in Perl that provided many different functions (such as forums, dynamic content generation, and some security and protection).

As we neared our first stable release (version 1.0), my group and I realized that noink was too unwieldy to maintain because of its monolithic design. Thus, for our 2.0 development line, we decided to break up noink's tasks into smaller, individual modules.

noink2 will be the conclusion of this redesign; however, noink2 is still under active development. While certain elements of noink2 are very functional and could be considered "stable" and "complete" (such as noWiki, which powers this web-site), others are not ready for production use (such as the P2P version of noNews).

noWWWatch

noWWWatch is a script which can be used to monitor web-pages for changes. You provide it a list of web-pages to scan, set noWWWatch to run (usually via cron), and noWWWatch scans those pages for changes.

noWWWatch reads each web-page and computes an MD5 hash based upon the web-page's contents. It then compares this hash with the hash from the previous scan. By comparing only hashes, noWWWatch can be quite quick, and it takes up only as much disk space as is needed to store one hash for every page it scans.
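The hash-and-compare idea can be sketched in a few lines. This is only an illustration written in Python; it is not noWWWatch's own Perl code. The function names, the in-memory dictionary standing in for the stored MD5 file, and the three result strings are all assumptions made for the example.

```python
import hashlib


def md5_of(content: bytes) -> str:
    """Return the hex MD5 digest of a page body."""
    return hashlib.md5(content).hexdigest()


def check_page(url: str, content: bytes, stored: dict) -> str:
    """Compare a page's current hash with the hash from the previous scan.

    `stored` maps URL -> hex digest (a hypothetical stand-in for the MD5
    file) and is updated in place with the new digest.
    Returns "new", "unchanged", or "changed".
    """
    current = md5_of(content)
    previous = stored.get(url)
    stored[url] = current  # remember this scan for the next comparison
    if previous is None:
        return "new"
    return "unchanged" if previous == current else "changed"


if __name__ == "__main__":
    hashes = {}
    url = "http://www.example.com/"  # placeholder URL
    print(check_page(url, b"<html>hello</html>", hashes))    # first scan
    print(check_page(url, b"<html>hello</html>", hashes))    # same content
    print(check_page(url, b"<html>defaced</html>", hashes))  # altered content
```

Run directly, the three calls report "new", "unchanged", and "changed" in that order. Only one 32-character digest is kept per page, which is why the disk footprint stays small.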

Once you've obtained noWWWatch, you'll see it consists of a number of files:


nowwwatch.pl
This is the main Perl script. It should be marked executable and placed in a directory such as /usr/bin/. Unless you wish to change the default location for noWWWatch's configuration files, you will not need to modify this script.


nowwwatch-cfg.pl
This is the noWWWatch configuration file. By default, it should be placed in /etc/noink2/nowwwatch-cfg.pl. This configuration script is actually a Perl script (similar to the configuration file for Amavis). At present, it only contains a small number of options:

 $check_dir = ".";

This first option tells noWWWatch where to find its sites list file and where to place the MD5 file. You will want to change this from the default.

 $deliminator = '>';

This option is unused at this time, but in the future it will provide a field delimiter for finer control of noWWWatch's features.

 $main_administrator = 'hart@physics.arizona.edu';

This option tells noWWWatch who should receive its reports. For debugging purposes, noWWWatch presently sends a report after every run, whether or not an error occurred. In the future, this behaviour will be changed to send only error reports.

 $contact_main_admin = "yes";    # Uncomment if you do want main
                                 # admin contacted
 # undef $contact_main_admin     # Uncomment if you do not want
                                 # main admin contacted

This feature is not yet used. In the future, you will be able to specify alternative contacts to the main administrator and finely tune who gets what information (bug reports, completion reports, etc.).


sites.lst
Place this file in the $check_dir location specified above. Edit the file to contain a list of the sites you wish noWWWatch to check.

For this example, list whatever sites you wish in this file, but be sure to include the main page for this class: http://www.geekcomix.com/cgi-bin/classnotes/wiki.pl?UNIX03

By the end of the day, I will modify the page at this link so you can see what an error report looks like.
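These notes do not spell out the layout of sites.lst. Purely as an assumption for illustration, the Python sketch below treats it as one URL per line, skipping blank lines and lines that begin with "#". Check the file shipped with the noWWWatch package for the real format. The function name parse_sites is hypothetical.

```python
def parse_sites(text: str) -> list:
    """Return the URLs found in the text of a sites list.

    ASSUMED format (not confirmed by the notes): one URL per line;
    blank lines and lines starting with '#' are ignored.
    """
    sites = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sites.append(line)
    return sites


if __name__ == "__main__":
    example = (
        "# pages to watch\n"
        "http://www.geekcomix.com/cgi-bin/classnotes/wiki.pl?UNIX03\n"
        "\n"
        "http://www.example.com/\n"
    )
    for site in parse_sites(example):
        print(site)
```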

Usage

Once you have installed nowwwatch.pl and nowwwatch-cfg.pl, you need to be sure that nowwwatch.pl can read from and write to the directory named by the $check_dir setting. The recommended way to do this is to have nowwwatch.pl run as a non-privileged user, and have that user set up to run nowwwatch.pl from their crontab.

Add a user called "nowwwatch" with a group "noink2". Set this user's home directory to /home/noink2/nowwwatch. Make this home directory readable and writable by nowwwatch, but only readable by the group noink2. Inside of this directory, place your sites.lst file.
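A quick way to confirm the home directory's permissions is to inspect its mode bits. The Python sketch below assumes the target mode is 0750: read, write, and search for the owner, and read and search for the group (a directory needs the search bit before its contents can be accessed). Denying all access to other users is an assumption, since the notes say nothing about them. The names WANTED_MODE and directory_mode_ok are hypothetical.

```python
import os
import stat
import tempfile

# Owner rwx, group r-x, others nothing. The "others" part is an assumption.
WANTED_MODE = 0o750


def directory_mode_ok(path: str) -> bool:
    """Return True if the permission bits on `path` are exactly WANTED_MODE."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == WANTED_MODE


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real home directory.
    with tempfile.TemporaryDirectory() as demo_dir:
        os.chmod(demo_dir, WANTED_MODE)
        print(directory_mode_ok(demo_dir))
```

On a real system you would pass /home/noink2/nowwwatch to directory_mode_ok and separately confirm that the directory's owner and group are nowwwatch and noink2.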

su to nowwwatch and add /usr/bin/nowwwatch.pl to its crontab (crontab -e) so that it runs every night at midnight, which corresponds to the schedule fields "0 0 * * *".

Once you have done this, run nowwwatch.pl manually a couple of times to verify it is working.

NOTE: noWWWatch is still under development. While it is stable enough to consider use in production, it is not yet feature complete. Please watch the noink2 web-site (http://sf.net/projects/noink/) for changes in the noWWWatch code.

ALSO NOTE: The ideal way to run noWWWatch is to have it on a server separate from the web-server(s) it is checking. If noWWWatch were on the same server as the web-server, then a compromise of the web-server could lead to a compromise of noWWWatch.

Get the noWWWatch package here:



Last edited June 14, 2003 1:00 pm
(C) Copyright 2003 Samuel Hart
This work is licensed under a Creative Commons License.