
Maintaining the Database

Now that the database is seeded, it needs to be maintained. Maintenance covers auto-learning, relearning false positives and false negatives, backups and restores, and viewing statistics.

Statistics
The statistics appear in the Bayes Database Statistics section at the bottom of the web page. They show the number of spam and ham messages the database has seen since it was created, and the number of tokens currently stored in it. The token count rises and falls as the database learns new tokens and expires old ones. The section also lists the time stamps of the oldest and newest tokens in the database and the time stamp of the last expiry run.

[Figure: images/new/mail-123.eps]
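The same counters can also be read outside the web page. The sketch below assumes that the underlying engine is SpamAssassin (suggested by the /home/vscan/.spamassassin path) and that its `sa-learn --dump magic` command prints one "non-token data" line per counter with the value in the third column. Both the sample text and the helper name `parse_magic` are illustrative assumptions, not part of the EnGarde interface.

```python
# Sketch: parse the counter lines printed by `sa-learn --dump magic`.
# Assumption: each line looks like
#   0.000  0  <value>  0  non-token data: <name>
# The sample below is illustrative, not output from a real system.

SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      63582          0  non-token data: nspam
0.000          0      27010          0  non-token data: nham
0.000          0     134926          0  non-token data: ntokens
0.000          0 1113548811          0  non-token data: oldest atime
0.000          0 1114193738          0  non-token data: newest atime
0.000          0 1114153235          0  non-token data: last expiry atime
"""


def parse_magic(text):
    """Return a dict mapping counter name to its integer value."""
    stats = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue  # skip anything that is not a counter line
        fields, name = line.split("non-token data:", 1)
        columns = fields.split()
        if len(columns) < 3:
            continue  # malformed line, ignore it
        stats[name.strip()] = int(columns[2])
    return stats


if __name__ == "__main__":
    for name, value in parse_magic(SAMPLE).items():
        print(f"{name}: {value}")
```

Here nspam and nham would correspond to the spam and ham counts, ntokens to the token count, and the atime values to the time stamps shown on the web page.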

Auto-learning
The Bayesian Classifier can automatically categorize incoming email by comparing the tokens it sees in the email with the tokens in the database. In this way it becomes an adaptive filter that learns new spam automatically. This feature is controlled in the General Configuration web page under Spam Configuration.
Maintaining a Balanced Spam/Ham Ratio
In general, it is a good idea to keep the spam and ham counts approximately equal so that the classifier has an unbiased point of view. Check the spam and ham counts in the statistics. If one count becomes noticeably higher than the other (a difference of roughly 10% to 15%), adjust the Learning Ham and Learning Spam thresholds to bring the counts back into balance. Make small adjustments to these thresholds and watch the counts over a day or two before adjusting further. Small shifts in the spam/ham ratio are better than large swings.
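The balance check can be expressed as a small calculation. The sketch below is only an illustration: the text does not say how the percentage difference is measured, so the function assumes it is taken relative to the larger of the two counts, and the name `balance_report` and the 10% default tolerance are hypothetical choices.

```python
# Sketch: flag a spam/ham imbalance from the two message counts.
# Assumption: "difference" means |nspam - nham| divided by the larger count.

def balance_report(nspam, nham, tolerance=0.10):
    """Return (relative_difference, verdict) for the two counts."""
    larger = max(nspam, nham)
    if larger == 0:
        return 0.0, "balanced"  # empty database, nothing to compare
    difference = abs(nspam - nham) / larger
    if difference <= tolerance:
        return difference, "balanced"
    verdict = "spam-heavy" if nspam > nham else "ham-heavy"
    return difference, verdict


if __name__ == "__main__":
    diff, verdict = balance_report(nspam=1200, nham=1000)
    print(f"difference {diff:.1%}: {verdict}")
```

A "spam-heavy" or "ham-heavy" verdict suggests a small adjustment to the Learning Spam or Learning Ham threshold, followed by another check after a day or two.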
Learning From User Contributions
You should collect false positive and false negative messages and feed them into the Bayesian database. Along with auto-learning, this is the other way to fine-tune the database. As stated earlier, be extremely cautious about which users you learn from. A poisoned database defeats the purpose of having one.
Rebuilding The Database
This operation rebuilds the database, performing tasks such as optimizing token order. It also synchronizes the database journal with the database itself. During auto-learning, data is stored in the journal rather than directly in the database. The journal is synchronized automatically, but you can also force a manual sync here by clicking the Proceed with Rebuild button. Ordinarily this isn't necessary, but it can be useful when debugging.

[Figure: images/new/mail-122.eps]

Forcing An Expiry Run
This operation forces the Bayes software to examine the token database and determine whether any old tokens are ready for removal. Expiry runs happen automatically, but one can be started manually by clicking the Proceed with Expiry button in the Bayes Database Maintenance section of the web page. This is useful when an admin wants to be sure that the database is up to date. A useful statistic on which to base the decision is the Time of Last Expiry Run. If Bayes has not done an automatic expiry recently and the admin feels that too much time has elapsed, she can start an expiry run manually. The configuration parameter with the most influence on when automatic expiry occurs is the Minimum Database Size in the General Configuration web page under Spam Configuration. With a larger value, expiry runs tend to be less frequent; with a smaller value, they are more frequent. A larger database gives the system more information for making accurate decisions, but other administrative factors come into play, such as CPU, disk space, speed, and available memory.
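Deciding whether the elapsed time since the last expiry run is too long comes down to a simple comparison. The sketch below assumes the Time of Last Expiry Run is available as a Unix epoch value. The function name `expiry_is_stale` and the two-day default limit are hypothetical, since the acceptable interval is left to the admin's judgment.

```python
# Sketch: decide whether a manual expiry run is worth considering.
# Assumptions: the last expiry time is a Unix epoch value, and "too long"
# is an admin-chosen limit (two days here, purely as an example).

from datetime import datetime, timedelta, timezone


def expiry_is_stale(last_expiry_epoch, now=None, max_age=timedelta(days=2)):
    """Return True if the last expiry run is older than max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    last_run = datetime.fromtimestamp(last_expiry_epoch, tz=timezone.utc)
    return now - last_run > max_age


if __name__ == "__main__":
    three_days_ago = datetime.now(timezone.utc) - timedelta(days=3)
    print(expiry_is_stale(three_days_ago.timestamp()))
```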
Clearing The Database
Should it be necessary to clear the database, use the Proceed with Clear button in the Bayes Database Maintenance section of the web page. Clearing is a good idea before a database restore or when the admin wants to start building the database from a clean slate.
Backups and Restores
Backups are vital to Bayes database maintenance. Over time, a lot of valuable information accumulates in the Bayes database. Should the database become corrupted, you don't want to seed it all over again and then wait for it to accumulate the number of tokens that make up a mature system. Create a new Named Backup for /home/vscan/.spamassassin (this is where the database files live) and do daily full backups. Consult the EnGarde documentation on System Backups for more details. If your database does become corrupted, clear it as described under Clearing The Database and then do a normal restore from a recent full backup.
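As a supplement to the Named Backup, an ad hoc archive of the database directory can be made with a short script. This is a minimal sketch and not the EnGarde backup mechanism: the function name `make_backup`, the archive naming scheme, and the destination directory in the usage example are all hypothetical.

```python
# Sketch: archive the Bayes database directory into a dated tar.gz file.
# Assumptions: the destination path and file naming are illustrative only;
# the source path is the directory named in the text.

import os
import tarfile
import time


def make_backup(src_dir, dest_dir, stamp=None):
    """Create dest_dir/bayes-<stamp>.tar.gz from src_dir and return its path."""
    if stamp is None:
        stamp = time.strftime("%Y%m%d")
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, f"bayes-{stamp}.tar.gz")
    # Store the directory under its own name so a restore recreates it.
    top_level = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top_level)
    return archive_path


if __name__ == "__main__":
    # Hypothetical destination; adjust to a location with enough disk space.
    print(make_backup("/home/vscan/.spamassassin", "/var/backups/bayes"))
```

Because the database may be updated while the archive is being written, a copy taken from a busy system could be inconsistent, so a quiet period is the safer time to run such a script.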


docs@guardiandigital.com 2004-07-09