Next: Summary
Up: Mail Filters
Previous: Learning From Local User's
  Contents
Now that the database is seeded, it needs to be maintained. Maintenance encompasses
auto-learning, relearning false positives and false negatives, backups
and restores, and viewing statistics.
- Statistics
- The statistics are shown in the Bayes Database
Statistics section at the bottom of the web page. They consist
of the number of spam and ham messages that the database has seen since
it was created, and the number of tokens currently
stored in the database. The token count will increase and decrease as
the database learns new tokens and expires old ones. The section also shows
the time stamps of the oldest and newest tokens in the database and
the time stamp of the last expiry run.
- [NOTE:] You may learn a number
of spam or ham messages and not see the expected increase in the database statistics.
This is most likely because the Bayes Classifier
has already learned some of the email that you are feeding
it. When this happens, the spam or ham count is incremented only
by the number of new messages.
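The effect described in the note can be pictured with a toy model. The sketch below is not the Bayes software's own code; the class `ToyBayesCounts` and its use of a message digest to recognize repeats are assumptions made purely for illustration.

```python
import hashlib


class ToyBayesCounts:
    """Toy model (hypothetical): counts only messages not learned before."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0
        self._seen = set()  # digests of messages already learned

    def learn(self, message: str, is_spam: bool) -> bool:
        """Learn one message; return True only if a count was incremented."""
        digest = hashlib.sha1(message.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False  # already learned, so the counts stay unchanged
        self._seen.add(digest)
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        return True


counts = ToyBayesCounts()
batch = ["buy pills now", "cheap loans", "buy pills now"]  # one repeat
learned = sum(counts.learn(msg, is_spam=True) for msg in batch)
print(f"fed {len(batch)} spam messages, spam count rose by {learned}")
```

Feeding the same batch a second time would leave both counts untouched, which matches the behavior the note describes.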
- Auto-learning
- The Bayesian
Classifier can automatically categorize incoming email by comparing
the tokens it sees in the email with the tokens in the database.
In this manner it becomes an adaptive filter that automatically learns
new spam. This feature is controlled in the General Configuration
web page under Spam Configuration.
- Maintaining a Balanced Spam/Ham Ratio
- In general, it is a good
idea to keep the spam and ham counts approximately equal to give the
classifier an unbiased point of view. Check the spam and ham counts in the
statistics. If one becomes noticeably higher than the other (a difference of
around 10% to 15%), adjust
the Learning Ham and Learning Spam thresholds to bring
the counts back into balance. Make small adjustments to these
thresholds and watch the counts over a day or two before adjusting further.
It is better to see small shifts than large swings in the spam/ham
ratio.
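As a rough illustration of the balance check, the sketch below computes how far apart the two counts are. The text does not say what the 10% to 15% is measured against; this sketch assumes it is a fraction of the larger count. The function names and the default threshold of 0.10 are hypothetical and are not settings of the product.

```python
def count_imbalance(nspam: int, nham: int) -> float:
    """Difference between the counts as a fraction of the larger count."""
    larger = max(nspam, nham)
    if larger == 0:
        return 0.0  # empty database: nothing to balance yet
    return abs(nspam - nham) / larger


def needs_rebalancing(nspam: int, nham: int, threshold: float = 0.10) -> bool:
    """True when the imbalance reaches the (assumed) threshold."""
    return count_imbalance(nspam, nham) >= threshold


# Example counts as they might be read from the statistics section.
for nspam, nham in [(1000, 950), (1000, 800)]:
    pct = count_imbalance(nspam, nham) * 100
    verdict = "adjust thresholds" if needs_rebalancing(nspam, nham) else "leave as is"
    print(f"spam={nspam} ham={nham} imbalance={pct:.1f}% -> {verdict}")
```

Rerunning the same check a day or two after each small threshold change shows whether the counts are converging.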
- Learning From User Contributions
- You should obtain false positive
and false negative messages and feed them into the Bayesian database.
This is the second way of fine-tuning the database (auto-learning
being the first). But as stated above, be extremely cautious about
which users you learn from. A poisoned database defeats the purpose
of having one.
- Rebuilding The Database
- This
operation rebuilds the database, performing operations such as optimizing
token order. It also synchronizes the database journal with the database
itself. During auto-learning, data is stored in the journal instead
of directly in the database. The journal is synchronized automatically,
but you can also synchronize it manually here by clicking the
Proceed with Rebuild button. Ordinarily this isn't necessary,
but it can be useful in debugging.
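The relationship between the journal and the database can be pictured with a small model. This is only a sketch of the general idea of buffering updates in a journal and merging them later; `ToyTokenStore` and its methods are hypothetical and do not reflect the real on-disk format or the Bayes software's internals.

```python
class ToyTokenStore:
    """Toy model (hypothetical) of a token database with a write journal."""

    def __init__(self):
        self.database = {}  # token -> count, the main store
        self.journal = []   # pending (token, delta) updates

    def auto_learn(self, tokens):
        """Record updates in the journal only; the database is not touched."""
        for token in tokens:
            self.journal.append((token, 1))

    def sync(self) -> int:
        """Merge the journal into the database; return the entries applied."""
        applied = len(self.journal)
        for token, delta in self.journal:
            self.database[token] = self.database.get(token, 0) + delta
        self.journal.clear()
        return applied


store = ToyTokenStore()
store.auto_learn(["free", "offer", "free"])
print("before sync:", store.database, "journal entries:", len(store.journal))
applied = store.sync()
print("after sync: ", store.database, "applied:", applied)
```

Until `sync()` runs, the learned tokens exist only in the journal, which is why a manual synchronization can help when checking whether recent learning has reached the database.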
- Forcing An Expiry Run
- This operation forces the Bayes software to examine the token
database and determine whether there are old tokens that are ready for
removal. This is done automatically but can be done manually
here by clicking the Proceed with Expiry button in the Bayes
Database Maintenance section of the web page. This can be useful
when an admin wants to be sure that the database is up to date. A
useful statistic on which to base such an action is the Time of Last Expiry
Run. If Bayes has not done an automatic expiry recently
and the admin feels that too much time has elapsed, she
can do an expiry run manually. The configuration parameter that has
a lot of influence on when automatic expiry occurs is the
Minimum Database Size in the General Configuration web
page under Spam Configuration. With a larger value, expiry
runs will tend to be less frequent; with a smaller value, they will be more
frequent. A larger database gives the system more information with which
to make accurate decisions, but other administrative factors come
into play, such as CPU, disk space, speed and available memory.
- Clearing The Database
- Should it be necessary to clear the database,
use the Proceed with Clear button in the Bayes Database
Maintenance section of the web page. This is a good idea before doing
a database restore or when the admin wants to start building the database
from a clean slate.
- Backups and Restores
- Backups are vital in Bayes database maintenance.
Over time a lot of valuable information will be stored in the Bayes
database. Should the database become corrupted for some reason, you
don't want to start all over with seeding it and then wait
for it to accumulate the number of tokens that make up
a mature system again. Create a new Named Backup for /home/vscan/.spamassassin
(this is where the database files live) and do daily full backups.
Consult the EnGarde documentation on System Backups for
more details. If your database does get corrupted, clear the
database as described above and then do a normal restore from a recent
full backup.
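For readers who want to see what a daily full backup of that directory amounts to, here is a minimal sketch using Python's standard `tarfile` module. It is not the EnGarde Named Backup mechanism, which remains the documented method; the function `full_backup`, the `bayes-` archive name, and the destination directory in the usage comment are all hypothetical.

```python
import tarfile
import time
from pathlib import Path


def full_backup(source_dir: str, dest_dir: str) -> Path:
    """Write a dated, gzip-compressed tar archive of source_dir into dest_dir."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one archive per calendar day.
    archive = dest / f"bayes-{time.strftime('%Y-%m-%d')}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store paths relative to the directory name, so a restore
        # recreates ".spamassassin/..." under whatever target is chosen.
        tar.add(str(source), arcname=source.name)
    return archive


# Usage (the destination path is an assumption, not a product default):
# full_backup("/home/vscan/.spamassassin", "/var/backups/bayes")
```

A copy taken while the database is being written may be inconsistent, so a sketch like this is best run when mail processing is quiet.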
docs@guardiandigital.com
2004-07-09