Definitions:
- Spam: Unsolicited commercial email
- Ham: Valid email
- False Positive: A valid email that was erroneously classified as spam
- False Negative: A spam email that was erroneously classified as valid
The role of the Bayesian Classifier is to sort incoming email into
three categories: spam, ham and not-sure. A not-sure message is one
that is not clearly spam or ham and is therefore not auto-learned as
either. The classifier works by breaking each incoming message into
tokens. Tokens are mostly words found in the email body, but they
also include elements of the email headers and envelope. The
classifier then determines how often these tokens occur in spam and
in ham, based on what it has previously been taught. With this
information it can add spam points to an incoming email as necessary,
which improves the detection capability of the spam filter as a whole.

Because the classifier relies on what it has been taught, the Bayes
database must be initialized, or seeded, before it can be used. The
admin can teach spam and ham to the Bayes database in two ways: by
uploading spam and ham mbox files, or by learning from local users.
Visit http://infocenter.guardiandigital.com for more information
about mbox files.
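The token-counting idea described above can be illustrated with a small sketch. This is not the product's implementation: the class name ToyBayes, the two cutoff values, the equal prior for spam and ham, and the add-one smoothing are all assumptions chosen to keep the example short. It only shows how per-token counts learned from seeded spam and ham can place a new message into spam, ham or not-sure.

```python
import math
import re
from collections import Counter

# Hypothetical cutoffs for this sketch only; the real thresholds are not
# stated in this document.
HAM_CUTOFF = 0.2
SPAM_CUTOFF = 0.9


def tokenize(text):
    """Return the set of lowercase word tokens in a message.

    A real classifier also tokenizes headers and envelope data; here the
    whole message is treated as plain text.
    """
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class ToyBayes:
    """Minimal naive Bayes sketch (assumed design, equal spam/ham prior)."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> spam messages containing it
        self.ham_counts = Counter()   # token -> ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        """Teach one message to the database as spam or ham."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        """Estimate the probability that a message is spam."""
        if self.n_spam == 0 or self.n_ham == 0:
            raise ValueError("seed the database with both spam and ham first")
        log_odds = 0.0
        for tok in tokenize(text):
            # Add-one smoothing so unseen tokens never yield a zero probability.
            p_spam = (self.spam_counts[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_counts[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to keep math.exp within floating-point range.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))

    def classify(self, text):
        """Return 'spam', 'ham' or 'not-sure' for a message."""
        p = self.spam_probability(text)
        if p >= SPAM_CUTOFF:
            return "spam"
        if p <= HAM_CUTOFF:
            return "ham"
        return "not-sure"


# Seed the toy database, then classify three new messages.
bayes = ToyBayes()
bayes.learn("Buy cheap pills now", is_spam=True)
bayes.learn("Cheap pills cheap offer now", is_spam=True)
bayes.learn("Meeting agenda for Monday", is_spam=False)
bayes.learn("Lunch on Monday with the team", is_spam=False)

print(bayes.classify("cheap pills offer"))      # spam
print(bayes.classify("Monday meeting agenda"))  # ham
print(bayes.classify("hello there"))            # not-sure
```

The call to spam_probability raises an error on an empty database, which mirrors the requirement that the Bayes database be seeded before use. Messages made only of tokens the database has never seen score 0.5 and fall into not-sure.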
- NOTE: Learning from local users is possible only on an email
server where the recipients have local accounts on that server. This
is the difference between gateway operation and a server that stores
email: only the storage server can be used for this function.
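For the mbox upload route, it can help to check a file before uploading it. The following is a minimal sketch using Python's standard mailbox module. The helper names count_messages and subjects, and any file name passed to them, are hypothetical and not part of the product.

```python
import mailbox


def count_messages(mbox_path):
    """Return the number of messages stored in an mbox file."""
    # create=False makes a missing file an error instead of creating it.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def subjects(mbox_path, limit=5):
    """Return the Subject headers of the first `limit` messages."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [msg.get("Subject", "(no subject)")
                for _, msg in zip(range(limit), box)]
    finally:
        box.close()
```

Calling count_messages on a spam mbox and on a ham mbox gives a quick view of how many examples of each kind are about to be taught to the database.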
docs@guardiandigital.com
2004-07-09