Next: Seeding the Database Up: Mail Filters Previous: Database Maintenance   Contents

Learning Spam and Ham

Definitions:

Spam: Unsolicited commercial email
Ham: Valid email
False Positive: A valid email that was erroneously classified as spam
False Negative: A spam email that was erroneously classified as valid

The role of the Bayesian classifier is to sort incoming email into three categories: spam, ham, and not-sure. A not-sure message is one that is not clearly spam or ham and is therefore not auto-learned as either.

The classifier works by breaking incoming mail into tokens. Tokens are mostly words found in the email body, but they also include elements of the email headers and envelope. It then determines how often these tokens occur in spam and in ham, based on what it has previously been taught. With this information it can add spam points to an incoming email as needed, which improves the overall spam filter's detection capability.

The Bayes database must be initialized, or seeded, before it can be used, so seeding is the first step. The administrator can teach spam and ham to the Bayes database in two ways: by uploading spam and ham mbox files, or by learning from local users. Visit http://infocenter.guardiandigital.com for more information about mbox files.
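
The token-counting idea described above can be illustrated with a small Python sketch. This is not the product's actual implementation: the class name ToyBayes, the tokenizer, the smoothing, and the 0.9 / 0.1 cutoffs are all assumptions chosen for illustration, and the sketch only looks at body text rather than headers or the envelope.

```python
import math
import re
from collections import Counter

# Hypothetical cutoffs for this sketch; a real filter picks its own values.
SPAM_CUTOFF = 0.9
HAM_CUTOFF = 0.1


def tokenize(text):
    """Split a message into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class ToyBayes:
    """Illustrative naive Bayes classifier (not the product's algorithm)."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> spam messages containing it
        self.ham_counts = Counter()   # token -> ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        """Teach one message to the database as spam or ham."""
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        """Estimate the probability that a message is spam."""
        if self.n_spam == 0 or self.n_ham == 0:
            raise ValueError("seed the database with both spam and ham first")
        log_odds = 0.0
        for token in set(tokenize(text)):
            # Add-one smoothing so unseen tokens do not zero out the estimate.
            p_spam = (self.spam_counts[token] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_counts[token] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to keep math.exp from overflowing on extreme evidence.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))

    def classify(self, text):
        """Return 'spam', 'ham', or 'not-sure'."""
        p = self.spam_probability(text)
        if p >= SPAM_CUTOFF:
            return "spam"
        if p <= HAM_CUTOFF:
            return "ham"
        return "not-sure"


if __name__ == "__main__":
    clf = ToyBayes()
    # Seeding: the database needs examples of both kinds before use.
    clf.learn("cheap pills buy now", is_spam=True)
    clf.learn("buy cheap watches now", is_spam=True)
    clf.learn("meeting agenda for monday", is_spam=False)
    clf.learn("lunch on monday?", is_spam=False)
    for message in ("buy cheap pills", "monday meeting agenda", "hello there"):
        print(message, "->", clf.classify(message))
```

The sketch refuses to score anything until it has been taught at least one spam and one ham message, which mirrors the requirement that the Bayes database be seeded before use. A message made up entirely of tokens the classifier has never seen lands at a probability of 0.5 and is reported as not-sure.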


docs@guardiandigital.com 2004-07-09