

Seeding the Database

The Bayes Classifier will not start running until it has learned a minimum of 200 spam and 200 ham emails. Ham is therefore just as important as spam, and an equal balance of the two is needed for optimal performance. Seeding requires first collecting at least 200 known spam and 200 known ham messages. Feel free to seed the database with more messages, as long as the amounts of spam and ham stay approximately equal; the more samples it is seeded with, the better its initial performance will be. Store all of the spam in a single mbox-format file, and do the same with all of the ham. Spam and ham must be in separate files before they are fed to Bayes.
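Collecting messages into a single mbox file can be scripted. The following is a minimal Python sketch, assuming the known spam (or ham) messages have already been saved as individual `.eml` files in one directory; the directory layout and the `build_mbox` and `count_messages` helpers are hypothetical and not part of Secure Mail Suite. It uses the standard-library `mailbox` module to append each message to an mbox file and reports how many messages the file then holds.

```python
import mailbox
from pathlib import Path


def build_mbox(src_dir: str, dest_path: str) -> int:
    """Append every *.eml file in src_dir to the mbox file at dest_path.

    Hypothetical helper: the file is created if missing and appended to
    if it already exists. Returns the total number of messages it holds.
    """
    box = mailbox.mbox(dest_path)
    box.lock()
    try:
        for path in sorted(Path(src_dir).glob("*.eml")):
            # add() accepts raw bytes; a "From " separator line is
            # generated automatically when the message lacks one.
            with open(path, "rb") as fh:
                box.add(fh.read())
        box.flush()
        return len(box)
    finally:
        # close() flushes, releases the lock and closes the file.
        box.close()


def count_messages(path: str) -> int:
    """Return the number of messages in an existing mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()
```

For example, `build_mbox("collected/spam", "spam.mbox")` and `build_mbox("collected/ham", "ham.mbox")` (hypothetical paths) would produce the two files to upload. Comparing each returned count against 200 before uploading catches an undersized collection early. Because the helper appends rather than overwrites, rerunning it on the same output file will duplicate messages.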

Once that is done, transfer these files to the machine from which the admin runs the WebTool. From there, she can upload them to the machine running the Secure Mail Suite spam filter with the Bayesian Classifier. This is done in the Upload Ham/Spam Mailbox section of the WebTool page mentioned above. The Browse button lets the admin upload the spam and ham files separately. For each file, choose one of the three upload options: Upload as SPAM, Upload HAM, or Forget Message/mbox (the Forget option is discussed later). After making the proper choice, click the Proceed With Upload button. Do this for both the spam and the ham mbox files.

[Figure: images/new/mail-120.eps]

Uploading generally takes from a couple of seconds to a minute or so, depending on the file size. An easy way to verify that the files were learned successfully is to check the Bayes Database Statistics section at the bottom of the page. Click your browser's Reload button to make sure the page is up to date. The statistics include the number of spam and ham emails that have been learned. Once both values are at least 200, the database can be used to classify and auto-learn incoming mail. (The auto-learn feature is described elsewhere in this guide.)
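The readiness rule above can be expressed as a small check. This Python sketch is hypothetical: it assumes the admin copies the learned spam and ham counts from the Bayes Database Statistics section by hand, and the `MAX_RATIO` tolerance used to flag an imbalance is an illustrative assumption, since this guide only asks for approximately equal amounts.

```python
from typing import List

MIN_LEARNED = 200  # per-class minimum stated in this guide
MAX_RATIO = 1.5    # hypothetical tolerance for "approximately equal"; not a product setting


def seeding_problems(spam_learned: int, ham_learned: int,
                     minimum: int = MIN_LEARNED,
                     max_ratio: float = MAX_RATIO) -> List[str]:
    """Return a list of problems; an empty list means the database looks ready."""
    problems = []
    if spam_learned < minimum:
        problems.append(f"spam: {spam_learned} learned, need at least {minimum}")
    if ham_learned < minimum:
        problems.append(f"ham: {ham_learned} learned, need at least {minimum}")
    low, high = sorted((spam_learned, ham_learned))
    # Only compare the ratio when both classes have some messages;
    # an empty class is already reported by the threshold checks.
    if low > 0 and high / low > max_ratio:
        problems.append(f"unbalanced: {high / low:.2f}:1 exceeds {max_ratio}:1")
    return problems
```

For example, counts of 250 spam and 230 ham produce an empty list, while 199 spam and 250 ham report only the spam shortfall.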


docs@guardiandigital.com 2004-07-09