Next: Distributed Checksum Clearinghouse (DCC) Up: Mail Filters Previous: Thresholds   Contents

Bayesian Configuration

Bayesian classification is a method by which the spam scanning system learns what is considered spam and what is not. It works by keeping a database that records, for each word, the probability that a message containing that word is spam. When it scans a new message, the Bayesian filter employs a heuristic to calculate the probability that the message is spam from the individual probabilities of the words in the message. Because the Bayesian filter depends solely on what it has learned from previous messages, it is very important to keep the Bayesian database up to date by regularly teaching it with both spam and non-spam messages. Bayesian filtering has a very significant effect on the effectiveness of the spam scanning subsystem.
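To make the idea concrete, the following is a minimal sketch of how per-word probabilities can be combined into a single message score. It is not the heuristic used by the product, which is not described here; it uses the classic naive Bayes combination, assuming that words are independent and that spam and ham are equally likely in advance. The word database, its values, and the function names are all hypothetical.

```python
from math import prod  # available in Python 3.8 and later


def combine_word_probabilities(word_probs):
    """Combine per-word spam probabilities into one message-level probability.

    Assumption: naive Bayes combination with independent words and equal
    prior probability of spam and ham. Real filters use refined variants.
    """
    if not word_probs:
        return 0.5  # no evidence either way
    # Clamp each value so a single 0.0 or 1.0 cannot force the result.
    clamped = [min(max(p, 0.01), 0.99) for p in word_probs]
    spam_evidence = prod(clamped)
    ham_evidence = prod(1.0 - p for p in clamped)
    return spam_evidence / (spam_evidence + ham_evidence)


# Hypothetical database: word -> probability that a message containing it is spam.
word_db = {"viagra": 0.98, "free": 0.80, "meeting": 0.10, "invoice": 0.30}


def score_message(text, db, unknown=0.5):
    """Score a message; words missing from the database count as neutral (0.5)."""
    words = text.lower().split()
    return combine_word_probabilities([db.get(w, unknown) for w in words])


if __name__ == "__main__":
    print(score_message("free viagra", word_db))      # close to 1: likely spam
    print(score_message("meeting invoice", word_db))  # close to 0: likely ham
```

Because unknown words contribute the neutral value 0.5, the score of a message is driven entirely by words the database has already seen, which is why a poorly trained database gives poor results.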

[Figure: images/new/mail-118.eps]

Bayesian Classifying
You can enable or disable Bayesian Classifying here. Enabling this option is highly recommended, because Bayesian Classifying can drastically improve the accuracy of spam filtering.
Bayesian Auto Learning
Once the Bayesian database has been manually seeded with a minimum of 200 ham and 200 spam messages, the filter will learn automatically from messages passing through it. Manually seeding the database is discussed in the Seeding Bayes Database section, in the Bayesian Learning Center. Since it needs no human intervention afterwards, auto learning is a very convenient way to train the Bayesian filter. It is recommended that this option be enabled.
Learning Ham Threshold
This threshold determines whether a message should be learned by the Bayesian filter as a legitimate message (ham). If the spam score of a message is less than this threshold, the Bayesian filter will learn it as ham. The threshold should be a very low number, close to zero, so that only messages that are almost certainly legitimate are learned and the Bayesian filter doesn't learn any spam messages as ham.
Learning Spam Threshold
This threshold determines whether a message should be learned by the Bayesian filter as a spam message. If the spam score of a message is greater than the value specified here, the message will be learned as spam. This number should be set to a high value to make absolutely sure that the message is indeed spam. Setting it too low may result in some legitimate mail being learned as spam, which will adversely affect the accuracy of the spam scanner.
Bayes Ignore Headers
Here you can enter the mail headers that the Bayesian filter should ignore when learning. If received mail has already been filtered by another mail system, such as a spam-filtering ISP or a mailing list, that system may have added its own headers to the message. These headers can give the Bayesian filter misleading clues when it learns those messages, so that it comes to give more weight to these headers than to the contents of the message.
Example: X-Spam-Status
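The way the two learning thresholds and the ignored headers work together can be sketched as follows. This is an illustration only, not the product's implementation: the function names are hypothetical, and the threshold values 0.1 and 12.0 are example numbers chosen for the sketch, not values taken from this manual. Header handling uses Python's standard email module.

```python
from email import message_from_string


def strip_ignored_headers(raw_message, ignore_headers):
    """Parse a message and drop the headers the filter should not learn from."""
    msg = message_from_string(raw_message)
    for name in ignore_headers:
        # Deletes every occurrence (case-insensitive); no error if absent.
        del msg[name]
    return msg


def auto_learn_decision(spam_score, ham_threshold, spam_threshold):
    """Decide how a scanned message should be auto-learned.

    Returns "ham" below the ham threshold, "spam" above the spam threshold,
    and None for anything in between, which is not learned at all.
    """
    if spam_score < ham_threshold:
        return "ham"
    if spam_score > spam_threshold:
        return "spam"
    return None


if __name__ == "__main__":
    raw = (
        "From: sender@example.com\n"
        "X-Spam-Status: Yes, score=9.1\n"
        "Subject: hello\n"
        "\n"
        "message body\n"
    )
    cleaned = strip_ignored_headers(raw, ["X-Spam-Status"])
    print(cleaned.keys())
    # Example thresholds (assumed values, not defaults from this manual).
    print(auto_learn_decision(5.0, ham_threshold=0.1, spam_threshold=12.0))
```

Messages whose score falls between the two thresholds are deliberately left out of learning, which is what keeps borderline mail from polluting the database.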


docs@guardiandigital.com 2004-07-09