Bayesian Classification is a method by which the spam scanning
system learns what is considered spam and what is not. It works
by keeping a database that contains, for each word, the probability
that a message containing that word is spam. When it scans a new message,
the Bayesian filter employs a heuristic method to calculate the probability
that the message is spam from the individual probabilities of the words
in the message. Since the Bayesian filter depends solely on the information
it has learned from previous messages, it is very important to
keep the Bayesian database up to date by regularly teaching it with both
spam and non-spam messages. Bayesian filtering has a very significant
effect on the effectiveness of the spam scanning subsystem.
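The calculation described above can be illustrated with a minimal Python sketch. This is not the product's actual implementation: the function names, the `(spam_count, ham_count)` database layout, the clamping to 0.01 and 0.99, and the naive Bayes style combination are all assumptions chosen for illustration.

```python
import math


def word_spam_probability(spam_count, ham_count, total_spam, total_ham):
    """Estimate the probability that a message containing a word is spam.

    Hypothetical helper: counts are the number of learned spam/ham
    messages that contained the word.
    """
    spam_freq = spam_count / total_spam if total_spam else 0.0
    ham_freq = ham_count / total_ham if total_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # word never seen: no evidence either way
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so a single word can never make the result absolutely certain.
    return min(max(p, 0.01), 0.99)


def message_spam_probability(words, database, total_spam, total_ham):
    """Combine per-word probabilities into one probability for the message.

    `database` maps word -> (spam_count, ham_count); this layout is an
    assumption made for the sketch.
    """
    log_spam = 0.0
    log_ham = 0.0
    for word in set(words):
        spam_count, ham_count = database.get(word, (0, 0))
        p = word_spam_probability(spam_count, ham_count, total_spam, total_ham)
        # Work in log space to avoid underflow on long messages.
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    diff = log_ham - log_spam
    if diff > 700:   # overwhelmingly ham; avoid overflow in exp()
        return 0.0
    if diff < -700:  # overwhelmingly spam
        return 1.0
    return 1.0 / (1.0 + math.exp(diff))
```

A message made up only of words the database has never seen scores 0.5 in this sketch, which illustrates why the database has to be kept trained on both spam and non-spam messages.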
- Bayesian Classifying
- You can enable or disable Bayesian Classifying
here. It is highly recommended that you enable this option. Enabling
Bayesian Classifying can drastically improve the accuracy of
spam filtering.
- Bayesian Auto Learning
- The
Bayesian filter will learn automatically from messages passing through
the filter, once it is manually seeded with a minimum of 200 ham and
200 spam messages. Manually seeding the Bayesian database is discussed
in the Seeding Bayes Database section, in the Bayesian
Learning Center. Since it
needs no human intervention afterwards, it is a very convenient way
to train the Bayesian filter. It is recommended that this option be
enabled.
- Learning Ham Threshold
- This threshold is used to determine if
a message should be learned by the Bayesian filter as a legitimate
message (ham). If the spam score of a message is less than this threshold,
the Bayesian filter will learn this message as a legitimate message.
This threshold should be a very low number, close to zero, to make absolutely
sure the message is legitimate and the Bayesian filter doesn't learn
any spam messages as ham.
- Learning Spam Threshold
- This threshold is used to determine if
a message should be learned by the Bayesian filter as a spam message.
If the score calculated from the message is greater than the value
specified here, the message will be learned as spam. This number should
be set to a high value to make absolutely sure that the message is
indeed spam. Setting this to a low value may result in some legitimate
mail getting learned as spam, which will adversely affect the efficiency
of the spam scanner.
- Bayes Ignore Headers
- Here you can enter the mail headers that
the Bayesian filter will ignore when learning. If the received mail has already
been filtered by another mail system, such as a spam-filtering ISP or a mailing
list, that system may have added certain headers to the message. These headers may
provide unnecessary clues to the Bayesian filter when it learns those
messages, which may result in the filter developing a tendency to
give more importance to these headers than to the contents of the message.
Example: X-Spam-Status
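How the auto learning options above interact can be sketched in Python. This is an illustration only, not the product's code: the function names and the threshold values used in the usage note are hypothetical, and the seeding check simply follows the 200 ham and 200 spam minimum stated above.

```python
from email import message_from_string

# Minimum number of manually learned messages of each kind, as stated above.
MIN_SEED_MESSAGES = 200


def auto_learn_label(score, ham_threshold, spam_threshold, ham_seen, spam_seen):
    """Return "ham", "spam", or None (do not learn) for a scanned message.

    Hypothetical helper: `score` is the spam score of the message, and
    `ham_seen` / `spam_seen` are how many messages of each kind the
    database has already learned.
    """
    if ham_seen < MIN_SEED_MESSAGES or spam_seen < MIN_SEED_MESSAGES:
        return None  # database not seeded yet
    if score < ham_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None  # score is in between: too uncertain to learn from


def headers_for_learning(raw_message, ignore_headers):
    """Return the (name, value) header pairs the filter would learn from,
    leaving out every header listed in `ignore_headers`."""
    ignored = {name.lower() for name in ignore_headers}
    message = message_from_string(raw_message)
    return [(name, value) for name, value in message.items()
            if name.lower() not in ignored]
```

With illustrative thresholds of 0.1 for ham and 12.0 for spam, a message scoring 5.0 is not learned at all, which is the intended effect of keeping the two thresholds far apart.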
docs@guardiandigital.com
2004-07-09