Fuzzy vs Bayes

Boden Garman

Aug 23, 2015, 9:52:54 PM

to rspamd

Hi all,

I've set up rspamd and trained the bayes on the following corpus:
http://www.untroubled.org/spam/

Additionally, my users are retraining by moving messages in and out of their Junk folders.

Everything seems to be working well. I was wondering, though: when would I train rspamd via fuzzy learning instead of bayes? Or should I train both at the same time?

Cheers
Boden

Vsevolod Stakhov

Aug 24, 2015, 8:25:03 AM

to Boden Garman, rspamd

Bayes is a classifier, meaning that it can tell whether a message belongs
to some specific class (namely, spam or ham). So the task of Bayes is to
split messages into classes based on their content.

Fuzzy hashes, on the contrary, are just a non-deterministic matching
algorithm that can tell whether a specific message is similar to
messages that have been learnt previously. In my opinion, fuzzy matching
is a stronger and more specific factor than any classifier, as it works
with just bad patterns.
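
To make the idea of "similar to messages learnt previously" concrete, here is a minimal sketch of fuzzy matching built on word shingles and Jaccard similarity. It is only an illustration of the general technique and is not rspamd's actual fuzzy hashing or storage; `shingles`, `similarity`, `ToyFuzzyStorage`, the 3-word shingle size and the 0.5 threshold are all hypothetical choices made for this example.

```python
import hashlib


def shingles(text, size=3):
    """Return the set of hashed word n-grams (shingles) of a message."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < size:
        grams = [" ".join(words)]
    else:
        grams = [" ".join(words[i:i + size])
                 for i in range(len(words) - size + 1)]
    # Truncated SHA-1 digests stand in for the stored hashes.
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def similarity(a, b):
    """Jaccard similarity of two shingle sets, between 0.0 and 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class ToyFuzzyStorage:
    """Hypothetical in-memory store: each learnt message has one label."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.entries = []  # list of (label, shingle set)

    def learn(self, label, text):
        self.entries.append((label, shingles(text)))

    def check(self, text):
        """Return (label, score) of the closest learnt message,
        or (None, score) if nothing is similar enough."""
        probe = shingles(text)
        best_label, best_score = None, 0.0
        for label, stored in self.entries:
            score = similarity(probe, stored)
            if score > best_score:
                best_label, best_score = label, score
        if best_score >= self.threshold:
            return best_label, best_score
        return None, best_score


storage = ToyFuzzyStorage()
storage.learn("spam", "dear customer your account has been suspended "
                      "click here to verify your details today")

# One word changed: most shingles still overlap, so it matches.
variant = storage.check("dear client your account has been suspended "
                        "click here to verify your details today")
# Completely different text: no shared shingles, so no match.
unrelated = storage.check("the quarterly report is attached please "
                          "review before the friday meeting")
print(variant, unrelated)
```

Note that nothing here needs a second class to compare against: a message either resembles something that was learnt or it does not.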

Moreover, Bayes needs to be learnt with both spam and ham samples to
work efficiently. Otherwise, its output will be highly biased towards
the class that was learnt more than the others. For example, if you
learn Bayes with spam only, its output will be heavily shifted towards
classifying everything as spam (as it has no or very few ham samples).
Hence, you will likely need to provide comparable amounts of spam and
ham samples during learning. There is no such problem for fuzzy hashes.
However, a fuzzy hash can match either ham or spam, but not both.
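
The bias from one-sided learning can be seen in a small toy classifier. The sketch below uses a per-token spam score with simple smoothing and a naive Bayes style combination; it is an assumption-laden illustration, not rspamd's classifier, and `ToyBayes`, the training messages and the smoothing constants are all invented for the example.

```python
from collections import Counter


class ToyBayes:
    """Toy token-based spam classifier, for illustration only."""

    def __init__(self):
        self.docs = {"spam": 0, "ham": 0}
        self.seen = {"spam": Counter(), "ham": Counter()}

    def learn(self, label, text):
        self.docs[label] += 1
        # Count each token once per message.
        self.seen[label].update(set(text.lower().split()))

    def _rate(self, label, token):
        n = self.docs[label]
        return self.seen[label][token] / n if n else 0.0

    def token_score(self, token):
        """Smoothed estimate that a message containing `token` is spam."""
        n = self.seen["spam"][token] + self.seen["ham"][token]
        if n == 0:
            return 0.5  # never seen: neutral
        spam_rate = self._rate("spam", token)
        ham_rate = self._rate("ham", token)
        p = spam_rate / (spam_rate + ham_rate)
        # Pull rarely seen tokens towards the neutral value 0.5.
        return (0.5 + n * p) / (1 + n)

    def spam_probability(self, text):
        spam_like = ham_like = 1.0
        for token in set(text.lower().split()):
            f = self.token_score(token)
            spam_like *= f
            ham_like *= 1.0 - f
        return spam_like / (spam_like + ham_like)


SPAM = ["buy cheap pills now",
        "cheap loans click now",
        "please click to buy now"]
HAM = ["please review the meeting notes",
       "lunch meeting moved to noon",
       "please send the project notes"]

message = "please send the meeting notes now"  # an ordinary, ham-like mail

spam_only = ToyBayes()
for text in SPAM:
    spam_only.learn("spam", text)

balanced = ToyBayes()
for text in SPAM:
    balanced.learn("spam", text)
for text in HAM:
    balanced.learn("ham", text)

p_spam_only = spam_only.spam_probability(message)
p_balanced = balanced.spam_probability(message)
print(p_spam_only, p_balanced)
```

With spam-only learning, every token the classifier has seen looks like pure spam, so the ordinary message gets a spam probability well above 0.5. Once a comparable number of ham samples is learnt, the same message drops well below 0.5.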

--
Vsevolod Stakhov

Boden Garman

Aug 25, 2015, 8:45:39 PM

to rspamd, bpga...@gmail.com

Great info, thanks. What would you suggest for flag and weight if I start using fuzzy based on users' retraining?
