best practice for training


Ismail Yenigul

Aug 6, 2016, 11:56:01 AM
to rspamd
Hi,

What are the best practices for rspamd bayes training? How many ham/spam messages are needed for higher detection and fewer false positives (FP)?
And when should we add a message to fuzzy storage?

Thanks

Ismail Yenigul

Aug 6, 2016, 12:05:43 PM
to rspamd
Hi,

What is the meaning of the following HTTP errors?
I am training these files for the first time, but I got "already in class spam".

Results for file: 549415248b56f.msg (0.008 seconds)
HTTP error: 400, <5017741560.B...@hwruumffju.jrnfgrxaskxe.info> contains less tokens than required for bayes classifier: 10 < 11


Results for file: 549811c4a78f1.msg (0.011 seconds)
HTTP error: 410, <CAHEpWciPLXmkSN-Z9O5woY+x...@mail.gmail.com> is skipped for bayes classifier: already in class spam; probability 99.92


On Saturday, August 6, 2016 at 18:56:01 UTC+3, Ismail Yenigul wrote:

Andrew Lewis

Aug 6, 2016, 12:19:53 PM
to rsp...@googlegroups.com
Hi Ismail,

> What is the number of ham/spam messages for higher detection and lower FP?

More training should generally make for higher detection rates.
Roughly equal numbers of ham and spam learns should make for the best
accuracy (YMMV). How much training is needed for acceptable results is
largely dependent on your environment (single-user environments are
least demanding; larger environments with more varied mail flow will
require more training). The default behaviour is not to try to classify
messages until 200 learns have been performed (this is a configurable
setting).
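
As a rough illustration of the two points above (a learn threshold and balanced classes), here is a minimal Python sketch. It is not rspamd code: `LearnCounts`, `classification_enabled` and `balance_ratio` are hypothetical names, and treating the 200-learn threshold as applying to each class separately is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class LearnCounts:
    """Number of messages learned so far in each class."""
    spam: int
    ham: int


def classification_enabled(counts: LearnCounts, min_learns: int = 200) -> bool:
    # Assumption: the threshold must be reached by each class separately.
    return counts.spam >= min_learns and counts.ham >= min_learns


def balance_ratio(counts: LearnCounts) -> float:
    """Return smaller/larger class size; 1.0 means perfectly balanced."""
    larger = max(counts.spam, counts.ham)
    smaller = min(counts.spam, counts.ham)
    if larger == 0:
        return 1.0  # nothing learned yet, trivially balanced
    return smaller / larger


counts = LearnCounts(spam=250, ham=150)
print(classification_enabled(counts))  # ham is still below the threshold
print(balance_ratio(counts))
```

A ratio well below 1.0 suggests feeding more of the under-represented class before judging the classifier's accuracy.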

> And when should we add message to fuzzy storage?

Fuzzy is useful for matching attachments & nearly-identical messages.

> contains less tokens than required for bayes classifier: 10 < 11

The message is too short to be classified by bayes. This is governed by
configuration - you can reduce the number of required tokens, but accuracy
will suffer.
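
The check behind the "10 < 11" error can be pictured with the short Python sketch below. It is an illustration only: splitting on whitespace is a stand-in for rspamd's real tokenizer, which counts tokens differently, and `enough_tokens` is a hypothetical helper. The threshold of 11 is taken from the error message above.

```python
MIN_TOKENS = 11  # threshold shown in the error message "10 < 11"


def enough_tokens(text: str, min_tokens: int = MIN_TOKENS) -> tuple[bool, int]:
    """Return (passes_threshold, token_count) for a message body."""
    # Assumption: naive whitespace tokenization, not rspamd's tokenizer.
    tokens = text.split()
    return len(tokens) >= min_tokens, len(tokens)


ok, count = enough_tokens("buy cheap pills now")
if not ok:
    print(f"contains fewer tokens than required: {count} < {MIN_TOKENS}")
```

Lowering the threshold lets shorter messages through, at the cost of classifying on very little evidence.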

> skipped for bayes classifier: already in class spam; probability 99.92

Message already has high spam probability so training was skipped to
avoid over-training.
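
In other words, the skip decision depends on how the message is currently classified. A minimal Python sketch of that idea follows; `should_learn` is a hypothetical helper, and the 0.95 cut-off is an illustrative assumption, not rspamd's actual value.

```python
def should_learn(spam_probability: float, learn_as_spam: bool,
                 skip_threshold: float = 0.95) -> bool:
    """Decide whether a learn request is worth applying.

    spam_probability is the classifier's current estimate (0.0 to 1.0)
    that the message is spam. skip_threshold is an assumed cut-off.
    """
    # Probability that the message already belongs to the requested class.
    if learn_as_spam:
        p_target = spam_probability
    else:
        p_target = 1.0 - spam_probability
    # Skip learning when the classifier is already confident in that class.
    return p_target < skip_threshold


print(should_learn(0.9992, learn_as_spam=True))   # already confidently spam
print(should_learn(0.9992, learn_as_spam=False))  # correcting towards ham
```

Under this model a message that was never trained can still be skipped, as long as similar mail has already pushed its probability high enough.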

Best,
-AL.

Ismail Yenigul

Aug 6, 2016, 4:26:48 PM
to rspamd
Thanks Andrew.


By the way, the WebUI shows only the total number of learned messages. It would be great if we could see the ham and spam learn counts separately ;)

Best regards

On Saturday, August 6, 2016 at 19:19:53 UTC+3, Andrew Lewis wrote:

Ismail Yenigul

Aug 6, 2016, 5:13:20 PM
to rspamd
> > contains less tokens than required for bayes classifier: 10 < 11
>
> Message is too short to be classified by bayes. This is governed by
> configuration - you can reduce number of required tokens but accuracy
> will suffer.

The following is the body of a spam message that rspamd refused to learn just because it contains only 6 tokens (6 < 11).
Does rspamd bayes work on the body only? What about the headers?

"The greatest method to satisfy her http://metro.xn--80ahidb2ae4b1a6cyamf.xn--p1ai/"


On Saturday, August 6, 2016 at 23:26:48 UTC+3, Ismail Yenigul wrote:

Ismail Yenigul

Aug 8, 2016, 6:02:56 AM
to rspamd
Hi

I just trained a brand new spam mail in the WebUI, but I got the following error.
I am almost sure that I did not train it before. How does rspamd check whether an email has been trained before? Does it check the Message-ID?

Error: [error] <20160808043553....@webmail.ubix.com.ph> is skipped for bayes classifier: already in class spam; probability 99.87

On Sunday, August 7, 2016 at 00:13:20 UTC+3, Ismail Yenigul wrote: