Bayes - how big does it get


Rob Gunther

Aug 7, 2017, 11:18:02 PM
to rsp...@googlegroups.com
We are currently testing Bayes per domain (one classifier per domain, not per user).

I tested with Redis and, wow, learning was fast.

Then I dropped back to sqlite3, where learning is quite a bit slower.

After running on sqlite3 for a few hours, it has learned 18,606 HAM messages for about 1,800 domains. The HAM table has grown to 380 MB.

That is not very many messages, but it is a lot of disk space.

How does Bayes manage growth? If we pushed 100,000,000 messages through the learning system, that database would certainly be massive.
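
For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes the database grows linearly with the number of learned messages, which probably overstates things since many tokens repeat across messages, and it reads the HAM table size quoted above as megabytes. The function name is made up for illustration.

```python
# Rough linear extrapolation from the numbers quoted above.
# Assumption: size grows in proportion to learned messages (likely pessimistic).
learned_messages = 18_606
db_size_mb = 380          # HAM table size, read as megabytes (assumption)
ssd_capacity_gb = 120     # disk on the rspamd machine

kb_per_message = db_size_mb * 1024 / learned_messages


def projected_size_gb(messages: int) -> float:
    """Projected database size in GB for a given number of learned messages."""
    return messages * kb_per_message / (1024 * 1024)


print(f"~{kb_per_message:.1f} KB per learned message")
for n in (1_000_000, 100_000_000):
    size = projected_size_gb(n)
    verdict = "fits on" if size < ssd_capacity_gb else "exceeds"
    print(f"{n:>11,} messages -> ~{size:,.0f} GB ({verdict} the {ssd_capacity_gb} GB SSD)")
```

On these assumptions that is about 21 KB per learned message, so 100,000,000 messages would land on the order of 2 TB, far beyond the SSD.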

The machine allocated to rspamd has 16 GB of RAM, but only a 120 GB SSD.

If we are going to use rspamd Bayes for a fairly active mail system, is Redis the recommended route? We are using Redis for fuzzy storage and followed the rspamd quickstart guide, which suggested limiting Redis to 500 MB, which we did.

The machine has 16 GB of RAM, all of it available to rspamd. Can we bump the memory available to Redis way up, to 10 GB or so, to help store all the Bayes data?


Rob

Andrew Lewis

Aug 8, 2017, 6:26:03 AM
to rsp...@googlegroups.com

Hi,

> How does Bayes manage growth? If we pushed 100,000,000 messages
> through the learning system, that database would certainly be massive

It grows forever. There is some preliminary support for expiry, but
it's neither well tested nor documented.

> If we are going to use rspamd Bayes for a fairly active mail system, is
> Redis the recommended route?

Yes.

> Since we have 16gb RAM on the machine, it is all available to rspamd - can
> we bump the memory available to Redis way up to 10gb or something to help
> store all the Bayes data?

Yes.

Best,
-AL.
