No Bayes?


marc perkel

Dec 14, 2015, 1:55:01 PM
to rspamd
I'm new to rspamd - I'm getting this.

rspamd_mmaped_file_open: cannot stat file /var/lib/rspamd/bayes.spam, error No such file or directory, 2

Also - different topic. Ever think of using redis to store bayes?

Andrew Lewis

Dec 14, 2015, 2:12:22 PM
to rsp...@googlegroups.com
Hi Marc,

> I'm new to rspamd - I'm getting this.
> rspamd_mmaped_file_open: cannot stat file /var/lib/rspamd/bayes.spam, error
> No such file or directory, 2

That's expected - the files will be created on training. You could use
the stock statistics as a starting point:
https://rspamd.com/doc/quickstart.html#pre-built-statistics
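
For training from your own mail instead, here is a minimal sketch, assuming you keep sorted spam and ham messages as one file per message in two directories. The `train_dir` function and the `RSPAMC` and `DRY_RUN` variables are made up for illustration and are not part of rspamd; only the `rspamc learn_spam` and `rspamc learn_ham` commands are real.

```shell
#!/bin/sh
# Illustrative helper: learn every regular file in a directory as spam or ham.
# RSPAMC and DRY_RUN are hypothetical knobs for this sketch, not rspamd settings.
RSPAMC="${RSPAMC:-rspamc}"

# train_dir <learn_spam|learn_ham> <directory>
train_dir() {
    cmd="$1"; dir="$2"
    count=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        if [ -n "${DRY_RUN:-}" ]; then
            # Print the command instead of running it.
            echo "$RSPAMC $cmd $f"
        else
            "$RSPAMC" "$cmd" "$f" || echo "failed to learn $f" >&2
        fi
        count=$((count + 1))
    done
    echo "processed $count file(s) with $cmd" >&2
}
```

Usage would look like `train_dir learn_spam /path/to/spam` followed by `train_dir learn_ham /path/to/ham` (both paths are placeholders). Running `rspamc stat` afterwards should show whether anything was learned.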

> Also - different topic. Ever think of using redis to store bayes?

It was considered but seems unlikely to be worked on. The current
statistics implementation supports multiple backends but probably
doesn't lend itself particularly well to simple KV storage.

Best,
-AL.

Vsevolod Stakhov

Dec 14, 2015, 3:44:26 PM
to marc perkel, rspamd
On 14/12/2015 18:55, marc perkel wrote:
> I'm new to rspamd - I'm getting this.
>
> rspamd_mmaped_file_open: cannot stat file /var/lib/rspamd/bayes.spam,
> error No such file or directory, 2

That shouldn't happen with the default configuration on 1.0+

> Also - different topic. Ever think of using redis to store bayes?

Well, there was a project to build a redis backend for rspamd. However, I
couldn't find any *reasonable* answer to the following question: "Why do
you need a redis backend, given that you can just train the sqlite
backends on all servers together?" That's simple, reliable and
scalable. Rspamd also has relearn protection, so faulty servers should
not be an issue either.

--
Vsevolod Stakhov

Marc Perkel

Dec 14, 2015, 4:25:13 PM
to Vsevolod Stakhov, rspamd

On 12/14/15 12:44, Vsevolod Stakhov wrote:
> On 14/12/2015 18:55, marc perkel wrote:
>> I'm new to rspamd - I'm getting this.
>>
>> rspamd_mmaped_file_open: cannot stat file /var/lib/rspamd/bayes.spam,
>> error No such file or directory, 2
> That shouldn't happen with the default configuration on 1.0+

The current CentOS version is < 1.0.

>
>> Also - different topic. Ever think of using redis to store bayes?
> Well, there was a project of making redis backend for rspamd, however, I
> couldn't find any *reasonable* answer to the following question: "Why do
> you need redis backend counting that you can just learn all sqlite
> backends on all servers all together?" That's simple, reliable and
> scalable. Rspamd has also relearn protection so faulted servers should
> not be an issue as well.
>

Not sure I understand: how do multiple sqlite servers learn together?

I'm also running SpamAssassin, which has a redis backend. It's the first
one that actually worked for me.

I have 4 SA servers and a 5th server running redis for all 4 SA servers
so they learn together. So I'm really happy with redis right now.



Vsevolod Stakhov

Dec 14, 2015, 7:58:42 PM
to Marc Perkel, rspamd
On 14/12/2015 21:25, Marc Perkel wrote:
>
> On 12/14/15 12:44, Vsevolod Stakhov wrote:
>> On 14/12/2015 18:55, marc perkel wrote:
>>> I'm new to rspamd - I'm getting this.
>>>
>>> rspamd_mmaped_file_open: cannot stat file /var/lib/rspamd/bayes.spam,
>>> error No such file or directory, 2
>> That shouldn't happen with the default configuration on 1.0+
>
> Current Centos version < 1.0
>
>>
>>> Also - different topic. Ever think of using redis to store bayes?
>> Well, there was a project of making redis backend for rspamd, however, I
>> couldn't find any *reasonable* answer to the following question: "Why do
>> you need redis backend counting that you can just learn all sqlite
>> backends on all servers all together?" That's simple, reliable and
>> scalable. Rspamd has also relearn protection so faulted servers should
>> not be an issue as well.
>>
>
> Not sure I understand how many sqlite servers learn together?

rspamc -h srv1 learn_spam msg.eml ;
rspamc -h srv2 learn_spam msg.eml ;
rspamc -h srv3 learn_spam msg.eml ;
rspamc -h srv4 learn_spam msg.eml ;
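
The four commands above can be folded into one loop. A minimal sketch, assuming the srv1..srv4 host names from the example; the `learn_everywhere` function and the `RSPAMC` and `DRY_RUN` variables are made up for illustration and are not part of rspamd.

```shell
#!/bin/sh
# Illustrative helper: send one message to every scanner so that each
# server's local Bayes store learns it.
# RSPAMC and DRY_RUN are hypothetical knobs for this sketch.
RSPAMC="${RSPAMC:-rspamc}"

# learn_everywhere <learn_spam|learn_ham> <message-file> <host>...
learn_everywhere() {
    cmd="$1"; msg="$2"; shift 2
    case "$cmd" in
        learn_spam|learn_ham) ;;
        *) echo "usage: learn_everywhere learn_spam|learn_ham FILE HOST..." >&2
           return 2 ;;
    esac
    failed=0
    for host in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Print the command instead of running it.
            echo "$RSPAMC -h $host $cmd $msg"
        elif ! "$RSPAMC" -h "$host" "$cmd" "$msg"; then
            # Keep going so one unreachable server does not block the rest.
            echo "learning on $host failed" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `learn_everywhere learn_spam msg.eml srv1 srv2 srv3 srv4` would run the same four commands. A non-zero return value means at least one server did not learn the message, so it can be resubmitted later.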

> I'm also running Spamassassin that has redis backend. The first one that
> actually worked for me.
>
> I have 4 SA servers and a 5th server running redis for all 4 SA servers
> so they learn together. So I'm really happy with redis right now.

The major benefit of rspamd is that you basically won't need 4 servers
for scanning. A single server can scan hundreds of messages per second
(even when loading the vanilla SA rules).

Nevertheless, I don't see any benefit in having a dedicated server for
redis. I can think about creating a UDP service like I did for fuzzy
storage. In that case, I would have encryption and a guaranteed high
rate of tokens being scanned. With redis, I don't see any of
that.

--
Vsevolod Stakhov

Marc Perkel

Dec 14, 2015, 8:15:42 PM
to Vsevolod Stakhov, rspamd
Actually it's a dedicated virtual server, which is easy when running OpenVZ.
But with SpamAssassin, it allows me to have a common bayes store. The
4 SA servers are separate from my many Exim servers, which use them in a
load-balanced setup.

I filter about 5000 domains.

Thanks for your help.

Vsevolod Stakhov

Dec 15, 2015, 10:06:45 AM
to Marc Perkel, rspamd
The question is what are you trying to save by this? Disk space, right?
Anyway, I'm not a big fan of sending unencrypted requests over the
network, so I don't like the redis backend solution. However, I'm thinking
about central storage like I did for fuzzy. That might be doable, I
suppose...

> I filter about 5000 domains.

The only question is the number of messages scanned per second at
peak times.

--
Vsevolod Stakhov

Marc Perkel

Dec 15, 2015, 11:49:55 AM
to Vsevolod Stakhov, rspamd
Disk space isn't the issue. What I want is a common store of bayes data
for all servers so that a spam learned on one server affects all servers.

The reason I suggested redis is because SpamAssassin used to use MySQL
and it didn't work in high volume. But when they went to redis it all
worked perfectly.

>
>> I filter about 5000 domains.
> The only question is the amount of messages scanned per second in the
> peak times.
>

Actually most of my spam filtering is done with Exim rules. Only maybe
2% see content filtering.


Marc Perkel

Dec 15, 2015, 12:06:43 PM
to Vsevolod Stakhov, rspamd

On 12/15/15 07:06, Vsevolod Stakhov wrote:
> I'm not a big fan of sending unencrypted requests over the network,

I just use SSH tunnels when I go over the net.
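
A tunnel like that can be sketched as follows, assuming redis listens on its default port 6379 on the remote machine. The `tunnel_cmd` helper and the `bayes@redis.example.net` target are hypothetical; the helper only prints the `ssh` command so it can be inspected before use.

```shell
#!/bin/sh
# Illustrative helper: print an ssh command that forwards a local port
# to a redis instance bound to localhost on the remote machine.

# tunnel_cmd <ssh-target> [local-port] [remote-port]
tunnel_cmd() {
    target="$1"; lport="${2:-6379}"; rport="${3:-6379}"
    # -N: no remote command, forwarding only.
    # -L: listen on the local port and forward to the remote port.
    echo "ssh -N -L 127.0.0.1:$lport:127.0.0.1:$rport $target"
}
```

With the printed command running, a client that connects to 127.0.0.1:6379 locally reaches the remote redis over the encrypted SSH connection.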


Vsevolod Stakhov

Dec 15, 2015, 1:49:51 PM
to Marc Perkel, rspamd
On 15/12/2015 16:49, Marc Perkel wrote:
>
> On 12/15/15 07:06, Vsevolod Stakhov wrote:
>> On 15/12/2015 01:15, Marc Perkel wrote:
>>>
>>> Actually it's a dedicated virtual server which is easy running OpenVZ.
>>> But - with spamassassin - it allows me to have a common bayes store. The
>>> 4 SA servers are separate from my many EXIM servers that use them on a
>>> load balanced setup.
>> The question is what are you trying to save by this? Disk space, right?
>> Anyway, I'm not a big fan of querying unencrypted requests in the
>> network, so I won't like redis backend solution. However, I'm thinking
>> about central storage like I did for fuzzy. That might be doable I
>> suppose...
>
> Disk space isn't the issue. What I want is a common store of bayes data
> for all servers so that a spam learned on one server affects all servers.
>
> The reason I suggested redis is because SpamAssassin used to use MySQL
> and it didn't work in high volume. But when they went to redis it all
> worked perfectly.

But again, how is that different from training all the spam servers
together?

>>
>>> I filter about 5000 domains.
>> The only question is the amount of messages scanned per second in the
>> peak times.
>>
>
> Actually most of my spam filtering is done with Exim rules. Only maybe
> 2% see content filtering.

I'm pretty sure that content filtering in rspamd is much faster and
more featureful than in Exim.

--
Vsevolod Stakhov

Marc Perkel

Dec 15, 2015, 2:10:00 PM
to Vsevolod Stakhov, rspamd

On 12/15/15 10:49, Vsevolod Stakhov wrote:
> I'm pretty sure that content filtering in rspamd is much faster and
> more featureful than in Exim.

Wouldn't bet on that. Agreed on featureful.

There are things that Exim can do that you have to do at the MTA level.
I can watch to see if the connection is closed by quit. I can measure
the speed at which the message is delivered. I can do forward and reverse
callouts to verify the sender and recipient.

Exim gives me the ability to test the message while it's being delivered.
And it's amazingly powerful.

However - I am looking to possibly replace some SpamAssassin usage with
your rspamd. I just downloaded the new version so now I might be able to
do some serious testing.

marc perkel

Dec 16, 2015, 9:13:53 PM
to rspamd
Bayes is still not learning. I deleted the files in /var/lib/rspamd and restarted the server. It created the initial files, but those sqlite files are not growing.

Still trying to get this to work. Thanks for your help.
 
d /var/lib/rspamd/
total 148
drwxr-xr-x  2 _rspamd _rspamd  4096 Dec 16 18:08 .
drwxr-xr-x 16 root    root     4096 Dec 16 04:02 ..
-rw-r--r--  1 _rspamd _rspamd  9216 Dec 16 14:27 bayes.ham.sqlite
-rw-r--r--  1 _rspamd _rspamd  9216 Dec 16 14:27 bayes.spam.sqlite
-rw-------  1 root    root    56988 Dec 16 18:08 rspamd.history
srw-------  1 root    root        0 Dec 16 18:08 rspamd.sock
-rw-r--r--  1 _rspamd _rspamd   272 Dec 16 18:08 stats.ucl
-rw-r--r--  1 root    root    55912 Dec 16 18:08 symbols.cache

marc perkel

Dec 17, 2015, 7:32:34 PM
to rspamd
Now I'm getting this error in the logs:

2015-12-17 16:31:19 #6761(normal) rspamd_stat_preprocess: backend of type sqlite3 does not exist: BAYES_SPAM
2015-12-17 16:31:19 #6761(normal) rspamd_stat_preprocess: backend of type sqlite3 does not exist: BAYES_HAM
 
What does this mean?
