Feedback on using Rspamd

Andris Reinman

Nov 1, 2016, 3:52:25 AM
to rspamd
Hey all,

We at www.zone.ee (a local hosting company) have been using Rspamd on a single server to filter some of our outbound email traffic. The MTA server that uses Rspamd is our home-brewed ZoneMTA (https://github.com/zone-eu/zone-mta). The current volume processed by Rspamd is about 200k emails a day and so far it has performed really well; CPU usage has been nearly non-existent. Based on this experience we plan to replace SpamAssassin with Rspamd for inbound email as well.

We have also run into some problems with Rspamd:

1. Redis usage. Using Redis did not seem to work for the Bayes classifier in stable 1.3 (Rspamd did not give any Bayes ham or spam points to messages), so we upgraded to the development 1.4. In the end we had to turn it off though: Rspamd was opening so many connections to Redis over time (thousands of connections) that it exhausted the Redis connection limit.

2. Large RAM use. Normally the RAM usage of an Rspamd process was around 200-300MB, but switching on PhishTank support increased it to 700-900MB per "normal worker" process (the main process still takes around 300MB). The server has a lot of cores and thus Rspamd created a lot of worker processes. We really wanted PhishTank support, so we reduced the number of Rspamd workers, which kind of fixed the problem: the server has enough RAM that having 5 worker processes taking 4-5GB of RAM is not a problem. It seems weird though.

3. Understanding the default configuration. It took a while until we realised that Rspamd has its own rate limiting. At first we thought that the 504 errors for spam checks were happening because Rspamd itself was having issues. We do not need rate limiting for outbound mail, so once we figured out what was actually going on we were able to disable it.

4. Quirks with the HTTP protocol. We use chunked uploads for sending messages to Rspamd and it seems that you can't use chunks larger than 12kB; otherwise some kind of data loss happens, which in turn produces strange results for messages, such as all DKIM keys failing validation, HTML parts differing from text parts, etc.
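
For illustration, here is a minimal sketch of how a client could cap the size of each piece before framing it as an HTTP/1.1 chunked body. It is not ZoneMTA's actual code (ZoneMTA is not written in Python); the function names are made up, and the 8kB default is simply the value mentioned later in this thread.

```python
from typing import Iterable, Iterator

MAX_CHUNK = 8 * 1024  # assumed cap, below the 12kB threshold described above


def rechunk(buffers: Iterable[bytes], max_size: int = MAX_CHUNK) -> Iterator[bytes]:
    """Re-slice incoming buffers so that no piece exceeds max_size bytes."""
    for buf in buffers:
        for start in range(0, len(buf), max_size):
            yield buf[start:start + max_size]


def frame_chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Frame pieces as a chunked body: hex size, CRLF, data, CRLF."""
    for piece in pieces:
        if piece:  # a zero-length chunk would end the body prematurely
            yield b"%x\r\n" % len(piece) + piece + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk marker followed by an empty trailer


if __name__ == "__main__":
    incoming = [b"a" * 20000, b"xyz"]  # e.g. buffers arriving from an SMTP client
    for frame in frame_chunked(rechunk(incoming)):
        print(len(frame))
```

The two steps are kept separate so that the cap can be changed, or removed, without touching the framing.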

In general we are really happy with it and plan to extend our use of it.

Best regards,
Andris Reinman

Andrew Lewis

Nov 1, 2016, 4:57:32 AM
to rsp...@googlegroups.com
Hi Andris,

Thanks for getting in touch.

> 1. Redis usage. Using Redis did not seem to work for Bayes classifier in
> stable 1.3 (Rspamd did not give any bayes ham or spam points to messages),
> so we upgraded to the development 1.4. In the end we had to turn it off
> though, Rspamd was opening so many connections to Redis over time
> (thousands of connections), that it exhausted Redis connection limits.

It's expected to work on the 1.3.x branch. Rspamd 1.4 has Redis connection
pooling, which would cause an increase in connections (however, these
should not just grow exponentially). Perhaps this is somehow buggy.

> 2. Large RAM use. Normally Rspamd process RAM usage was around 200-300MB
> but switching on PhishTank support increased RAM usage up to 700-900MB per
> "normal worker" process (main process still takes around 300MB). The server
> has a lot of cores and thus Rspamd created a lot of worker processes. We
> really wanted PhishTank support, so we reduced the number of Rspamd workers
> which kind of fixed the problem - the server has enough RAM that having 5
> worker processes taking 4-5GB RAM is not a problem. It seems weird though.

Low memory usage is not a strong point of Rspamd, and loading huge maps
(like PhishTank) is indeed a problem.

> 3. Understanding default configuration. It took a while until we realised
> that Rspamd has its own rate limiting. At first we thought that 504 errors
> for spam checks were happening because Rspamd itself was having issues. We
> do not need rate limiting for outbound so once when we figured out what was
> actually going on we were able to disable it.

An unfortunate side effect of the global Redis settings is that they
enable the (otherwise dormant) ratelimit module. This yields a result
with a 'soft reject' action, not a 504 error. Nothing in Rspamd yields
504 AFAIK; I suppose this came from some proxy in front of Rspamd.

> 4. Quirks with HTTP protocol. We use chunked uploads for sending messages
> to Rspamd and it seems that you can't use chunks larger than 12kB,
> otherwise some kind of data loss happened which in turn started returning
> strange issues for messages like all DKIM keys failed validation, HTML
> parts were different for Text parts etc.

Sounds like a possible bug. I would suggest opening a ticket on the
issue tracker with a recipe to reproduce it.

You may want to consider joining us in #rspamd on Freenode. :)

Best,
-AL.

Vsevolod Stakhov

Nov 1, 2016, 5:03:54 AM
to Andris Reinman, rspamd
On 01/11/2016 07:52, Andris Reinman wrote:
> Hey all,
>
> We at www.zone.ee (local hosting company) have been using Rspamd in a
> single server to filter some of the outbound email traffic. MTA server
> that uses Rspamd is home-brewed ZoneMTA
> (https://github.com/zone-eu/zone-mta). The current volume processed by
> Rspamd is about 200k emails a day and so far it has performed really
> well, CPU usage has been nearly non-existent. Based on the current
> experience we plan to replace SpamAssassin with Rspamd for inbound
> emails as well.

Thank you for your valuable feedback, please see my reply below.

> We have also run into some problems with Rspamd:
>
> 1. Redis usage. Using Redis did not seem to work for Bayes classifier in
> stable 1.3 (Rspamd did not give any bayes ham or spam points to
> messages), so we upgraded to the development 1.4. In the end we had to
> turn it off though, Rspamd was opening so many connections to Redis over
> time (thousands of connections), that it exhausted Redis connection limits.

From 1.4, Rspamd uses a Redis pool for Lua connections but not for Bayes.
Your issue looks strange because I have never observed such issues with
Bayes. As a temporary workaround you could add a connection time limit to
Redis itself:

# Close the connection after a client is idle for N seconds (0 to disable)
timeout 3
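
To check whether idle connections are really piling up before and after setting such a timeout, the output of the Redis CLIENT LIST command (for example via redis-cli client list) can be summarised. The following is a minimal sketch, assuming the usual one-client-per-line, space-separated key=value output; the function names and the sample lines are made up for illustration.

```python
from typing import Dict, List


def parse_client_list(text: str) -> List[Dict[str, str]]:
    """Parse CLIENT LIST output into one dict of fields per client."""
    clients = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = {}
        for token in line.split(" "):
            key, sep, value = token.partition("=")
            if sep:  # skip anything that is not key=value
                fields[key] = value
        clients.append(fields)
    return clients


def idle_clients(clients: List[Dict[str, str]], min_idle: int) -> List[Dict[str, str]]:
    """Return the clients that have been idle for at least min_idle seconds."""
    return [c for c in clients if int(c.get("idle", "0")) >= min_idle]


if __name__ == "__main__":
    # Illustrative sample, not captured from a real server.
    sample = (
        "id=3 addr=127.0.0.1:52555 fd=8 name= age=855 idle=0 flags=N db=0 cmd=client\n"
        "id=4 addr=127.0.0.1:52556 fd=9 name= age=4100 idle=4000 flags=N db=0 cmd=get\n"
    )
    clients = parse_client_list(sample)
    print(len(clients), "clients,", len(idle_clients(clients, 120)), "idle for 120s or more")
```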

> 2. Large RAM use. Normally Rspamd process RAM usage was around 200-300MB
> but switching on PhishTank support increased RAM usage up to 700-900MB
> per "normal worker" process (main process still takes around 300MB). The
> server has a lot of cores and thus Rspamd created a lot of worker
> processes. We really wanted PhishTank support, so we reduced the number
> of Rspamd workers which kind of fixed the problem - the server has
> enough RAM that having 5 worker processes taking 4-5GB RAM is not a
> problem. It seems weird though.

Do you mean RSS or VSZ for a process? VSZ doesn't mean anything on a
64-bit system. Here is what I see on a production system:

  PID USER     PR NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
17895 _rspamd  20  0 170704 24316 7008 R 28,6  0,0 0:06.70 rspamd

So a process actually uses about 24MB of memory + 7MB of shared memory,
which looks quite reasonable. Over time, it usually increases to about
50-100MB due to memory fragmentation.
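
For comparing RSS and VSZ per worker without screenshots, the two values can be read from /proc/<pid>/status on Linux, where they appear as the VmRSS and VmSize lines (in kB). This is a minimal sketch with made-up function names; the sample text reuses the VIRT and RES numbers from the top output above.

```python
from typing import Dict


def parse_status_memory(status_text: str) -> Dict[str, int]:
    """Extract VmRSS and VmSize (both in kB) from /proc/<pid>/status text."""
    wanted = {"VmRSS": "rss_kb", "VmSize": "vsz_kb"}
    result = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            result[wanted[key]] = int(rest.split()[0])  # value precedes the "kB" unit
    return result


def read_process_memory(pid: int) -> Dict[str, int]:
    """Read the memory figures of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_status_memory(fh.read())


if __name__ == "__main__":
    sample = "Name:\trspamd\nVmSize:\t  170704 kB\nVmRSS:\t   24316 kB\n"
    print(parse_status_memory(sample))
```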

PhishTank is different indeed. I've recently added some fixes to reduce
Rspamd's memory (and network bandwidth) footprint when using it, but it
is still quite high.

> 3. Understanding default configuration. It took a while until we
> realised that Rspamd has its own rate limiting. At first we thought that
> 504 errors for spam checks were happening because Rspamd itself was
> having issues. We do not need rate limiting for outbound so once when we
> figured out what was actually going on we were able to disable it.

Can you suggest something to improve this situation?

> 4. Quirks with HTTP protocol. We use chunked uploads for sending
> messages to Rspamd and it seems that you can't use chunks larger than
> 12kB, otherwise some kind of data loss happened which in turn started
> returning strange issues for messages like all DKIM keys failed
> validation, HTML parts were different for Text parts etc.

That's interesting. I've never used chunked encoding, so I cannot
add more so far. I'll try to write a test; thanks for the report.

> In general we are really happy with it and plan to extend using it.

Do not hesitate to ask me about your concerns or bugs you've found. You
can use this list or IRC for these purposes. I can also add your badge
to 'rspamd.com' site if you'd like.

--
Vsevolod Stakhov

Andris Reinman

Nov 1, 2016, 8:42:50 AM
to Vsevolod Stakhov, rspamd
Hi,

> Do you mean RSS or VSZ for a process? VSZ doesn't mean anything on 64
> bit system. Here is what I see on a production system:

It was RSS. Here are some screenshots of process stats with PhishTank enabled and disabled: https://cloudup.com/c8mqvnRlJLN (the only difference between the two is the phishtank_enabled=true/false option, nothing else). The server runs Ubuntu 16.04 with 24 cores and 32GB RAM. Rspamd is the latest from "rspamd.com/apt/ xenial main". Openphish data is loaded through an Nginx proxy as suggested by the docs. The server only runs Nginx, Rspamd, Redis and ZoneMTA.

>> Understanding default configuration. It took a while until we
>> realised that Rspamd has its own rate limiting.
>
> Can you suggest something to improve this situation?

Not sure, actually. It would have helped if the error message "Ratelimit exceeded" had said something about what kind of rate limit was meant. At first we thought that we had hit some kind of 3rd-party RBL rate limiting (e.g. Spamhaus), as we were making so many checks. It would also have helped if we had actually read all the docs.

> That's interesting. I've never ever used chunked encoding so I cannot
> add more so far. I'll try to write a test, thanks for report.

Chunked encoding makes sense as we pass the message to Rspamd as it comes in. This is way more efficient than buffering the entire message before passing it on.

In general the flow in ZoneMTA is the following. Data is passed from one step to the next in 64kB chunks, except for the data sent to Rspamd, which is now forced to use 8kB chunks:

SMTP Client -> ZoneMTA SMTP server -> multiplexer -> one stream to Rspamd, the other to internal processing -> internal message parsing, header and body rewriting etc. -> DKIM body hash calculation -> LevelDB.

By the time Rspamd returns, the message has most probably already been processed and stored to LevelDB, where it waits to be confirmed (or it is partially stored if the processing still takes time). If the message needs to be rejected, or some other error happens during these steps, the already stored data is deleted from LevelDB and an error response is returned to the still-waiting SMTP client; otherwise 250 OK is sent. This approach allows us to support very large messages (100MB+), as we never buffer the entire message in memory and always operate on small chunks. I do not know how Rspamd handles large messages but I guess it does something similar.
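
The flow above can be sketched roughly as follows. This is only an illustration of the idea (fan each chunk out to the scanner and to storage, then confirm or delete the stored data once the verdict is known), not ZoneMTA's code: StagedStore is a hypothetical in-memory stand-in for the LevelDB step, the callback names are made up, and the 550 reply is an assumed example of "an error response".

```python
from typing import Callable, Iterable, List


class StagedStore:
    """Hypothetical stand-in for storage: data is staged, then confirmed or deleted."""

    def __init__(self) -> None:
        self.staged: List[bytes] = []
        self.committed: List[bytes] = []

    def write(self, chunk: bytes) -> None:
        self.staged.append(chunk)  # a real store would write to disk here

    def commit(self) -> None:
        self.committed.extend(self.staged)
        self.staged.clear()

    def rollback(self) -> None:
        self.staged.clear()


def process_message(
    chunks: Iterable[bytes],
    scan_chunk: Callable[[bytes], None],
    get_verdict: Callable[[], str],
    store: StagedStore,
) -> str:
    """Feed every chunk to both consumers, then act on the scanner's verdict."""
    for chunk in chunks:
        scan_chunk(chunk)   # one stream goes toward the spam filter
        store.write(chunk)  # the other goes toward processing and storage
    if get_verdict() == "reject":
        store.rollback()    # delete the already stored data
        return "550 Message rejected"
    store.commit()
    return "250 OK"


if __name__ == "__main__":
    sent: List[bytes] = []
    print(process_message([b"part1", b"part2"], sent.append, lambda: "no action", StagedStore()))
```

In this sketch the two consumers are fed sequentially in one loop; a real multiplexer would stream to both concurrently and apply backpressure.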

Regards,
Andris Reinman

--
You received this message because you are subscribed to a topic in the Google Groups "rspamd" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rspamd/2Bn7g3wH0-I/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rspamd+un...@googlegroups.com.
Visit this group at https://groups.google.com/group/rspamd.

Vsevolod Stakhov

Nov 1, 2016, 10:24:17 AM
to Andris Reinman, rspamd
On 01/11/2016 12:42, Andris Reinman wrote:
> HI,
>
>> Do you mean RSS or VSZ for a process? VSZ doesn't mean anything on 64
>> bit system. Here is what I see on a production system:
>
> It was RSS. Here’s some screenshots of process stats with PhishTank
> enabled and disabled: https://cloudup.com/c8mqvnRlJLN the only
> difference between these two is phishtank_enabled=true/false option,
> nothing else. Server is Ubuntu 16.04, 24 cores, 32GB RAM. Rspamd is the
> latest from "rspamd.com/apt/ xenial main".
> Openphish data is loaded through Nginx proxy as suggested by the docs.
> The server only runs Nginx, Rspamd, Redis and ZoneMTA.

I see 100MB shared. Do you use mmapped files for stats? If yes, then
it's bad: this backend is terribly outdated and I have not checked it
since 0.6. That could explain the high RSS values in your case. The
PhishTank case is different. You could try a fresh Rspamd with compressed
maps support and the rspamd.com mirror for PhishTank. Hopefully, that
would help to reduce the memory footprint in that case.

>>> Understanding default configuration. It took a while until we
>>> realised that Rspamd has its own rate limiting.
>>
>> Can you suggest something to improve this situation?
>
> Not sure actually. It would have helped if the error message “Ratelimit
> exceeded” would say something about what kind of rate limit it means. At
> first we thought that we have hit some kind of 3rd party RBL rate
> limiting (eg. Spamhaus) as we were making so many checks. It would have
> also helped if we had actually read all the docs.

What I do see for ratelimits is:

task:set_pre_result('soft reject', string.format('Ratelimit "%s" exceeded', rtype))

And in the logs it should even write the exact values for the limits.
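
For readers unfamiliar with this kind of limit, below is a generic leaky-bucket limiter together with a Python rendering of the message format from the Lua line above. It is a sketch of the general technique under assumed parameter names (burst, rate), not Rspamd's actual implementation, and the limit type "example" is hypothetical.

```python
from typing import Optional


class LeakyBucket:
    """Generic leaky bucket: holds up to `burst` units, leaking `rate` units per second."""

    def __init__(self, burst: float, rate: float) -> None:
        self.burst = burst
        self.rate = rate
        self.level = 0.0
        self.last: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Account for one message at time `now`; return False if over the limit."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now
        if self.level + 1 > self.burst:
            return False
        self.level += 1
        return True


def soft_reject_reason(rtype: str) -> str:
    """Build the same text as the string.format call quoted above."""
    return 'Ratelimit "%s" exceeded' % rtype


if __name__ == "__main__":
    bucket = LeakyBucket(burst=2, rate=1.0)
    for t in (0.0, 0.0, 0.0, 1.0):
        ok = bucket.allow(t)
        print(t, "accepted" if ok else soft_reject_reason("example"))
```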

>> That's interesting. I've never ever used chunked encoding so I cannot
>> add more so far. I'll try to write a test, thanks for report.
>
> Chunked encoding makes sense as we pass the message to Rspamd as it
> comes in. This is way more efficient than buffering the entire message
> before passing it on.
>
> In general the flow in ZoneMTA is the following. Data is passed from one
> step to next in 64kB chunks, except the data sent to Rspamd which is now
> forced to use 8kB chunks:
>
> SMTP Client -> ZoneMTA smtp server -> multiplexer -> one stream to
> Rspamd, other to internal processing -> internal message parsing, header
> and body rewriting etc -> DKIM body hash calculation -> LevelDB. By the
> time Rspamd returns the message is most probably already processed and
> stored to LevelDB and it waits to be confirmed (or partially stored if
> the processing still takes time). If the message needs to be rejected or
> some other error happens during these steps, the already stored data
> from LevelDB is deleted and an error response is returned to the still
> waiting SMTP client, otherwise 250 OK is sent. This approach allows us
> to support very large messages (100MB+), as we never buffer the entire
> message to memory and always operate on small chunks. I do not know how
> Rspamd handles large messages but I guess it does something similar.


No, Rspamd processes messages completely in memory, so you basically
shouldn't send 100MB+ messages to Rspamd, as they are not spam by
definition. I had the idea of making it possible to work with chunks, but
it is terribly hard to implement, as it requires its own streaming
MIME parser (gmime does not support chunks or streaming). So I most
likely won't do it in Rspamd.

BTW, you can also use DKIM signing from Rspamd if you'd like.

--
Vsevolod Stakhov

Vsevolod Stakhov

Nov 1, 2016, 1:14:29 PM
to Andris Reinman, rspamd
On 01/11/2016 12:42, Andris Reinman wrote:
> HI,
>
>> That's interesting. I've never ever used chunked encoding so I cannot
>> add more so far. I'll try to write a test, thanks for report.
>
> Chunked encoding makes sense as we pass the message to Rspamd as it
> comes in. This is way more efficient than buffering the entire message
> before passing it on.
>
> In general the flow in ZoneMTA is the following. Data is passed from one
> step to next in 64kB chunks, except the data sent to Rspamd which is now
> forced to use 8kB chunks:

The issue with chunked encoding has been fixed and the new experimental
packages are ready for use. Thank you for your report.


--
Vsevolod Stakhov

Andris Reinman

Nov 8, 2016, 4:07:36 AM
to Vsevolod Stakhov, rspamd
Hi,

I tried it out and chunked encoding seems to be fixed now, thanks!

I created a new Rspamd instance (using v1.4) and the problems I had with the previous instance are gone: PhishTank support does not increase memory usage (worker processes use ~90MB) and Redis seems to work: there are about 30 clients for 10 Rspamd workers, not 10000 (I did set a 120-second timeout in the Redis server config, but I'm not sure whether that helped).

rspamc stat shows "Total learns: 0" and I'm not sure what this means, as autolearning seems to be functioning: the "learned" counters for the BAYES_HAM and BAYES_SPAM statfiles are incrementing.

My guess is that upgrading Rspamd across several versions broke something, because I used the same config for the new instance as for the old one. It might also be related to the servers, as the new instance runs on a different server (both are Ubuntu 16.04 but use different hardware). Anyhow, everything seems to be working great now. At peak times the system processes ~45 messages/second and this is what the Rspamd machine looks like: https://cloudup.com/cihpykV78bc

Andris