Over the past couple of days I have been looking into rebuilding our
current mail filtering platform. At the moment it is based on postfix,
spamassassin, clamav and some little bits and pieces that tie it all
together - some of which are custom hacks, all running on some rather
dated Sun Solaris machines.
The term "custom hack" gives me the shivers - even more so since the
people who wrote these neglected to document them and have since moved
on to other companies. So I am faced with the challenge of rebuilding
something we are not entirely sure how it works - and making sure it
does the same thing, only faster (don't you love those assignments).
Anyway - to make a long story short, I would like to stick with proven
technology that works together in well-known and documented ways. After
my research I have come up with a setup that I believe will work. But
considering that my experience with postfix is somewhat limited, I'd
love to get some feedback from people who have more experience with all
of this. Since this is a fairly large setup I do not really have the
hardware lying around to build and test it, and I hate to order
something that won't work in the end.
The key issues I have to tackle are redundancy, resilience, speedy
delivery (and hence speedy spam/virus checks), easy management, and
per-domain UCE settings. Oh.. and it needs to handle about 50,000
domains. On an average day the current platform receives about 400,000
emails - of which 60% are rejected due to spam and viruses.
The setup I came up with is as follows:
2x incoming mail server (mx records will point to this)
4x backend mail server (spam/virus/etc)
2x pop3/imap server with shared storage
2x webmail server
1x log server/master mysql server
The incoming servers would run postfix and handle the less
resource-intensive filtering: sender/recipient checks, helo checks, rfc
conformity, etc. Once an email passes these tests it will be passed on
to one of the four backend servers, which from the looks of it will
most likely run amavisd-new with spamassassin and clamav (possibly some
other virus scanner as well). I want all four servers to be able to
handle all emails, but to act differently depending on the recipient
domain (as in: do no spam/virus check, do one or the other, do both -
and also apply some spamassassin settings per domain). Once the email
clears the amavisd-new stage I figured I'd have it delivered to a local
(as in: on the same backend server) copy of postfix, which can then
handle delivery either to the remote smtp server in the case of
outgoing email, or to the local mail store over smtp or lmtp. The local
mail store I'll probably build with cyrus, which customers can then
access with pop3 or imap clients - or, when on the road or by choice,
they can reach the mail store via the webmail servers - probably
squirrelmail + imap proxy.
All of the mail servers will log to the central log server, which also
houses the master mysql database. I say master because I figured I can
run a local slave mysql installation on each of the mail servers that
needs it (e.g. incoming servers to verify recipients, backend servers
for per-domain settings, bayes filtering, etc.). That way I would not
hammer one mysql server with requests from all the mail servers, but
I'd still keep a central point of administration. The reason I picked
mysql is that it is a well-known product in our company - unlike, for
instance, ldap, which not a whole lot of people here have worked with
before.
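Roughly, what I have in mind for main.cf on the incoming servers is
something like this (an untested sketch - the map names and the scanner
pool hostname are just placeholders I made up):

  # cheap checks only on the frontends
  smtpd_helo_required = yes
  smtpd_recipient_restrictions =
      reject_non_fqdn_sender,
      reject_non_fqdn_recipient,
      reject_unknown_sender_domain,
      permit_mynetworks,
      reject_unauth_destination
  # recipient/domain validation against the local mysql slave
  relay_domains        = mysql:/etc/postfix/mysql-relay-domains.cf
  relay_recipient_maps = mysql:/etc/postfix/mysql-recipients.cf
  # hand accepted mail to the scanning pool on the backends
  content_filter = smtp:[scanners.internal.example.com]:10024

The idea being that anything expensive (spamassassin, clamav) only ever
happens on the backends.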
Now my first question of course is - does a platform like this make
sense? In my mind it covers all of the
requirements - plus it is easy to expand on when we need more capacity
(add extra incoming/backend servers where needed).
My second question is - how would I properly load balance the processed
mail from the incoming servers over the backend servers? I'd like to
get an equal spread over all backend servers - so round robin would work.
The other question I have - if I use this platform for outgoing mail
filtering and scanning - do I run the risk of the remote site refusing
my mail because the MX records for the sender domain do not point to
the server actually trying to deliver it? (The MX for example.com
points to the incoming mail servers, but j...@example.com emailing
ja...@example.net would have his mail leaving this platform from one of
the backend servers.) Do I have to run smtp daemons there that are
reachable for verification, etc.?
Then there is the question - have I covered potential bottlenecks here
(as in mysql queries, etc)?
The last question I have is one that will be hard to answer in absolute
numbers - but a ballpark would help a lot already - what kind of
hardware scaling would be needed here? We have two mail clusters at the
moment which under peak usage handle about 12,000 emails per hour, and
I'd like to have all of this scanned and checked as soon as possible. I
was thinking about Xeon or Opteron frontends, scsi disks and 1 GB of
memory (which should be enough, I think, for postfix + database slave),
and then dual Xeon/Opteron with at least 2 GB of memory for each of the
backends. From what I have seen, spam/virus checking is a real memory
hog and also cpu intensive. If at all possible I'd like to run
amavisd-new with a ramdisk filesystem to do its checks on - so I'll
need some memory for that as well.
I understand that this is a lot of information, but like I said in the
beginning, I hope some of you can find the time to go over it and give
comments - because I seriously doubt my boss would be happy with me
once I order this and it turns out to be the wrong setup ;-)
Also - I apologize if this is not the correct place to ask these
questions - in that case please let me know where to ask about this all.
Thank you for your time,
- Jasper
...
This would be the tricky part of the setup. Cyrus with shared storage
is not supported as far as I know. You should have a look at the Cyrus
IMAP Aggregator (aka Murder) or perdition. For the virus/spam scanners
you could use transport tables to route traffic, but that would defeat
redundancy at this point. With DNS round robin etc. you get an unequal
load because of connection caching/reuse.
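One way to keep the spread reasonably even without transport tables is
to point the frontends' content_filter at a single name that has one A
record per scanner, and to turn off connection caching. A rough sketch,
assuming Postfix 2.2 on the frontends and a made-up pool name:

  # main.cf on the incoming servers
  # scan-pool.internal resolves to all four backend scanners
  content_filter = smtp:[scan-pool.internal]:10024
  # reduce the skew caused by connection reuse (Postfix 2.2+)
  smtp_connection_cache_on_demand = no

You trade a bit of throughput for a more even distribution over the
backends.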
> The other question I have - if I use this platform for outgoing mail
> filtering and scanning - do I run the risk of the remote site refusing
> my mail because the MX records for the sender domain do not point to
> the server actually trying to deliver it? (The MX for example.com
> points to the incoming mail servers, but j...@example.com emailing
> ja...@example.net would have his mail leaving this platform from one
> of the backend servers.) Do I have to run smtp daemons there that are
> reachable for verification, etc.?
It is common to split incoming and outgoing mailflow. Verification
systems which probe the sending server despite the presence of MX
records pointing to other servers are brain damaged, to say the least.
The only thing you should keep in mind is SPF, but after the hype it is
not used that much.
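If you do publish SPF for your customers' domains, just make sure the
record lists the machines that actually hand mail to the outside world
(the backends in your design), not only the MX hosts. Something like
this (the addresses are only an example):

  example.com.  IN  TXT  "v=spf1 mx ip4:192.0.2.0/28 ~all"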
> Then there is the question - have I covered potential bottlenecks
> here (as in mysql queries, etc)?
- Reject invalid recipients at the frontend
- Be sure to have a useful (simple) design for your lookup maps - read
and understand http://www.postfix.org/VIRTUAL_README.html
- Have a look at postfix's proxymap to lower the number of connections
to your DB (see the sketch right below this list)
- Use a caching nameserver nearby, on a low-latency link
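For the proxymap point, something along these lines (the map name and
schema are only an example, and the query syntax assumes Postfix
2.2-style mysql maps):

  # main.cf
  relay_recipient_maps = proxy:mysql:/etc/postfix/mysql-recipients.cf

  # /etc/postfix/mysql-recipients.cf
  hosts    = 127.0.0.1
  user     = postfix
  password = secret
  dbname   = mail
  query    = SELECT 1 FROM mailbox WHERE address = '%s'

With the proxy: prefix all smtpd processes share a few proxymap
connections instead of each opening its own connection to MySQL.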
> The last question I have is one that will be hard to answer in
> absolute numbers - but a ballpark would help a lot already - what
> kind of hardware scaling would be needed here? We have two mail
> clusters at the moment which under peak usage handle about 12,000
> emails per hour, and I'd like to have all of this scanned and checked
> as soon as possible. I was thinking about Xeon or Opteron frontends,
> scsi disks and 1 GB of memory (which should be enough, I think, for
> postfix + database slave), and then dual Xeon/Opteron with at least
> 2 GB of memory for each of the backends. From what I have seen,
> spam/virus checking is a real memory hog and also cpu intensive. If
> at all possible I'd like to run amavisd-new with a ramdisk filesystem
> to do its checks on - so I'll need some memory for that as well.
12,000 emails per hour is 3-4 emails per second. Without heavy
filtering, a simple Pentium with a single old SCSI disk can handle this
load at the frontend.
Be sure to have a blazingly fast I/O subsystem on the backends, as they
are hammered with POP3/IMAP and incoming mail. If you are using a
content filter (SA), be sure to have a lot of RAM on the scanning
machines and search the amavisd list for tips and the recent discussion
about ramdisk filesystems.
These are all general tips - the whole story depends a lot on your
specific needs. If you have questions about a specific part of the
setup you can always ask the list; there are a lot of people here who
build such systems as their normal business.
Hope this helps for a start
Regards
Andreas
In our case we have many more front-end servers, for other reasons, but
here is a general breakdown:
We set up the primary MX servers on the front end as plain relays. Mail
comes in to them and is then forwarded to a commercial A/V application.
From there the A/V hands the email back to postfix's filters. The next
filter is spamassassin (the spamc client), which makes its call to the
back-end servers. We have a set of 4 spamassassin boxes behind our
firewall that just process these emails. They are load balanced and
semi-redundant. In all cases, the back-end processing servers don't
actually receive any email but rather house the services that process
the email.
From there mail gets delivered back to the storage boxes. These boxes
are a linux-ha cluster running DRBD. In our case they do a little more,
but in your case they would only need to hold the email accounts. We
have cyrus as one of the linux-ha managed services.
Then there is the webmail. We have horde set up to manage this. This
can be on any set of boxes you need.
I think you have the right idea for the allocation of servers. We do
about 50% more email than you are looking to process. The cool thing
about setting it up to use multiple servers now is that you can easily
scale it out in the future when 400k messages become 1M messages.
I'd highly recommend a redundant mysql server. We have a clustered
MySQL that is also replicated to another cluster.
> lst_...@kwsoft.de [lst_...@kwsoft.de] wrote:
>
>> - Reject invalid recipients at the frontend
>> - Be sure to have a useful (simple) design for your lookup maps - read
>> and understand http://www.postfix.org/VIRTUAL_README.html
>> - Have a look at postfix's proxymap to lower the number of connections
>> to your DB
>> - Use a caching nameserver nearby, on a low-latency link
>
> Great pointers - I will look into those now. Will proxymap still be
> useful given that I was thinking about running a dedicated mysql slave
> per mta? I was also thinking about possibly running dedicated dns cache
> servers on the same lan - or installing a copy of dnscache directly on
> the mta to keep latencies down to a minimum.
It should not be necessary to run a mysql slave per MX server. A DNS
cache on the server is useful for sure. Try to reject as much spam as
possible at this stage to keep your content filters from having to
chew on it - that can save you a lot of processing power.
Per-domain/user settings for the spam filter are possible in postfix
with restriction classes.
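A minimal sketch of restriction classes (the class names, the map and
the policies are made up for the example):

  # main.cf
  smtpd_restriction_classes = strict_checks, lenient_checks
  strict_checks  = reject_rbl_client sbl-xbl.spamhaus.org, reject_unknown_client
  lenient_checks = permit
  smtpd_recipient_restrictions =
      permit_mynetworks,
      reject_unauth_destination,
      check_recipient_access mysql:/etc/postfix/mysql-domain-policy.cf

The mysql map then returns "strict_checks" or "lenient_checks"
depending on the recipient domain. For the SpamAssassin side of the
per-domain settings (scores, bayes and so on) you will be looking at
amavisd/SA configuration rather than postfix.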
>> 12,000 emails per hour is 3-4 emails per second. Without heavy
>> filtering, a simple Pentium with a single old SCSI disk can handle
>> this load at the frontend.
>
> I'd like to keep the frontends as simple as possible filter-wise -
> just so that they will be able to respond to incoming requests
> quickly. Basically I'm just looking to enforce rfc compliance, verify
> sender/recipient, etc.
>
> On the backend servers I am planning to handle the spam/virus
> checking, which on success forwards to a local postfix setup to keep
> the frontend clear. The backend will have to be configurable on a per
> domain basis and for certain domains even on a per recipient basis.
This is mostly amavisd (or some other content filter hook)
configuration ... As far as I know, the feature you are looking for
here is policy banks.
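One way to wire that up from the postfix side is to route mail for the
"special" domains to a second amavisd listener port and let amavisd tie
a policy bank to that port (amavisd-new can listen on several ports and
assign a bank per port via its $interface_policy setting). A sketch
with made-up names and ports:

  # main.cf on the incoming servers: add to smtpd_recipient_restrictions
  #   check_recipient_access hash:/etc/postfix/scan_policy

  # /etc/postfix/scan_policy (lines starting with # are comments)
  # example.com goes through the default bank on port 10024:
  example.com   FILTER smtp:[scan-pool.internal]:10024
  # example.net goes through, say, a virus-only bank on port 10026:
  example.net   FILTER smtp:[scan-pool.internal]:10026

Domains that are not listed simply get whatever the default
content_filter says. Keep in mind that FILTER applies to the whole
message, not per recipient, so a message with recipients in several
domains only gets one of the policies; per-recipient settings inside
amavisd (e.g. its SQL lookups) are the cleaner way if you need that.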
>> Be sure to have a blazingly fast I/O subsystem on the backends, as
>> they are hammered with POP3/IMAP and incoming mail. If you are using
>> a content filter (SA), be sure to have a lot of RAM on the scanning
>> machines and search the amavisd list for tips and the recent
>> discussion about ramdisk filesystems.
>
> Ah.. I might not have made this clear - the backend will never be
> reachable by clients - it will either forward to remote smtp servers
> for outgoing email or to the cyrus server(s) for local delivery. The
> cyrus server will be reachable by clients either directly over
> pop3/imap or via a web frontend.
> So I foresee needing something high in I/O for the scanning - and a
> store for all mail that is waiting to be delivered. I was thinking of
> using tmpfs for the I/O intensive parts.
For backends I meant the IMAP/Cyrus servers, not the content filters.
As for tmpfs/ramdisks for amavisd, have a look at the recently posted
benchmarks on the list.
> The Cyrus setup I have in my mind at the moment is an active/passive
> cluster with
> external storage (looking into iscsi or fc) to provide fast I/O and
> redundancy.
For scaling have a look at Cyrus Murder
(http://asg.web.cmu.edu/cyrus/ag.html) and perdition
(http://www.vergenet.net/linux/perdition/).
Regards
Andreas
It seems SPF is used more by spammers than by legitimate sites, but I
don't have any numbers here (just some articles I saw on the web a long
time ago). I disabled SPF checks in SA a long time ago, and I haven't
seen any degradation in my filtering.
I don't use greylisting because it breaks conversations, so why would I
go with SPF and break forwarding?
how does greylisting break conversations?
it delays the *first* message from a sender to a recipient from an IP
address. that's all. any subsequent messages from the same sender to the
same recipient from the same IP address will be accepted without any
delay.
and the delay doesn't have to be long - even a greylisting timeout of 5
seconds is enough to block most spamware which doesn't retry at all.
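fwiw, the postfix side of greylisting is just a policy service lookup.
a sketch, assuming postgrey on its usual port (and, if i remember the
option right, postgrey's --delay lets you set the timeout, e.g.
--delay=5):

  # main.cf
  smtpd_recipient_restrictions =
      permit_mynetworks,
      reject_unauth_destination,
      check_policy_service inet:127.0.0.1:10023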
craig
--
craig sanders <c...@taz.net.au> (part time cyborg)
my postfix scripts are at http://taz.net.au/postfix/scripts/
my recommendation would be to simplify the design by merging the 2 MX
servers and 4 backend (amavis/clamav/etc) boxes into 6 MX servers which
all run amavis/clamav/etc.
load balancing could be achieved with a real load balancer (e.g. LVS
http://www.linuxvirtualserver.org/) or by round-robin DNS.
this is easy to scale up, just add more duplicates of the MX boxes.
any of the boxes could double as outbound smtp relays for your customers
to use, or you could add another machine or two to do that job.
if you don't mind your customers' outbound mail getting the same
antispam/antivirus treatment as the incoming mail, i'd just use the MX
boxes as outbound relays - perhaps adding another machine or two if load
requires it.
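e.g. something like this on those boxes (the network ranges and names
are just placeholders):

  # main.cf on the combined MX / outbound relay boxes
  mynetworks = 127.0.0.0/8 192.0.2.0/24
  smtpd_recipient_restrictions =
      permit_mynetworks,
      permit_sasl_authenticated,
      reject_unauth_destination
  # inbound and outbound mail both go through the local amavisd
  content_filter = smtp:[127.0.0.1]:10024

customers inside mynetworks (or authenticated via SASL) can relay out,
everyone else can only send to your hosted domains, and everything gets
scanned either way.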
> The other question I have - if I use this platform for outgoing mail
> filtering and scanning - do I run the risk of the remote site refusing
> my mail because the MX records for the sender domain do not point to
> the server actually trying to deliver it? (The MX for example.com
> points to the incoming mail servers, but j...@example.com emailing
> ja...@example.net would have his mail leaving this platform from one
> of the backend servers.) Do I have to run smtp daemons there that are
> reachable for verification, etc.?
no. nobody with any sense requires that mail comes from the MX host for
a sender domain. MX records specify the hostnames that will *receive*
mail for a given domain; they say *nothing* about which hosts are
allowed to send mail from that domain....nor were they ever intended to
do so.
> The last question I have is one that will be hard to answer in
> absolute numbers - but a ballpark would help a lot already - what
> kind of hardware scaling would be needed here? We have two mail
> clusters at the moment which under peak usage handle about 12,000
> emails per hour, and I'd like to have all of this scanned and checked
> as soon as possible. I was thinking about Xeon or Opteron frontends,
> scsi disks and 1 GB of memory (which should be enough, I think, for
> postfix + database slave), and then dual Xeon/Opteron with at least
> 2 GB of memory for each of the backends.
some rules of thumb:
1. you need fast CPUs and lots of RAM on the machines running
amavis/clamav/spamassassin/etc.
Note that it's much more cost effective to have more machines rather
than bigger CPUs - e.g. for the price of, say, 3 latest model high-end
CPU machines you can probably get 10 machines that are one or three
generations behind the latest. this also has redundancy benefits AND
spreads the I/O load over more machines (this is very significant - see
point 2. below).
for example, 10 x AMD-64 3000 machines will be able to process a lot
more mail than 3 x opteron 4800 machines....and in a year or two, you
can upgrade all the CPUs with X2 dual cores for probably around $100
each.
2. you need fast I/O on all machines. except for spamassassin etc, mail
is an I/O bound application. the faster, the better. you can never have
enough I/O bandwidth.
3. you might want to consider solid-state disks for the queue
directories - dunno how they perform in real life(*) but Gigabyte
recently released a PCI card SSD that takes up to 4GB of DDR RAM (with
about 16 hours worth of battery backup). costs about $AUD400 plus the
RAM, which is dirt cheap compared to other SSDs on the market (last time
i checked, they were averaging well over $AUD1000/gigabyte) - it means
you can get a 4GB SSD for about $AUD800.
4GB ought to be enough for the queue, but if you need more, you can
always use multiple SSDs and append them with a linear md array (no
striping needed, because there are no heads to move and thus no seek
time delays).
(*) i'm planning to trial them next time i need to build a big mail
server. by that time, they should be even cheaper and bigger...
> From what I have seen, spam/virus checking is a real memory hog and
> also cpu intensive. If at all possible I'd like to run amavisd-new
> with a ramdisk filesystem to do its checks on - so I'll need some
> memory for that as well.
the trouble with ramdisks is that they lose their contents in case of
power failure....SSDs solve that problem - ramdisk speeds, without risk
of loss. but if you're willing to accept that risk, it's a lot cheaper
to just add a few more gigabytes of RAM and run a ramdisk.
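if you go the ramdisk route, it's just a tmpfs mount for amavisd's
working directory - something like this (the path and size are whatever
suits your setup):

  # /etc/fstab - RAM-backed work area for amavisd-new ($TEMPBASE)
  tmpfs   /var/amavis/tmp   tmpfs   size=512m,mode=750   0 0

then point amavisd's $TEMPBASE at that mountpoint and make sure the
amavis user owns it.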
That's just it. If I get the reply before the message, then I call that
breaking the conversation.
I understand that smtp isn't an instant messaging protocol, but when N
people have exchanged M messages and you only got a few of them, then
you check why, and if it's because of GL, then you say it's because of
GL.
That said, using GL in some cases is ok.
> and the delay doesn't have to be long - even a greylisting timeout of 5
> seconds is enough to block most spamware which doesn't retry at all.
>
how do you tell the client to retry in 5 seconds? They retry when they
want (some don't but that's another issue).
that's only likely to happen on a mailing list, and it can happen
anytime, anyway (especially when list mail is also CC-ed to participants
in a thread) - depending on the workload of each involved mail server.
in other words, you're imagining a problem that a) isn't anywhere near
as big as you're making out, and b) isn't at all unique to greylisting.
> >and the delay doesn't have to be long - even a greylisting timeout of 5
> >seconds is enough to block most spamware which doesn't retry at all.
>
> how do you tell the client to retry in 5 seconds? They retry when they
> want (some don't but that's another issue).
you don't. any more than you tell a client when to retry when it gets
a "450 disk full" or some other non-GL 4xx tempfail code. it retries
according to its own schedule and workload.
many sites, however, seem to implement an incremental backoff strategy
- first retrying almost immediately and gradually increasing the time
between retry attempts.
>>> and the delay doesn't have to be long - even a greylisting timeout of 5
>>> seconds is enough to block most spamware which doesn't retry at all.
>>>
>> how do you tell the client to retry in 5 seconds? They retry when they
>> want (some don't but that's another issue).
>>
>
> you don't. any more than you tell a client when to retry when it gets
> a "450 disk full" or some other non-GL 4xx tempfail code. it retries
> according to its own schedule and workload.
>
> many sites, however, seem to implement an incremental backoff strategy
> - first retrying almost immediately and gradually increasing the time
> between retry attempts.
>
When the problems I've seen occurred, the sending systems didn't seem
to do it that way. Of course, these weren't postfix/sendmail/exim/qmail...
Once again, I am not saying: don't use GL. It's just that the benefits
of GL _here_ aren't enough.
On the other hand, using GL in some cases is helpful: for instance,
when the rdns looks dynamic.