I have a relay server that passes mail onto a cluster of filter
servers and I'm seeing lots of deferred, connection reset by [filter
server] on my relay server.
The filter servers (running mailscanner) are not very busy and I have
been playing with timeouts for ages now trying to get the numbers of
deferred connections down. I just wondered if anyone had any ideas?
Here are the timeouts for the relay server that initially receives the
mail.
define(`confTO_COMMAND', `5m')dnl
define(`confTO_IDENT',`0s')dnl
define(`confTO_ICONNECT', `20s')dnl
define(`confTO_CONNECT', `4m')dnl
define(`confTO_HELO', `2m')dnl
define(`confTO_MAIL', `4m')dnl
define(`confTO_RCPT', `4m')dnl
define(`confTO_DATAINIT', `3m')dnl
define(`confTO_DATABLOCK', `3m')dnl
define(`confTO_DATAFINAL', `10m')dnl
define(`confTO_RSET', `1m')dnl
define(`confTO_QUIT', `1m')dnl
define(`confTO_MISC', `1m')dnl
There don't seem to be any issues connecting to this server.
Here's the timeouts that I have on my filter servers (one of them) -
define(`confTO_COMMAND',`3m')dnl
define(`confTO_ICONNECT', `15s')dnl
define(`confTO_CONNECT', `4m')dnl
define(`confTO_HELO', `2m')dnl
define(`confTO_MAIL', `4m')dnl
define(`confTO_RCPT', `4m')dnl
define(`confTO_DATAINIT', `3m')dnl
define(`confTO_DATABLOCK', `10m')dnl
define(`confTO_DATAFINAL', `10m')dnl
define(`confTO_RSET', `1m')dnl
define(`confTO_QUIT', `1m')dnl
define(`confTO_MISC', `1m')dnl
Both servers seem to have around 150 - 250+ sendmail processes running
at any one time. Do you think I just need to add more servers or play
with the timeouts more?
Sometimes when you send an email directly to the filter cluster it
takes ages to go through. Plus, our monitoring software reports that
it can't connect on port 25, probably twice a day on each server in
that cluster.
Any comments on my timeouts?
Sorry, I thought I should mention that I am using barracuda and spamcop
for spam dnsbl. I'm doing queuewarn and stuff too. It's just the
timeouts I need help with, if you kind people can help!
Oh, and Merry Christmas, sendmail people!
Have you considered limiting the number of concurrent SMTP (TCP)
connections to the scanners instead of playing with timeouts?
e.g.
queue all messages to the scanners (e.g. using an "expensive" mailer)
and
use more frequent queue runs (with MinQueueAge) or persistent queue runners
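
A minimal sketch of what that could look like in the relay's .mc file.
The values are illustrative assumptions, not tested recommendations, and
the "e" flag here applies to every SMTP-family mailer, which only makes
sense if the relay hands all of its mail to the scanners:

```m4
dnl Illustrative sketch only; the 1m value is an assumption.
dnl Mark the SMTP mailers as expensive (mailer flag e).
dnl This define has to appear before the MAILER(smtp) line.
define(`SMTP_MAILER_FLAGS', `e')dnl
dnl HoldExpensive: queue mail for expensive mailers rather than
dnl opening a connection at once.
define(`confCON_EXPENSIVE', `True')dnl
dnl MinQueueAge: minimum time a message sits in the queue between
dnl delivery attempts.
define(`confMIN_QUEUE_AGE', `1m')dnl
MAILER(`smtp')dnl
```

With mail held in the queue, the number of simultaneous connections to
the scanners is governed by the queue runners rather than by incoming
traffic. The queue run interval is set when the daemon is started with
the -q option, and -qp requests persistent queue runners.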
--
[pl>en Andrew] Andrzej Adam Filip : an...@onet.eu : Andrze...@gmail.com
Open-Sendmail: http://open-sendmail.sourceforge.net/
> define(`confTO_ICONNECT', `20s')dnl
>
> There doesn't seem to be any issues connecting to this server.
>
> Here's the timeouts that I have on my filter servers (one of them) -
>
> define(`confTO_COMMAND',`3m')dnl
> define(`confTO_ICONNECT', `15s')dnl
...
>
>
> Both servers seem to have around 150 - 250+ sendmail processes running
> at any one time. Do you think I just need to add more servers or play
> with the timeouts more?
> Sometimes when you send an email directly to the filter cluster it
> takes ages for it to go through, Plus I'm getting issues on our
> monitoring software saying it can't connect on port 25, probably twice
> a day on each server in that cluster.
>
> Any comments on my timeouts?
I suppose you were inspired by the chapter on performance tuning in the old edition of the bat book.
There's a bug there. First of all, change confTO_ICONNECT to its default value, or simply remove
that line.
Really? It's undefined on the sendmail timeout pages, and I was told to
put this quite low to weed out slow hosts.
Why do you say there is a bug and where is this documented?
...
>>> Any comments on my timeouts?
>> I suppose you were inspired by the chapter on performance tuning in the old edition of the bat book.
>>
>> There's a bug there. First of all, change confTO_ICONNECT to its default value, or simply remove
>> that line.
>
> Really? It's undefined on the sendmail timeout pages, I was told to
> put this quite low to weed out slow hosts.
I understand. This is a hint which appeared in chapter 6 (performance tuning) of Bryan's book.
ICONNECT relates to outgoing connections, while the other timeouts relate to incoming connections.
> Why do you say there is a bug and where is this documented?
It's an error in the book. It was probably corrected in the latest edition. I found it and pointed it
out to Bryan (who acknowledged it). If you don't believe me, you can just drop him a message or buy the
latest edition of his book.
I totally believe you.
I've tried this and so far so good. Removing it on the cluster made no
difference, but since removing it from my relay server I see very few
connection resets. As you say, it applies to outbound connections, so
this makes sense. I've now removed it from all servers.
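
For reference, a sketch of the relay server's timeout block after the
change, assuming every other value stays as posted at the top of the
thread. The only difference is that the confTO_ICONNECT line is gone,
so sendmail falls back to its built-in default for that timeout:

```m4
define(`confTO_COMMAND', `5m')dnl
define(`confTO_IDENT',`0s')dnl
dnl confTO_ICONNECT line removed here; the default now applies.
define(`confTO_CONNECT', `4m')dnl
define(`confTO_HELO', `2m')dnl
define(`confTO_MAIL', `4m')dnl
define(`confTO_RCPT', `4m')dnl
define(`confTO_DATAINIT', `3m')dnl
define(`confTO_DATABLOCK', `3m')dnl
define(`confTO_DATAFINAL', `10m')dnl
define(`confTO_RSET', `1m')dnl
define(`confTO_QUIT', `1m')dnl
define(`confTO_MISC', `1m')dnl
```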
I've also updated sendmail to 8.14.4 now on the clusters. I've had no
notifications of failure for port 25 mail yet but time will tell.
I might update sendmail on the relay server also. Currently I'm on
sasl2-8.14.3.
Thanks so much for your suggestion.