I am experiencing 421 errors between my secondary and primary MXes, and
it seems to be caused by a lack of connection caching.
http://www.postfix.org/postconf.5.html#smtp_connection_cache_on_demand
does not explain what "high volume of mail in the active queue" means.
When exactly is connection caching activated?
Thanks
Peter
P.S. I know that I can permanently enable connection caching on the
secondary MX when talking to the primary but I am trying to keep the
configuration concise.
What is the error message?
> http://www.postfix.org/postconf.5.html#smtp_connection_cache_on_demand
> does not explain what "high volume of mail in the active queue" means.
> When exactly is connection caching activated?
Roughly, it is activated when the active queue contains another
message before the current delivery is completed.
Wietse
There is no error message as such, see below.
>> http://www.postfix.org/postconf.5.html#smtp_connection_cache_on_demand
>> does not explain what "high volume of mail in the active queue" means.
>> When exactly is connection caching activated?
>
> Roughly, it is activated when the active queue contains another
> message before the current delivery is completed.
>
If the primary MX is down for an extended period of time and a large
queue accumulates on the backup, all messages are rushed to the primary
MX in what seem to be separate SMTP connections. At least, I was able to
count as many smtp processes in `ps` as 2/3 of the number of queued
messages right after I issued `postfix flush`. If I specify explicit
caching for the particular MX host, things work as expected. I guess
there is not enough time for on-demand caching to activate when
doing a flush, or when enough messages are queued to simulate one.
Peter
Any reasonable SMTP server sends 421 followed by some text that
explains why it hangs up.
Having looked at the text below, I think your problem is that
you are making an insane number of SIMULTANEOUS connections
to the primary MX host.
> >> http://www.postfix.org/postconf.5.html#smtp_connection_cache_on_demand
> >> does not explain what "high volume of mail in the active queue" means.
> >> When exactly is connection caching activated?
> >
> > Roughly, it is activated when the active queue contains another
> > message before the current delivery is completed.
>
> If the primary MX is down for an extended period of time and a large
> queue accumulates on the backup, all messages are rushed to the primary
> MX in what seem to be separate SMTP connections. At least, I was able to
> count as many smtp processes in `ps` as 2/3 of the number of queued
> messages right after I issued `postfix flush`. If I specify explicit
> caching for the particular MX host, things work as expected. I guess
> there is not enough time for on-demand caching to activate when
> doing a flush, or when enough messages are queued to simulate one.
This is not a surprise.
If the number of SIMULTANEOUS connections is 2/3 the number of
queued messages, then most connections will never be reused because
the mail is already delivered.
I suggest that you revert to no more than 10-20 SIMULTANEOUS
connections to the primary MX (or to any machine).
/etc/postfix/main.cf:
    smtp_destination_concurrency_limit = 20
    relay_destination_concurrency_limit = 20
If you do that, not only will the primary MX perform better, you
will also see connection reuse happen automatically.
Wietse
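The reuse argument above can be illustrated with a back-of-the-envelope
model. This is only a minimal sketch, not Postfix's scheduler: it assumes
that all mail goes to one destination and that every connection has to be
opened once for its first message, so only the remaining deliveries can
reuse an open connection. The function name and the backlog of 300
messages are made up for illustration.

```python
def max_reuse_fraction(queued_messages: int, concurrency: int) -> float:
    """Upper bound on the share of deliveries that can reuse an open connection.

    Idealized model (an assumption, not Postfix internals): each simultaneous
    connection is opened once for its first message, so at most
    queued_messages - connections_opened deliveries can be reuses.
    """
    if queued_messages <= 0:
        return 0.0
    connections_opened = min(concurrency, queued_messages)
    return (queued_messages - connections_opened) / queued_messages


if __name__ == "__main__":
    queued = 300  # hypothetical backlog on the backup MX
    # 200 is 2/3 of the backlog, as counted in `ps`; 20 and 10 are the suggested range
    for limit in (200, 20, 10):
        print(f"concurrency={limit}: at most {max_reuse_fraction(queued, limit):.0%} reused")
```

Under these assumptions, the fewer simultaneous connections there are
relative to the backlog, the larger the share of deliveries that can go
over an already-open connection.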
>If the primary MX is down for an extended period of time and a large
>queue accumulates on the backup, all messages are rushed to the primary
>MX in what seem to be separate SMTP connections. At least, I was able to
>count as many smtp processes in `ps` as 2/3 of the number of queued
>messages right after I issued `postfix flush`. If I specify explicit
>caching for the particular MX host, things work as expected. I guess
>there is not enough time for on-demand caching to activate when
>doing a flush, or when enough messages are queued to simulate one.
>
>Peter
Sounds as if you have a large *_destination_concurrency_limit set on
the backup, and the primary isn't able to gracefully handle that many
connections. Don't do that.
http://www.postfix.org/TUNING_README.html#rope
--
Noel Jones
I apologize; I thought that 421 in the MTA world was as self-explanatory
as, say, 403 in the HTTP world.
> Having looked at the text below, I think your problem is that
> you are making an insane number of SIMULTANEOUS connections
> to the primary MX host.
This is correct.
>>>> http://www.postfix.org/postconf.5.html#smtp_connection_cache_on_demand
>>>> does not explain what "high volume of mail in the active queue" means.
>>>> When exactly is connection caching activated?
>>> Roughly, it is activated when the active queue contains another
>>> message before the current delivery is completed.
>> If the primary MX is down for an extended period of time and a large
>> queue accumulates on the backup, all messages are rushed to the primary
>> MX in what seem to be separate SMTP connections. At least, I was able to
>> count as many smtp processes in `ps` as 2/3 of the number of queued
>> messages right after I issued `postfix flush`. If I specify explicit
>> caching for the particular MX host, things work as expected. I guess
>> there is not enough time for on-demand caching to activate when
>> doing a flush, or when enough messages are queued to simulate one.
>
> This is not a surprise.
>
> If the number of SIMULTANEOUS connections is 2/3 the number of
> queued messages, then most connections will never be reused because
> the mail is already delivered.
>
> I suggest that you revert to no more than 10-20 SIMULTANEOUS
> connections to the primary MX (or to any machine).
>
> /etc/postfix/main.cf:
>     smtp_destination_concurrency_limit = 20
>     relay_destination_concurrency_limit = 20
I never changed the defaults for those (postconf -n output follows at
the end of the message).
> If you do that, not only will the primary MX perform better, you
> will also see connection reuse happen automatically.
>
I did more testing using explicit smtp_connection_cache_destinations,
and I still had the same experience. Rereading the documentation for the
n-th time, I noticed the following in several places
(in reference to *_destination_recipient_limit):
    Setting this parameter to a value of 1 changes the meaning of
    *_destination_concurrency_limit from concurrency per domain into
    concurrency per recipient.
Does this by chance mean that *_destination_concurrency_limit refers to
individual _domains_ and not individual MTAs? I am relaying mail for 6
domains, all having the same primary MX (which is the one getting badly
hammered after being down for a while).
Thanks for the help
-------
postconf -n
Arx:/etc/postfix# postconf -n
address_verify_map = btree:/var/cache/postfix/verify.db
address_verify_negative_cache = yes
address_verify_negative_expire_time = 1d
address_verify_negative_refresh_time = 1h
address_verify_poll_count = 2
address_verify_poll_delay = 2s
address_verify_positive_expire_time = 31d
address_verify_positive_refresh_time = 7d
alias_database = $alias_maps
alias_maps = hash:/etc/aliases
append_dot_mydomain = no
backwards_bounce_logfile_compatibility = no
biff = no
bounce_queue_lifetime = 12h
bounce_size_limit = 20000
config_directory = /etc/postfix
hash_queue_depth = 1
hash_queue_names = ''
in_flow_delay = 0
inet_interfaces = all
inet_protocols = ipv4
mailbox_command = procmail -a "$EXTENSION"
mailbox_size_limit = 0
maximal_queue_lifetime = 7d
message_size_limit = 0
minimal_backoff_time = 15m
mydestination = $mydomain, localhost.$mydomain, localhost
mydomain = rabbit.us
myhostname = arx.rabbit.us
mynetworks = 127.0.0.0/8 192.168.13.0/24 10.0.13.0/24 $inet_interfaces
myorigin = $mydomain
queue_directory = /var/spool/postfix
queue_minfree = 1000000
recipient_delimiter = +
relay_domains = <6 relayed domains withheld, all with same primary MX>
smtp_bind_address = 68.251.127.6
smtp_connect_timeout = 5s
smtp_connection_reuse_time_limit = 5m
smtp_helo_timeout = 1m
smtp_mail_timeout = 1m
smtp_mx_address_limit = 0
smtp_quit_timeout = 10s
smtp_skip_quit_response = yes
smtpd_authorized_verp_clients = $mynetworks
smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
smtpd_delay_reject = yes
smtpd_error_sleep_time = 3s
smtpd_hard_error_limit = 20
smtpd_junk_command_limit = 20
smtpd_recipient_limit = 200
smtpd_recipient_restrictions = permit_mynetworks
    reject_unauth_destination reject_unknown_recipient_domain
    reject_unverified_recipient
smtpd_sender_restrictions = reject_unknown_sender_domain
smtpd_soft_error_limit = 5
smtpd_timeout = 30s
syslog_name = postfix
Arx:/etc/postfix#
Arx:~# grep concurrency_limit /etc/postfix/main.cf
Arx:~#
Arx:~# grep recipient_limit /etc/postfix/main.cf
smtpd_recipient_limit = 200
Arx:~#
Arx:~# grep _limit /etc/postfix/main.cf
bounce_size_limit = 20000
smtp_connection_cache_reuse_limit = 100
smtp_connection_reuse_time_limit = 5m
smtp_mx_address_limit = 0
smtpd_soft_error_limit = 5
smtpd_hard_error_limit = 20
smtpd_junk_command_limit = 20
smtpd_recipient_limit = 200
mailbox_size_limit = 0
message_size_limit = 0
Arx:~#
OK, if you are sending lots of different domains to the same primary
MX, try reducing the process limit for the Postfix relay transport
in master.cf to, say, 20, and then run "postfix reload".
That limits the total number of backup-to-primary connections for
the domains combined.
Wietse
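To see why the transport process limit helps where the per-domain limits
do not, here is a rough sketch of the arithmetic. It is a simplified
model, not Postfix code: it assumes the concurrency limit applies to each
relayed domain separately and that the transport's process limit caps the
total. The domain names and queue sizes are hypothetical, and the values
20 and 100 are assumed to be the stock defaults for
default_destination_concurrency_limit and default_process_limit (check
`postconf -d` on your own system).

```python
def peak_connections(queued_per_domain: dict[str, int],
                     per_domain_limit: int,
                     process_limit: int) -> int:
    """Estimate the peak number of simultaneous connections to a shared MX.

    Simplified model (an assumption): each domain may use up to
    per_domain_limit connections, and the transport's process limit
    caps the sum across all domains.
    """
    demand = sum(min(queued, per_domain_limit)
                 for queued in queued_per_domain.values())
    return min(demand, process_limit)


if __name__ == "__main__":
    # Six hypothetical relayed domains that share one primary MX.
    backlog = {f"domain{i}.example": 50 for i in range(1, 7)}
    print(peak_connections(backlog, per_domain_limit=20, process_limit=100))
    print(peak_connections(backlog, per_domain_limit=20, process_limit=20))
```

In this model the per-domain limit alone allows 6 x 20 connections to the
same host, while a process limit of 20 on the relay transport bounds the
total regardless of how many domains share the MX.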
Understood. Are there plans to make all concurrency settings MX-aware
instead of domain-based, as they are now? Thanks for the help!
Peter
I have 14MB of plans sitting in the inbox. It's unlikely they will all
be completed (or that all of them should be).
Wietse