I have a problem here with mailing lists.
I've configured these as entries in the /etc/aliases like
group-x: :include:/etc/group-x.list
and corresponding files /etc/group-x.list with one line per (local)
recipient
This has been working perfectly for years. Now the number of recipients
for some of
these lists have increased to more than 200.
When a mail is sent to these recipients mail delivery starts as expected
but stops
short before end of list. The exact count changes, probably due to different
state of of concurrent mail queue entries.
The error message is an "unknown mail transport error", the mail stays
in the queue and
delivery starts over again from the beginnig until I remove the mail
from the queue.
I believe there is some limit to 200 recipients, queue entries or whatever.
Playing with parameters like
default_destination_concurrency_limit = 300
default_destination_recipient_limit = 300
smtp_recipient_limit = 500
did not produce any visible effect.
Any ideas would be appreciated.
Thanks
Dominik
here is an extract from the log
==================================================================
Aug 2 14:50:49 postfix postfix/pickup[27603]: 720441D34008: uid=0
from=<root>
Aug 2 14:50:49 postfix postfix/cleanup[28372]: 720441D34008:
message-id=<2010080212504...@postfix.meinedomain.com>
Aug 2 14:50:49 postfix postfix/qmgr[904]: 720441D34008:
from=<ro...@postfix.meinedomain.com>, size=577, nrcpt=1 (queue active)
Aug 2 14:50:49 postfix postfix/local[29097]: 720441D34008:
to=<a...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=0.34, delays=0.23/0/0/0.11, dsn=2.0.0, status=sent
(delivered to maildir)
...
[211 lines removed, differing only in recipient address and increasing
delay]
...
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to command: /usr/bin/vacation -j wie)
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to maildir)
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to maildir)
Aug 2 14:52:08 postfix postfix/error[32270]: 720441D34008:
to=<a...@meinedomain.com>, relay=none, delay=79, delays=0.23/79/0/0.67,
dsn=4.3.0, status=deferred (unknown mail transport error)
==================================================================
postconf -n:
==================================================================
alias_maps = hash:/etc/aliases
biff = no
canonical_maps = hash:/etc/postfix/canonical
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/lib/postfix
data_directory = /var/lib/postfix
debug_peer_level = 2
defer_transports =
disable_dns_lookups = no
disable_mime_output_conversion = no
header_checks = regexp:/etc/postfix/header_checks
home_mailbox = Maildir/
html_directory = /usr/share/doc/packages/postfix/html
inet_interfaces = all
inet_protocols = all
local_destination_concurrency_limit = 2
mail_owner = postfix
mail_spool_directory = /var/mail
mailbox_command =
mailbox_size_limit = 0
mailbox_transport =
mailq_path = /usr/bin/mailq
manpage_directory = /usr/share/man
masquerade_classes = envelope_sender, header_sender, header_recipient
masquerade_domains =
masquerade_exceptions = root
message_size_limit = 1024000000
mydestination = $myhostname localhost.$mydomain my-domain.com
myhostname = postfix.my-domain.com
mynetworks_style = subnet
newaliases_path = /usr/bin/newaliases
queue_directory = /var/spool/postfix
readme_directory = /usr/share/doc/packages/postfix/README_FILES
relayhost =
relocated_maps = hash:/etc/postfix/relocated
sample_directory = /usr/share/doc/packages/postfix/samples
sender_canonical_maps = hash:/etc/postfix/sender_canonical
sendmail_path = /usr/sbin/sendmail
setgid_group = maildrop
smtp_sasl_auth_enable = no
smtp_use_tls = yes
smtpd_client_connection_count_limit = 0
smtpd_client_restrictions =
smtpd_helo_required = yes
smtpd_helo_restrictions = permit_mynetworks, check_sender_access
hash:/etc/postfix/access, check_client_access
hash:/etc/postfix/access,reject_non_fqdn_hostname, permit
smtpd_recipient_restrictions =
permit_sasl_authenticated,check_sender_access
hash:/etc/postfix/access,check_client_access
hash:/etc/postfix/access,reject_unauth_destination,reject_non_fqdn_sender,reject_non_fqdn_recipient,
reject_unverified_recipient,reject_unauth_destination,permit
smtpd_sasl_auth_enable = no
smtpd_sender_restrictions = permit_mynetworks, check_sender_access
hash:/etc/postfix/access, permit
smtpd_use_tls = no
strict_8bitmime = no
strict_rfc821_envelopes = no
transport_maps = hash:/etc/postfix/transport
unknown_local_recipient_reject_code = 550
virtual_alias_domains = hash:/etc/postfix/virtual
virtual_alias_maps = hash:/etc/postfix/virtual
==================================================================
When Postfix does not receive or deliver mail, the first order of business is
to look for errors that prevent Postfix from working properly:
% egrep '(warning|error|fatal|panic):' /some/log/file | more
Note: the most important message is near the BEGINNING of the output. Error
messages that come later are less useful.
The nature of each problem is indicated as follows:
* "panic" indicates a problem in the software itself that only a programmer
can fix. Postfix cannot proceed until this is fixed.
* "fatal" is the result of missing files, incorrect permissions, incorrect
configuration file settings that you can fix. Postfix cannot proceed until
this is fixed.
* "error" reports an error condition. For safety reasons, a Postfix process
will terminate when more than 13 of these happen.
* "warning" indicates a non-fatal error. These are problems that you may not
be able to fix (such as a broken DNS server elsewhere on the network) but
may also indicate local configuration errors that could become a problem
later.
I'd speculate it's low open file limit in operating system. I had this
once when my 'everyone' alias exceeded several hundred users. See
ulimit -n
Increase it in your postfix startup script to, say, 100000, and
observe the difference.
Regards,
Andrzej
thanks for your suggestion. However, increasing ulimit -n did not help
in the first place .
Debugging info by adding -v in master.cf came up with lots of debugging
info yet not leading to success.
After attaching strace to the local daemon, I observed a kind of a loop
when it came to evaluate the .forward files in the users home
directories, several of them in the form /home/username/.forward =
"\username". These files are produced by a vacation plugin to our
squirrelmail web access to the mailboxes, when the users disable
forwarding and or vacation messages.
After sending mails to almost all of the recipients - many of them have
these .forward-files, evaluated without any problem - strace showed up
repeated accesses to the .forward file for the same user about 4300 (!)
times - probably until the above ulimit is touched and the process
segfaults.
As I didn't figure out by now what causes the loop, I'd appreciate any
other idea.
Here is a snippet of the loops output, lots of similar block before and
after this one: The only parameter that changes is the 4304 here, which
i believe to be the file handle. This number is incremented with each
loop iteration.
=======================================================
Aug 9 12:06:05 postfix logger:
lstat64("/home/wre/.forward",{st_mode=S_IFREG|0600, st_size=6, ...}) = 0
Aug 9 12:06:05 postfix logger: geteuid32()
= 1193
Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0
Aug 9 12:06:05 postfix logger: setresgid32(-1, 51, -1) = 0
Aug 9 12:06:05 postfix logger: setgroups32(1, [51]) = 0
Aug 9 12:06:05 postfix logger: setresuid32(-1, 51, -1) = 0
Aug 9 12:06:05 postfix logger: geteuid32() = 51
Aug 9 12:06:05 postfix logger: getegid32() = 51
Aug 9 12:06:05 postfix logger: geteuid32() = 51
Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0
Aug 9 12:06:05 postfix logger: setresgid32(-1, 100, -1) = 0
Aug 9 12:06:05 postfix logger: setgroups32(1, [100]) = 0
Aug 9 12:06:05 postfix logger: setresuid32(-1, 1193, -1) = 0
Aug 9 12:06:05 postfix logger: open("/home/wre/.forward", O_RDONLY)
= 4304
Aug 9 12:06:05 postfix logger: geteuid32()
= 1193
Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0
Aug 9 12:06:05 postfix logger: setresgid32(-1, 51, -1) = 0
Aug 9 12:06:05 postfix logger: setgroups32(1, [51]) = 0
Aug 9 12:06:05 postfix logger: setresuid32(-1, 51, -1) = 0
Aug 9 12:06:05 postfix logger: fcntl64(4304, F_GETFD) = 0
Aug 9 12:06:05 postfix logger: fcntl64(4304, F_SETFD, FD_CLOEXEC) = 0
Aug 9 12:06:05 postfix logger: read(4304, "\\wre\r\n", 4096) = 6
Aug 9 12:06:05 postfix logger: time(NULL)
= 1281348364
Aug 9 12:06:05 postfix logger: time(NULL)
= 1281348364
Aug 9 12:06:05 postfix logger: geteuid32() = 51
Aug 9 12:06:05 postfix logger: getegid32() = 51
Aug 9 12:06:05 postfix logger: geteuid32() = 51
Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0
Aug 9 12:06:05 postfix logger: setresgid32(-1, 100, -1) = 0
Aug 9 12:06:05 postfix logger: setgroups32(1, [100]) = 0
Aug 9 12:06:05 postfix logger: setresuid32(-1, 1193, -1) = 0
============================================================================
For now, after deleting all the "quasi empty" forward-files seems to
solve the problem, but I fear running into the same thing when the
number of recipients will increase.
Dominik
Perhaps you can follow instructions in
http://www.postfix.org/.DEBUG_README.html#logging
TURN OFF -v logging before you do this.
Wietse
I have been tracking down to "Automatically tracing a Postfix daemon
process" from the DEBUG_README. This is how I got the log snippet.
Can't reproduce the error now after deleting unnecessary .forward files.
Hope things will stay this way.
Thank you
Dominik
Please do not waste your time with verbose logging.
Please do not waste your time with debuggers,
LONG BEFORE the queue manager defers mail with "unknown transport
error", some other part of Postfix will log a WARNING:, FATAL: or
PANIC: record with the reason why the program is failing.
This is why I point you to DEBUG_README.html#logging and not
to the other DEBUG_README sections.
Wietse
I have been looking for these error messages over and over before I
started digging deeper. There are none of the error,fatal,warning or
panic messages before. The first error log entry to occur is the unknown
mail transport error, really.
Dominik
Looking at the original problem report:
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to command: /usr/bin/vacation -j wie)
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to maildir)
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to maildir)
Aug 2 14:52:08 postfix postfix/error[32270]: 720441D34008:
to=<a...@meinedomain.com>, relay=none, delay=79, delays=0.23/79/0/0.67,
dsn=4.3.0, status=deferred (unknown mail transport error)
When you have a many-recipient "all" alias, you need to set up an
"owner-all" alias in the alias database (with the right-hand side
being the adminstrator of the "all" list). Besides changing the
way errors are reported, this also triggers a different delivery
strategy where one slow recipient won't cause all deliveries to be
deferred.
Wietse
have not been aware of the difference. Added the owner-alias and will
keep an eye on the list.
Thanks a lot
Dominik
> When you have a many-recipient "all" alias, you need to set up an
> "owner-all" alias in the alias database (with the right-hand side
> being the adminstrator of the "all" list). Besides changing the
> way errors are reported, this also triggers a different delivery
> strategy where one slow recipient won't cause all deliveries to be
> deferred.
I was aware of the error reporting change that this triggers but not
of the different delivery strategy for slow recipients. Where can I
read about the delivery strategy behaviour? It didnt seem to be
mentioned in aliases(5) with the error reporting stuff.
--
Phill
----------------------------------------------
There's no such thing as a stupid question,
but they're the easiest to answer!
At the end of the game, the king and the pawn go back in the same box.
I souport publik edukashun
It is a workaround.
Ideally, Postfix would figure this out for itself; people should
not have to jump such hoops. That said, after 10+ years it ought
to be documented.
Wietse