mailing lists and "unknown mail transport error"

Dominik Storck

unread,

Aug 4, 2010, 4:39:31 AM8/4/10

to

Hello,

I have a problem here with mailing lists.

I've configured these as entries in the /etc/aliases like
group-x: :include:/etc/group-x.list

and corresponding files /etc/group-x.list with one line per (local)
recipient

This has been working perfectly for years. Now the number of recipients
for some of
these lists have increased to more than 200.

When a mail is sent to these recipients mail delivery starts as expected
but stops
short before end of list. The exact count changes, probably due to different
state of of concurrent mail queue entries.

The error message is an "unknown mail transport error", the mail stays
in the queue and
delivery starts over again from the beginnig until I remove the mail
from the queue.

I believe there is some limit to 200 recipients, queue entries or whatever.

Playing with parameters like
default_destination_concurrency_limit = 300
default_destination_recipient_limit = 300
smtp_recipient_limit = 500
did not produce any visible effect.

Any ideas would be appreciated.

Thanks
Dominik

here is an extract from the log
==================================================================
Aug 2 14:50:49 postfix postfix/pickup[27603]: 720441D34008: uid=0
from=<root>
Aug 2 14:50:49 postfix postfix/cleanup[28372]: 720441D34008:
message-id=<2010080212504...@postfix.meinedomain.com>
Aug 2 14:50:49 postfix postfix/qmgr[904]: 720441D34008:
from=<ro...@postfix.meinedomain.com>, size=577, nrcpt=1 (queue active)

Aug 2 14:50:49 postfix postfix/local[29097]: 720441D34008:
to=<a...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=0.34, delays=0.23/0/0/0.11, dsn=2.0.0, status=sent
(delivered to maildir)

...
[211 lines removed, differing only in recipient address and increasing
delay]
...

Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to command: /usr/bin/vacation -j wie)
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to maildir)
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to maildir)

Aug 2 14:52:08 postfix postfix/error[32270]: 720441D34008:
to=<a...@meinedomain.com>, relay=none, delay=79, delays=0.23/79/0/0.67,
dsn=4.3.0, status=deferred (unknown mail transport error)

==================================================================

postconf -n:
==================================================================
alias_maps = hash:/etc/aliases
biff = no
canonical_maps = hash:/etc/postfix/canonical
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/lib/postfix
data_directory = /var/lib/postfix
debug_peer_level = 2
defer_transports =
disable_dns_lookups = no
disable_mime_output_conversion = no
header_checks = regexp:/etc/postfix/header_checks
home_mailbox = Maildir/
html_directory = /usr/share/doc/packages/postfix/html
inet_interfaces = all
inet_protocols = all
local_destination_concurrency_limit = 2
mail_owner = postfix
mail_spool_directory = /var/mail
mailbox_command =
mailbox_size_limit = 0
mailbox_transport =
mailq_path = /usr/bin/mailq
manpage_directory = /usr/share/man
masquerade_classes = envelope_sender, header_sender, header_recipient
masquerade_domains =
masquerade_exceptions = root
message_size_limit = 1024000000
mydestination = $myhostname localhost.$mydomain my-domain.com
myhostname = postfix.my-domain.com
mynetworks_style = subnet
newaliases_path = /usr/bin/newaliases
queue_directory = /var/spool/postfix
readme_directory = /usr/share/doc/packages/postfix/README_FILES
relayhost =
relocated_maps = hash:/etc/postfix/relocated
sample_directory = /usr/share/doc/packages/postfix/samples
sender_canonical_maps = hash:/etc/postfix/sender_canonical
sendmail_path = /usr/sbin/sendmail
setgid_group = maildrop
smtp_sasl_auth_enable = no
smtp_use_tls = yes
smtpd_client_connection_count_limit = 0
smtpd_client_restrictions =
smtpd_helo_required = yes
smtpd_helo_restrictions = permit_mynetworks, check_sender_access
hash:/etc/postfix/access, check_client_access
hash:/etc/postfix/access,reject_non_fqdn_hostname, permit
smtpd_recipient_restrictions =
permit_sasl_authenticated,check_sender_access
hash:/etc/postfix/access,check_client_access
hash:/etc/postfix/access,reject_unauth_destination,reject_non_fqdn_sender,reject_non_fqdn_recipient,
reject_unverified_recipient,reject_unauth_destination,permit
smtpd_sasl_auth_enable = no
smtpd_sender_restrictions = permit_mynetworks, check_sender_access
hash:/etc/postfix/access, permit
smtpd_use_tls = no
strict_8bitmime = no
strict_rfc821_envelopes = no
transport_maps = hash:/etc/postfix/transport
unknown_local_recipient_reject_code = 550
virtual_alias_domains = hash:/etc/postfix/virtual
virtual_alias_maps = hash:/etc/postfix/virtual
==================================================================

Wietse Venema

unread,

Aug 4, 2010, 12:02:00 PM8/4/10

to

See http://www.postfix.org/DEBUG_README.html#logging

When Postfix does not receive or deliver mail, the first order of business is
to look for errors that prevent Postfix from working properly:

% egrep '(warning|error|fatal|panic):' /some/log/file | more

Note: the most important message is near the BEGINNING of the output. Error
messages that come later are less useful.

The nature of each problem is indicated as follows:

* "panic" indicates a problem in the software itself that only a programmer
can fix. Postfix cannot proceed until this is fixed.

* "fatal" is the result of missing files, incorrect permissions, incorrect
configuration file settings that you can fix. Postfix cannot proceed until
this is fixed.

* "error" reports an error condition. For safety reasons, a Postfix process
will terminate when more than 13 of these happen.

* "warning" indicates a non-fatal error. These are problems that you may not
be able to fix (such as a broken DNS server elsewhere on the network) but
may also indicate local configuration errors that could become a problem
later.

Andrzej Kukuła

unread,

Aug 5, 2010, 6:58:59 AM8/5/10

to

On Wed, Aug 4, 2010 at 10:39, Dominik Storck <dom...@storck.net> wrote:
>
> This has been working perfectly for years. Now the number of recipients
> for some of
> these lists have increased to more than 200.
>
> When a mail is sent to these recipients mail delivery starts as expected
> but stops
> short before end of list. The exact count changes, probably due to different
> state of of concurrent mail queue entries.
>
> The error message is an "unknown mail transport error", the mail stays
> in the queue and
> delivery starts over again from the beginnig until I remove the mail
> from the queue.
>
> I believe there is some limit to 200 recipients, queue entries or whatever.

I'd speculate it's low open file limit in operating system. I had this
once when my 'everyone' alias exceeded several hundred users. See
ulimit -n
Increase it in your postfix startup script to, say, 100000, and
observe the difference.

Regards,
Andrzej

Dominik Storck

unread,

Aug 9, 2010, 2:25:25 PM8/9/10

to

Hello Andrzej,

thanks for your suggestion. However, increasing ulimit -n did not help
in the first place .

Debugging info by adding -v in master.cf came up with lots of debugging
info yet not leading to success.

After attaching strace to the local daemon, I observed a kind of a loop
when it came to evaluate the .forward files in the users home
directories, several of them in the form /home/username/.forward =
"\username". These files are produced by a vacation plugin to our
squirrelmail web access to the mailboxes, when the users disable
forwarding and or vacation messages.

After sending mails to almost all of the recipients - many of them have
these .forward-files, evaluated without any problem - strace showed up
repeated accesses to the .forward file for the same user about 4300 (!)
times - probably until the above ulimit is touched and the process
segfaults.

As I didn't figure out by now what causes the loop, I'd appreciate any
other idea.

Here is a snippet of the loops output, lots of similar block before and
after this one: The only parameter that changes is the 4304 here, which
i believe to be the file handle. This number is incremented with each
loop iteration.

=======================================================
Aug 9 12:06:05 postfix logger:
lstat64("/home/wre/.forward",{st_mode=S_IFREG|0600, st_size=6, ...}) = 0
Aug 9 12:06:05 postfix logger: geteuid32()
= 1193
Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0
Aug 9 12:06:05 postfix logger: setresgid32(-1, 51, -1) = 0
Aug 9 12:06:05 postfix logger: setgroups32(1, [51]) = 0
Aug 9 12:06:05 postfix logger: setresuid32(-1, 51, -1) = 0
Aug 9 12:06:05 postfix logger: geteuid32() = 51
Aug 9 12:06:05 postfix logger: getegid32() = 51
Aug 9 12:06:05 postfix logger: geteuid32() = 51
Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0
Aug 9 12:06:05 postfix logger: setresgid32(-1, 100, -1) = 0
Aug 9 12:06:05 postfix logger: setgroups32(1, [100]) = 0
Aug 9 12:06:05 postfix logger: setresuid32(-1, 1193, -1) = 0
Aug 9 12:06:05 postfix logger: open("/home/wre/.forward", O_RDONLY)
= 4304
Aug 9 12:06:05 postfix logger: geteuid32()
= 1193
Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0
Aug 9 12:06:05 postfix logger: setresgid32(-1, 51, -1) = 0
Aug 9 12:06:05 postfix logger: setgroups32(1, [51]) = 0
Aug 9 12:06:05 postfix logger: setresuid32(-1, 51, -1) = 0
Aug 9 12:06:05 postfix logger: fcntl64(4304, F_GETFD) = 0
Aug 9 12:06:05 postfix logger: fcntl64(4304, F_SETFD, FD_CLOEXEC) = 0
Aug 9 12:06:05 postfix logger: read(4304, "\\wre\r\n", 4096) = 6
Aug 9 12:06:05 postfix logger: time(NULL)
= 1281348364
Aug 9 12:06:05 postfix logger: time(NULL)
= 1281348364
Aug 9 12:06:05 postfix logger: geteuid32() = 51
Aug 9 12:06:05 postfix logger: getegid32() = 51
Aug 9 12:06:05 postfix logger: geteuid32() = 51
Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0
Aug 9 12:06:05 postfix logger: setresgid32(-1, 100, -1) = 0
Aug 9 12:06:05 postfix logger: setgroups32(1, [100]) = 0
Aug 9 12:06:05 postfix logger: setresuid32(-1, 1193, -1) = 0
============================================================================

For now, after deleting all the "quasi empty" forward-files seems to
solve the problem, but I fear running into the same thing when the
number of recipients will increase.

Dominik

Wietse Venema

unread,

Aug 9, 2010, 4:17:49 PM8/9/10

to

In case you wonder, I wrote Postfix.

Perhaps you can follow instructions in

http://www.postfix.org/.DEBUG_README.html#logging

TURN OFF -v logging before you do this.

Wietse

Dominik Storck

unread,

Aug 9, 2010, 5:52:18 PM8/9/10

to

Hello Wietse,

I have been tracking down to "Automatically tracing a Postfix daemon
process" from the DEBUG_README. This is how I got the log snippet.

Can't reproduce the error now after deleting unnecessary .forward files.

Hope things will stay this way.

Thank you

Dominik

Wietse Venema

unread,

Aug 9, 2010, 6:07:44 PM8/9/10

to

Dominik Storck:

> Hello Wietse,
>
> I have been tracking down to "Automatically tracing a Postfix daemon
> process" from the DEBUG_README. This is how I got the log snippet.
>
> Can't reproduce the error now after deleting unnecessary .forward files.

Please do not waste your time with verbose logging.

Please do not waste your time with debuggers,

LONG BEFORE the queue manager defers mail with "unknown transport
error", some other part of Postfix will log a WARNING:, FATAL: or
PANIC: record with the reason why the program is failing.

This is why I point you to DEBUG_README.html#logging and not
to the other DEBUG_README sections.

Wietse

Dominik Storck

unread,

Aug 10, 2010, 1:33:35 AM8/10/10

to

Hello Wietse,

I have been looking for these error messages over and over before I
started digging deeper. There are none of the error,fatal,warning or
panic messages before. The first error log entry to occur is the unknown
mail transport error, really.

Dominik

Wietse Venema

unread,

Aug 10, 2010, 7:35:24 AM8/10/10

to

Dominik Storck:
> Hello Wietse,
>

> I have been looking for these error messages over and over before I
> started digging deeper. There are none of the error,fatal,warning or
> panic messages before. The first error log entry to occur is the unknown
> mail transport error, really.

Looking at the original problem report:

Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to command: /usr/bin/vacation -j wie)
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to maildir)
Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008:
to=<w...@postfix.meinedomain.com>, orig_to=<a...@meinedomain.com>,
relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent
(delivered to maildir)

Aug 2 14:52:08 postfix postfix/error[32270]: 720441D34008:
to=<a...@meinedomain.com>, relay=none, delay=79, delays=0.23/79/0/0.67,
dsn=4.3.0, status=deferred (unknown mail transport error)

When you have a many-recipient "all" alias, you need to set up an
"owner-all" alias in the alias database (with the right-hand side
being the adminstrator of the "all" list). Besides changing the
way errors are reported, this also triggers a different delivery
strategy where one slow recipient won't cause all deliveries to be
deferred.

Wietse

Dominik Storck

unread,

Aug 10, 2010, 8:52:44 AM8/10/10

to

Hello Wietse,

have not been aware of the difference. Added the owner-alias and will
keep an eye on the list.

Thanks a lot

Dominik

Phill Macey

unread,

Aug 10, 2010, 9:31:07 AM8/10/10

to

On 10 August 2010 21:35, Wietse Venema <wie...@porcupine.org> wrote:

> When you have a many-recipient "all" alias, you need to set up an
> "owner-all" alias in the alias database (with the right-hand side
> being the adminstrator of the "all" list). Besides changing the
> way errors are reported, this also triggers a different delivery
> strategy where one slow recipient won't cause all deliveries to be
> deferred.

I was aware of the error reporting change that this triggers but not
of the different delivery strategy for slow recipients. Where can I
read about the delivery strategy behaviour? It didnt seem to be
mentioned in aliases(5) with the error reporting stuff.

--
Phill

----------------------------------------------
There's no such thing as a stupid question,
but they're the easiest to answer!

At the end of the game, the king and the pawn go back in the same box.

I souport publik edukashun

Wietse Venema

unread,

Aug 10, 2010, 12:08:57 PM8/10/10

to

Phill Macey:

> On 10 August 2010 21:35, Wietse Venema <wie...@porcupine.org> wrote:
>
> > When you have a many-recipient "all" alias, you need to set up an
> > "owner-all" alias in the alias database (with the right-hand side
> > being the adminstrator of the "all" list). Besides changing the
> > way errors are reported, this also triggers a different delivery
> > strategy where one slow recipient won't cause all deliveries to be
> > deferred.
>
> I was aware of the error reporting change that this triggers but not
> of the different delivery strategy for slow recipients. Where can I
> read about the delivery strategy behaviour? It didnt seem to be
> mentioned in aliases(5) with the error reporting stuff.

It is a workaround.

Ideally, Postfix would figure this out for itself; people should
not have to jump such hoops. That said, after 10+ years it ought
to be documented.

Wietse