Postdrop doesn't always stop when "postfix stop" is issued

Quanah Gibson-Mount

Aug 31, 2011, 7:36:22 PM
This is extremely difficult to reproduce, but it does happen occasionally
-- We will tell postfix to stop, and once that is complete, a "postdrop"
process will sometimes remain, and will run until it is manually killed.

Is this an expected behavior of postdrop -- That after the master postfix
is stopped, it is expected sometimes that it may continue running,
regardless?

This is on Postfix 2.6 through Postfix 2.8 series.

Thanks,
Quanah

--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration

Wietse Venema

Aug 31, 2011, 7:58:55 PM
Quanah Gibson-Mount:

> This is extremely difficult to reproduce, but it does happen occasionally
> -- We will tell postfix to stop, and once that is complete, a "postdrop"
> process will sometimes remain, and will run until it is manually killed.
>
> Is this an expected behavior of postdrop -- That after the master postfix
> is stopped, it is expected sometimes that it may continue running,
> regardless?

This is 100% intentional. The Postfix sendmail command MUST NOT
drop mail on the floor while the mail system is down.

For example there are programs that run at boot time that rely on
the availability of sendmail command-line submission, such as text
editors that want to send "how to recover your session" email.

Other daemons such as cron may be running while the Postfix daemons
are down for whatever reason. Their mail should not be lost, either.

Wietse

Quanah Gibson-Mount

Aug 31, 2011, 10:29:14 PM
--On Wednesday, August 31, 2011 7:58 PM -0400 Wietse Venema
<wie...@porcupine.org> wrote:

Hi Wietse,

Thanks, I think I understand what is happening. This is the Zimbra
Postfix, not the system one. We generally see this when upgrading Zimbra
to a newer version. I see that the service stop order has the mailbox
server (which receives email from postfix over LMTP) stop before postfix
does. My guess is that postfix is in the middle of trying to deliver an
email to it when this happens. I'll change the stop order so that postfix
is stopped well before the mailbox server, which should give postdrop time
to finish any deliveries it needs before the mailbox server is stopped.

--Quanah

Victor Duchovni

Sep 1, 2011, 2:03:43 PM
On Wed, Aug 31, 2011 at 07:58:55PM -0400, Wietse Venema wrote:

> > This is extremely difficult to reproduce, but it does happen occasionally
> > -- We will tell postfix to stop, and once that is complete, a "postdrop"
> > process will sometimes remain, and will run until it is manually killed.
> >
> > Is this an expected behavior of postdrop -- That after the master postfix
> > is stopped, it is expected sometimes that it may continue running,
> > regardless?
>
> This is 100% intentional. The Postfix sendmail command MUST NOT
> drop mail on the floor while the mail system is down.

Well, yes, postdrop(1) is expected to reliably enqueue mail, even when
the mail system is down. This said, it is not really expected to enter
an infinite loop!

On Wed, Aug 31, 2011 at 04:36:22PM -0700, Quanah Gibson-Mount wrote:

> This is extremely difficult to reproduce, but it does happen
> occasionally -- We will tell postfix to stop, and once that is
> complete, a "postdrop" process will sometimes remain, and will run
> until it is manually killed.
>
> Is this an expected behavior of postdrop -- That after the master
> postfix is stopped, it is expected sometimes that it may continue
> running, regardless?

Normally, postdrop(1) will enqueue the message and exit, whether the
mail system is up or not. The only plausible failure reason is inability
to access the "maildrop" directory, either because the setgid bit has
been cleared on the postdrop(1) binary, or because the directory has
been moved, deleted, modified to not allow group write access, ...

So the question is what is it that is causing postdrop to loop while
trying to create the queue file?

/*
 * Create a file with a temporary name that does not collide. The process
 * ID alone is not sufficiently unique: maildrops can be shared via the
 * network. Not that I recommend using a network-based queue, or having
 * multiple hosts write to the same queue, but we should try to avoid
 * losing mail if we can.
 *
 * If someone is racing against us, try to win.
 */
for (;;) {
    GETTIMEOFDAY(tp);
    vstring_sprintf(temp_path, "%s/%d.%d", queue_name,
                    (int) tp->tv_usec, pid);
    if ((fd = open(STR(temp_path), O_RDWR | O_CREAT | O_EXCL, mode)) >= 0)
        break;
    if (errno == EEXIST || errno == EISDIR)
        continue;
    msg_warn("%s: create file %s: %m", myname, STR(temp_path));
    sleep(10);
}

Are the "create file" warnings found in the system log?
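
To see the loop's behavior outside Postfix, here is a standalone sketch
of the same retry pattern using plain libc calls instead of the Postfix
vstring and msg_warn routines. The function name create_unique_file()
and its max_attempts and delay_sec parameters are assumptions added for
illustration; the Postfix loop above has no attempt limit and always
sleeps 10 seconds, which is why the process stays around until the
directory problem is fixed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical bounded variant of the retry loop. Creates a uniquely
 * named file in dir and returns its descriptor. A name collision is
 * retried at once with a fresh timestamp. Any other error is logged and
 * retried after delay_sec seconds, at most max_attempts times; after
 * that the function returns -1 with errno set to the last open() error.
 */
int create_unique_file(const char *dir, int max_attempts, unsigned delay_sec)
{
    char path[4096];
    struct timeval tv;
    int attempts = 0;
    int saved_errno;
    int fd;

    assert(dir != NULL);
    for (;;) {
        gettimeofday(&tv, NULL);
        snprintf(path, sizeof(path), "%s/%ld.%ld", dir,
                 (long) tv.tv_usec, (long) getpid());
        fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd >= 0)
            return fd;
        if (errno == EEXIST || errno == EISDIR)
            continue;               /* someone took the name: try again */
        saved_errno = errno;
        fprintf(stderr, "warning: create file %s: %s\n",
                path, strerror(saved_errno));
        if (++attempts >= max_attempts) {
            errno = saved_errno;    /* report the open() failure */
            return -1;
        }
        sleep(delay_sec);
    }
}
```

With a directory that does not exist, every pass fails with ENOENT and
prints a warning, so a call such as create_unique_file("maildrop", 3, 10)
gives up after three warnings, where the unbounded original would keep
warning every 10 seconds.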

--
Viktor.

Quanah Gibson-Mount

Sep 1, 2011, 2:26:48 PM
--On Thursday, September 01, 2011 2:03 PM -0400 Victor Duchovni
<Victor....@morganstanley.com> wrote:

> So the question is what is it that is causing postdrop to loop while
> trying to create the queue file?
>
> /*
>  * Create a file with a temporary name that does not collide. The process
>  * ID alone is not sufficiently unique: maildrops can be shared via the
>  * network. Not that I recommend using a network-based queue, or having
>  * multiple hosts write to the same queue, but we should try to avoid
>  * losing mail if we can.
>  *
>  * If someone is racing against us, try to win.
>  */
> for (;;) {
>     GETTIMEOFDAY(tp);
>     vstring_sprintf(temp_path, "%s/%d.%d", queue_name,
>                     (int) tp->tv_usec, pid);
>     if ((fd = open(STR(temp_path), O_RDWR | O_CREAT | O_EXCL, mode)) >= 0)
>         break;
>     if (errno == EEXIST || errno == EISDIR)
>         continue;
>     msg_warn("%s: create file %s: %m", myname, STR(temp_path));
>     sleep(10);
> }
>
> Are the "create file" warnings found in the system log?

Yes:

Mar 22 19:24:52 domain postfix/postdrop[3624]: warning: mail_queue_enter: create file maildrop/976917.3624: No such file or directory

for example.

However, what is odd about this is we have postfix explicitly use a queue
directory that is always present (/opt/zimbra/data/postfix/spool/), so it
shouldn't be encountering any errors creating a file. :/

I was also wrong about the shutdown order -- We shut down postfix first,
and then the other services.

Wietse Venema

Sep 1, 2011, 2:30:20 PM
Victor Duchovni:

> On Wed, Aug 31, 2011 at 07:58:55PM -0400, Wietse Venema wrote:
>
> > > This is extremely difficult to reproduce, but it does happen occasionally
> > > -- We will tell postfix to stop, and once that is complete, a "postdrop"
> > > process will sometimes remain, and will run until it is manually killed.
> > >
> > > Is this an expected behavior of postdrop -- That after the master postfix
> > > is stopped, it is expected sometimes that it may continue running,
> > > regardless?
> >
> > This is 100% intentional. The Postfix sendmail command MUST NOT
> > drop mail on the floor while the mail system is down.
>
> Well, yes, postdrop(1) is expected to reliably enqueue mail, even when
> the mail system is down. This said, it is not really expected to enter
> an infinite loop!

Well, yes, one is not supposed to remove the submission directory and
ignore postdrop error messages.

If people use Postfix, then at least they have a chance to re-create
the missing directory or permissions, and avoid losing mail.

Wietse

Victor Duchovni

Sep 1, 2011, 2:38:00 PM
On Thu, Sep 01, 2011 at 11:26:48AM -0700, Quanah Gibson-Mount wrote:

> > msg_warn("%s: create file %s: %m", myname, STR(temp_path));
> >
> > Are the "create file" warnings found in the system log?
>
> Yes:
>
> Mar 22 19:24:52 domain postfix/postdrop[3624]: warning:
> mail_queue_enter: create file maildrop/976917.3624: No such file or
> directory
>
> for example.

So, most likely the "maildrop" directory is no longer present, or the
queue directory itself has been moved, unmounted, ... The postdrop(1)
process performs a chdir(2) to the queue_directory, so if that is
replaced, it won't find a maildrop sub-directory...
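
A small experiment shows the effect. This is a sketch under stated
assumptions, not Postfix code: the function name
create_after_queue_replaced() and the scratch path under /tmp are made
up, and "replaced" is modeled here as remove-and-recreate at the same
path. The process changes into the scratch directory, the directory is
then replaced, and a relative create under "maildrop" is attempted.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demonstration. Returns 0 if the relative create still
 * works after the directory was replaced, the errno from open() if it
 * fails, or -1 if the scratch setup itself could not be completed.
 */
int create_after_queue_replaced(void)
{
    char base[] = "/tmp/queue-demo-XXXXXX";     /* stand-in queue directory */
    char sub[sizeof(base) + sizeof("/maildrop")];
    int home_fd, fd, result = -1;

    home_fd = open(".", O_RDONLY);              /* remember where we started */
    if (home_fd < 0)
        return -1;
    if (mkdtemp(base) == NULL) {
        close(home_fd);
        return -1;
    }
    snprintf(sub, sizeof(sub), "%s/maildrop", base);

    if (mkdir(sub, 0700) == 0 && chdir(base) == 0
        /* Replace the directory behind the process's back. */
        && rmdir(sub) == 0 && rmdir(base) == 0
        && mkdir(base, 0700) == 0 && mkdir(sub, 0700) == 0) {
        /* The relative path resolves against the old, removed directory. */
        fd = open("maildrop/probe", O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) {
            result = 0;
            close(fd);
            unlink("maildrop/probe");
        } else {
            result = errno;
        }
    }

    /* Go back to the starting directory and remove the scratch tree. */
    if (fchdir(home_fd) != 0)
        result = -1;
    close(home_fd);
    rmdir(sub);
    rmdir(base);
    return result;
}
```

On Linux the relative open() fails with ENOENT even though a directory
with the same absolute path exists again, which matches the "No such
file or directory" warning quoted above. A plain rename of the directory
behaves differently, since the working directory follows the renamed
directory.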

> However, what is odd about this is we have postfix explicitly use a
> queue directory that is always present
> (/opt/zimbra/data/postfix/spool/), so it shouldn't be encountering
> any errors creating a file. :/

This claim looks implausible, or main.cf was briefly modified to cause
postdrop(1) to use the wrong directory, ...

Make sure you are checking the correct instance (generally the default
one with sendmail/postdrop).

--
Viktor.
