Postfix fails after upgrade from bullseye to bookworm

Charles Curley

unread,

Jan 13, 2023, 11:40:05 AM1/13/23

to

I upgraded an i386 machine from bullseye to bookworm. Postfix now
refuses to run.

root@white:/var/spool# systemctl start postfix@-.service
Job for postfix@-.service failed because a timeout was exceeded.
See "systemctl status postfix@-.service" and "journalctl -xeu postfix@-.service" for details.
root@white:/var/spool# systemctl status postfix@-.service
× postfix@-.service - Postfix Mail Transport Agent (instance -)
Loaded: loaded (/lib/systemd/system/postfix@.service; enabled-runtime; preset: enabled)
Active: failed (Result: timeout) since Fri 2023-01-13 07:55:51 MST; 17s ago
Docs: man:postfix(1)
Process: 5170 ExecStartPre=/usr/lib/postfix/configure-instance.sh - (code=killed, signal=TERM)
CPU: 1min 29.456s

Jan 13 07:54:21 white systemd[1]: Starting Postfix Mail Transport Agent (instance -)...
Jan 13 07:55:51 white systemd[1]: postfix@-.service: start-pre operation timed out. Terminating.
Jan 13 07:55:51 white systemd[1]: postfix@-.service: Control process exited, code=killed, status=15/TERM
Jan 13 07:55:51 white systemd[1]: postfix@-.service: Failed with result 'timeout'.
Jan 13 07:55:51 white systemd[1]: Failed to start Postfix Mail Transport Agent (instance -).
Jan 13 07:55:51 white systemd[1]: postfix@-.service: Consumed 1min 29.456s CPU time.
root@white:/var/spool# postconf -n
alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
append_dot_mydomain = no
biff = no
compatibility_level = 3.6
inet_interfaces = all
inet_protocols = ipv4
mailbox_size_limit = 0
mydestination = white.localdomain, white.localdomain, white, localhost.localdomain, localhost
myhostname = white.localdomain
mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128
myorigin = /etc/mailname
readme_directory = no
recipient_delimiter = +
relayhost = smtp.localdomain
smtp_tls_CApath = /etc/ssl/certs
smtp_tls_security_level = may
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated defer_unauth_destination
smtpd_tls_cert_file = /etc/ssl/certs/ssl-cert-snakeoil.pem
smtpd_tls_key_file = /etc/ssl/private/ssl-cert-snakeoil.key
smtpd_tls_security_level = may
root@white:/var/spool#

--
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/

Dan Ritter

unread,

Jan 13, 2023, 12:20:05 PM1/13/23

to

Charles Curley wrote:
> I upgraded an i386 machine from bullseye to bookworm. Postfix now
> refuses to run.

Run the postfix executable by hand as root, and look for error messages
and log entries in /var/log/mail.log among other locations.

-dsr-

Charles Curley

unread,

Jan 13, 2023, 2:00:05 PM1/13/23

to

On Fri, 13 Jan 2023 12:00:31 -0500
Dan Ritter <d...@randomstring.org> wrote:

> Run the postfix executable by hand as root, and look for error
> messages and log entries in /var/log/mail.log among other locations.

Well, that was interesting. Thanks.

In mail.log I found the following from an earlier run:

2023-01-13T06:00:11.188254-07:00 white postfix/postfix-script[3068]: warning: /var/spool/postfix/etc/localtime and /etc/localtime differ
2023-01-13T06:00:11.303294-07:00 white postfix/postfix-script[3073]: warning: /var/spool/postfix/lib/i386-linux-gnu/libnss_systemd.so.2 and /lib/i386-linux-gnu/libnss_systemd.so.2 differ
2023-01-13T06:00:11.479941-07:00 white postfix/postfix-script[3080]: warning: /var/spool/postfix/lib/i386-linux-gnu/libgcc_s.so.1 and /lib/i386-linux-gnu/libgcc_s.so.1 differ

(I've pasted these in unwrapped, but they may get mangled in transit or
by your mail reader.)

The message isn't quite correct: the three files under /var/spool/…
were missing. I copied the originals into place.

From a run just now:

root@white:~# postfix start
postfix: Postfix is using backwards-compatible default settings
postfix: See http://www.postfix.org/COMPATIBILITY_README.html for details
postfix: To disable backwards compatibility use "postconf compatibility_level=3.6" and "postfix reload"
postfix/postfix-script: starting the Postfix mail system
root@white:~#

and ps indicates a slew of postfix jobs running, and a whole bunch of
test emails were delivered. That suggests there's something wrong with
the way systemd is starting postfix. I will look into that later today.

Charles Curley

unread,

Jan 20, 2023, 11:40:06 AM1/20/23

to

On Fri, 13 Jan 2023 11:52:29 -0700
Charles Curley <charle...@charlescurley.com> wrote:

> That suggests there's something wrong with
> the way systemd is starting postfix. I will look into that later
> today.

Not quite "later today", but:

A bit of thinking about it, and I realized that the computer in
question is an ancient 686 with very limited RAM, physical and swap. So
I experimented with watching the startup using htop. That got me
thinking that maybe the start timeout was too short.

I edited postfix@-.service, like so:

systemctl edit postfix@-.service

and added a line to the end of the [service] stanza:

TimeoutSec=360

So:

root@white:/etc# diff systemd/system/postfix@-.service /lib/systemd/system/postfix@.service
17d16
< TimeoutSec=360
20c19
< WantedBy=multi-user.target
\ No newline at end of file
---
> WantedBy=multi-user.target
root@white:/etc#

That seems to have fixed the problem. We'll see, said the zen master.

Sven Joachim

unread,

Jan 20, 2023, 1:20:05 PM1/20/23

to

On 2023-01-20 09:34 -0700, Charles Curley wrote:

> On Fri, 13 Jan 2023 11:52:29 -0700
> Charles Curley <charle...@charlescurley.com> wrote:
>
>> That suggests there's something wrong with
>> the way systemd is starting postfix. I will look into that later
>> today.
>
> Not quite "later today", but:
>
> A bit of thinking about it, and I realized that the computer in
> question is an ancient 686 with very limited RAM, physical and swap. So
> I experimented with watching the startup using htop. That got me
> thinking that maybe the start timeout was too short.
>
> I edited postfix@-.service, like so:
>
> systemctl edit postfix@-.service
>
> and added a line to the end of the [service] stanza:
>
> TimeoutSec=360

It seems that postfix's startup time has greatly regressed, on my laptop
there are very long delays both at boot:

,----
| $ systemd-analyze blame | head -n1
| 33.340s postfix@-.service
`----

as well when restarting postfix:

,----
| $ time sudo systemctl restart postfix.service
| sudo systemctl restart postfix.service 0,06s user 0,03s system 0% cpu 38,611 total
`----

Clearly something fishy is going on here.

Cheers,
Sven

Greg Wooledge

unread,

Jan 20, 2023, 1:50:05 PM1/20/23

to

On Fri, Jan 20, 2023 at 07:17:37PM +0100, Sven Joachim wrote:
> It seems that postfix's startup time has greatly regressed, on my laptop
> there are very long delays both at boot:
>
> ,----
> | $ systemd-analyze blame | head -n1
> | 33.340s postfix@-.service
> `----

A delay that's a multiple of 30 seconds is very often a DNS lookup
failure. I would imagine your postfix configuration is trying to
perform a DNS lookup on some hostname or other, and that this is
happening before you're actually "online", for whatever definition of
"online" is relevant here.

That's a total guess, though. Find your logfiles and read them to see
what's actually going on.

Charles Curley

unread,

Jan 20, 2023, 2:00:06 PM1/20/23

to

On Fri, 20 Jan 2023 19:17:37 +0100
Sven Joachim <sven...@gmx.de> wrote:

> Clearly something fishy is going on here.

I concur. What I saw with htop was a slew of calls to SSL. Here's
a sample of what it was doing. It is a processor hog.

root@white:~# ps aux | grep -i openssl
root 4586 5.8 0.9 8256 2064 pts/3 S+ 11:48 0:00 grep --colour=auto -i openssl
root 4587 150 2.1 8888 4720 ? R 11:48 0:00 /usr/bin/openssl x509 -subject_hash_old -fingerprint -noout -in QuoVadis_Root_CA_2.pem
root@white:~#

I have no idea what that's about. Maybe someone with SSL experience can
chime in here?

My previous timeout vale was 180 seconds, which wasn't enough. So my
ancient anemic box needed between 180 and 360 seconds to start postfix.
(But see below.)

To repeat your commands:

root@white:~# systemd-analyze blame | grep postfix
4min 24.559s postfix@-.service
35ms postfix.service
root@white:~# time systemctl restart postfix.service

real 4m10.833s
user 0m0.081s
sys 0m0.062s
root@white:~#

Meanwhile simply running "postfix start" took much less time.

Sven Joachim

unread,

Jan 20, 2023, 2:30:06 PM1/20/23

to

On 2023-01-20 13:39 -0500, Greg Wooledge wrote:

> On Fri, Jan 20, 2023 at 07:17:37PM +0100, Sven Joachim wrote:
>> It seems that postfix's startup time has greatly regressed, on my laptop
>> there are very long delays both at boot:
>>
>> ,----
>> | $ systemd-analyze blame | head -n1
>> | 33.340s postfix@-.service
>> `----
>
> A delay that's a multiple of 30 seconds is very often a DNS lookup
> failure. I would imagine your postfix configuration is trying to
> perform a DNS lookup on some hostname or other, and that this is
> happening before you're actually "online", for whatever definition of
> "online" is relevant here.

That should be NetworkManager-wait-online.service. In the logs I see
that systemd starts the postfix service directly after reaching
network-online.target, as it is supposed to do. The mystery is why it
takes another 30+ seconds before any messages from postfix itself appear
in the logs.

> That's a total guess, though. Find your logfiles and read them to see
> what's actually going on.

Here is what I see in the journal when I restart postfix.service:

,----
| Jan 20 20:16:06 myhost postfix/postfix-script[1470]: stopping the Postfix mail system
| Jan 20 20:16:06 myhost postfix/master[1307]: terminating on signal 15
| Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Deactivated successfully.
| Jan 20 20:16:06 myhost systemd[1]: Stopped Postfix Mail Transport Agent (instance -).
| Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Consumed 36.372s CPU time.
| Jan 20 20:16:06 myhost systemd[1]: Starting Postfix Mail Transport Agent (instance -)...
| Jan 20 20:16:41 myhost postfix[1800]: Postfix is using backwards-compatible default settings
| Jan 20 20:16:41 myhost postfix[1800]: See http://www.postfix.org/COMPATIBILITY_README.html for details
| Jan 20 20:16:41 myhost postfix[1800]: To disable backwards compatibility use "postconf compatibility_level=3.6" and "pos\
| tfix reload"
| Jan 20 20:16:42 myhost postfix/postfix-script[2026]: starting the Postfix mail system
| Jan 20 20:16:42 myhost postfix/master[2028]: daemon started -- version 3.7.3, configuration /etc/postfix
| Jan 20 20:16:42 myhost systemd[1]: Started Postfix Mail Transport Agent (instance -).
| Jan 20 20:16:42 myhost systemd[1]: Starting Postfix Mail Transport Agent...
| Jan 20 20:16:42 myhost systemd[1]: Finished Postfix Mail Transport Agent.
`----

I have left nothing out, so WTH is postfix waiting for in these 35
seconds?

Cheers,
Sven

Greg Wooledge

unread,

Jan 20, 2023, 2:30:06 PM1/20/23

to

On Fri, Jan 20, 2023 at 11:55:35AM -0700, Charles Curley wrote:
> My previous timeout vale was 180 seconds, which wasn't enough. So my
> ancient anemic box needed between 180 and 360 seconds to start postfix.
> (But see below.)

More multiples of 30 seconds. I'm still thinking "DNS issue".

Sven Joachim

unread,

Jan 20, 2023, 2:50:05 PM1/20/23

to

On 2023-01-20 11:55 -0700, Charles Curley wrote:

> On Fri, 20 Jan 2023 19:17:37 +0100
> Sven Joachim <sven...@gmx.de> wrote:
>
>> Clearly something fishy is going on here.
>
> I concur. What I saw with htop was a slew of calls to SSL. Here's
> a sample of what it was doing. It is a processor hog.
>
> root@white:~# ps aux | grep -i openssl
> root 4586 5.8 0.9 8256 2064 pts/3 S+ 11:48 0:00 grep --colour=auto -i openssl
> root 4587 150 2.1 8888 4720 ? R 11:48 0:00 /usr/bin/openssl x509
> -subject_hash_old -fingerprint -noout -in QuoVadis_Root_CA_2.pem

Indeed I see many calls to openssl in top, apparently they are children
of a single c_rehash process. CPU load is low here, though (2-3 %).

> I have no idea what that's about. Maybe someone with SSL experience can
> chime in here?

My hunch is that postfix recomputes all the hashes in
/var/spool/postfix/etc/ssl/certs, rather than copying the files from the
host system into the chroot which would be a lot faster.

It is probably time for me to revisit my postfix configuration.

Cheers,
Sven

Charles Curley

unread,

Jan 20, 2023, 2:50:05 PM1/20/23

to

On Fri, 20 Jan 2023 14:28:22 -0500
Greg Wooledge <gr...@wooledge.org> wrote:

> More multiples of 30 seconds. I'm still thinking "DNS issue".

In this case, laziness. The default timeout is 60 seconds. I added 30
to that. Then doubled it. Etc. That doesn't mean you are wrong. I'd
like to know what that ssl command is up to.

The problem is not for lack of network. I executed all the commands I
showed in my recent emails via ssh over Ethernet, and that machine has
full access to the Internet.

Jeffrey Walton

unread,

Jan 20, 2023, 3:10:06 PM1/20/23

to

You should provide the output of journalctl. That usually has the
details that are useful in diagnosing a problem.

> Does anybody read signatures any more?
>
> https://charlescurley.com
> https://charlescurley.com/blog/

I try to skip them. There's no point in wasting time on them.

Jeff

Christoph Brinkhaus

unread,

Jan 20, 2023, 3:20:05 PM1/20/23

to

Am Fri, Jan 20, 2023 at 08:28:10PM +0100 schrieb Sven Joachim:
> On 2023-01-20 13:39 -0500, Greg Wooledge wrote:
>
> > On Fri, Jan 20, 2023 at 07:17:37PM +0100, Sven Joachim wrote:

Hello Community,

I have observed something comparable, but with fetchmail for email
and unbound as a dns resover. Both are started by systemd at almost
the same time. But it takes some time for unbound to fetch keys and so
on. Before unbound is really ready fetchmail has started its first
poll, but without success because the mail servers ip could not be
resolved.

I am only a hobbyist. But may be something similar is happening in the
discussed issue. May be it is worth to have a look how the dns lookup
is set up and how it is starting.

Please just ignore be if this proposal makes no sense.
At work I am no admin :-).

> I have left nothing out, so WTH is postfix waiting for in these 35
> seconds?
>
> Cheers,
> Sven

BTW: I have resolved my issue by starting fetchmail via systemd by a
wrapper script which probes the dns resolution of the mail server
before starting fetchmail.

Kind regards,
Christoph
--
Ist die Katze gesund
schmeckt sie dem Hund.

Sven Joachim

unread,

Jan 20, 2023, 3:20:05 PM1/20/23

to

For those who want to dig deeper, /usr/lib/postfix/configure-instance.sh
is the (Debian-specific) script which sets up the chroot. Surely it
should not recompute all the hashes every time postfix is restarted, but
apparently it does so at least on Charles' and my system.

Cheers,
Sven

Sven Joachim

unread,

Jan 20, 2023, 3:40:06 PM1/20/23

to

Wow. Looking into the BTS, I found bug #895089[1], changed "c_rehash"
to "openssl rehash" in /usr/lib/postfix/configure-instance.sh as
recommended there, and now "systemctl restart postfix.service" completes
in two seconds!

Will follow up on the bug ASAP, if nobody beats me to it.

Cheers,
Sven

1. https://bugs.debian.org/895089

Charles Curley

unread,

Jan 20, 2023, 5:30:06 PM1/20/23

to

On Fri, 20 Jan 2023 21:30:22 +0100
Sven Joachim <sven...@gmx.de> wrote:

> Wow. Looking into the BTS, I found bug #895089[1], changed "c_rehash"
> to "openssl rehash" in /usr/lib/postfix/configure-instance.sh as
> recommended there, and now "systemctl restart postfix.service"
> completes in two seconds!
>
> Will follow up on the bug ASAP, if nobody beats me to it.

Fix and everything. Nice! Good work, thank you.

I'll apply the fix later and see what it does on my sclerotic anemic
686 box.

Max Nikulin

unread,

Jan 20, 2023, 9:20:05 PM1/20/23

to

On 21/01/2023 01:55, Charles Curley wrote:
>
> root@white:~# ps aux | grep -i openssl
> root 4586 5.8 0.9 8256 2064 pts/3 S+ 11:48 0:00 grep --colour=auto -i openssl
> root 4587 150 2.1 8888 4720 ? R 11:48 0:00 /usr/bin/openssl x509 -subject_hash_old -fingerprint -noout -in QuoVadis_Root_CA_2.pem

...> I have no idea what that's about. Maybe someone with SSL experience can
> chime in here?

Delays related to cryptography may be caused by lack of entropy. You may
try to check

cat /proc/sys/kernel/random/entropy_avail

and

ls -l /proc/<PID>/fs

when such delay happens. Some network activity may increase entropy
enough to unblock reading random bytes. If my guess is correct then the
issue may be alleviated by installing the haveged package. VMs are
affected more than hosts.

Charles Curley

unread,

Jan 20, 2023, 10:00:06 PM1/20/23

to

On Fri, 20 Jan 2023 21:30:22 +0100
Sven Joachim <sven...@gmx.de> wrote:

> Wow. Looking into the BTS, I found bug #895089[1], changed "c_rehash"
> to "openssl rehash" in /usr/lib/postfix/configure-instance.sh as
> recommended there, and now "systemctl restart postfix.service"
> completes in two seconds!

A similar major improvement for me.

root@white:~# time systemctl restart postfix.service

real 0m18.389s
user 0m0.081s
sys 0m0.059s
root@white:~#

vs.

root@white:~# time systemctl restart postfix.service

real 4m10.833s
user 0m0.081s
sys 0m0.062s
root@white:~#

Thank you, Sven, for pursuing this.

The patch has been committed, and should be in the pipeline.