Bug#1001320: needrestart misdetects socket activated ssh and restarts service instead of socket

Marc Haber

Dec 8, 2021, 7:40:03 AM

Package: openssh-server
Version: 1:8.7p1-2
Severity: minor

Hi,

I am running a number of test systems with ssh as a socket-activated
service. Sometimes, after an update, I find myself without ssh access to
those systems (connection refused). After a console login and systemctl
restart ssh.socket, things are fine again.

I THINK this might be connected to needrestart. Today, a libc6 update
marked the running ssh daemon (that I was using for the update) as using
obsolete libraries, which resulted in the following console output:

Restarting services...
systemctl restart console-log.service cron.service exim4.service haveged.service ippl.service ntp.service rsyslog.service serial...@ttyS0.service ssh.service systemd-journald.service systemd-networkd.service systemd-resolved.service systemd-udevd.service
Job for ssh.service failed because the control process exited with error code.
See "systemctl status ssh.service" and "journalctl -xeu ssh.service" for details.
Service restarts being deferred:
/etc/needrestart/restart.d/dbus.service
systemctl restart ge...@tty1.service
systemctl restart systemd-logind.service
systemctl restart us...@1001.service

and the following log entries:
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping LSB: Puts a logfile pager on virtual consoles...
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping Regular background program processing daemon...
Dec 8 12:58:26 emptybookworm82 systemd[1]: cron.service: Deactivated successfully.
Dec 8 12:58:26 emptybookworm82 cron[429258]: (CRON) INFO (pidfile fd = 3)
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopped Regular background program processing daemon.
Dec 8 12:58:26 emptybookworm82 systemd[1]: cron.service: Consumed 15min 4.856s CPU time.
Dec 8 12:58:26 emptybookworm82 systemd[1]: Started Regular background program processing daemon.
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping LSB: exim Mail Transport Agent...
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping Entropy Daemon based on the HAVEGE algorithm...
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping LSB: IP protocols logger...
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping Network Time Service...
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping System Logging Service...
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping Serial Getty on ttyS0...
Dec 8 12:58:26 emptybookworm82 systemd[1]: serial...@ttyS0.service: Deactivated successfully.
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopped Serial Getty on ttyS0.
Dec 8 12:58:26 emptybookworm82 systemd[1]: Started Serial Getty on ttyS0.
Dec 8 12:58:26 emptybookworm82 systemd[1]: ssh.socket: Deactivated successfully.
Dec 8 12:58:26 emptybookworm82 systemd[1]: Closed OpenBSD Secure Shell server socket.
Dec 8 12:58:26 emptybookworm82 systemd[1]: ssh.socket: Consumed 10.571s CPU time.
Dec 8 12:58:26 emptybookworm82 systemd[1]: Starting OpenBSD Secure Shell server...
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping Flush Journal to Persistent Storage...
Dec 8 12:58:26 emptybookworm82 systemd[1]: systemd-networkd-wait-online.service: Deactivated successfully.
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopped Wait for Network to be Configured.
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping Wait for Network to be Configured...
Dec 8 12:58:26 emptybookworm82 systemd[1]: Stopping Network Name Resolution...
Dec 8 12:58:26 emptybookworm82 systemd[1]: ssh.service: Main process exited, code=exited, status=255/EXCEPTION
Dec 8 12:58:26 emptybookworm82 systemd[1]: ssh.service: Failed with result 'exit-code'.
Dec 8 12:58:26 emptybookworm82 systemd[1]: Failed to start OpenBSD Secure Shell server.
Dec 8 12:58:26 emptybookworm82 ntpd[298]: ntpd exiting on signal 15 (Terminated)
Dec 8 12:58:26 emptybookworm82 ntpd[298]: 2a01:4f8:140:246a::2 local addr 2a01:4f8:140:246a::52:100 -> <null>
Dec 8 12:58:26 emptybookworm82 haveged[220]: haveged: Stopping due to signal 15
Dec 8 12:58:27 emptybookworm82 cron[429258]: (CRON) INFO (Skipping @reboot jobs -- not system startup)
Dec 8 12:58:27 emptybookworm82 systemd[1]: systemd-journal-flush.service: Deactivated successfully.
Dec 8 12:58:27 emptybookworm82 systemd[1]: Stopped Flush Journal to Persistent Storage.
Dec 8 12:58:27 emptybookworm82 exim4[429259]: exim4_listener.

To me, this looks like needrestart misdetects the sshd process as having
been started by an ssh.service instead of an ssh@.service, and that
stopping ssh.service stops ssh.socket for some reason (systemd
dependency?). Afterwards, ssh.service is restarted (which fails because
the port is still busy), and ssh.socket stays off, resulting in an
unreachable host.

Can you as the ssh maintainer give some insight into whether this is an ssh,
a needrestart or a systemd issue? It definitely is annoying.

Greetings
Marc

Timo Weingärtner

Dec 8, 2021, 10:20:04 AM

Hello Marc Haber,

08.12.21 13:31 Marc Haber:
> I am running a number of test systems with ssh as a socket-activated
> service. Sometimes, after an update, I find myself without ssh access to
> those systems (connection refused). After a console login and systemctl
> restart ssh.socket, things are fine again.
>
> I THINK this might be connected to needrestart. Today, a libc6 update
> marked the running ssh daemon (that I was using for the update) as using
> obsolete libraries, which resulted in the following console output:

To me it looks like a problem in needrestart. The (forked off) sshd process
handling your client connection belongs to cgroup session-NN.scope, no matter
if it was started by systemd socket activation or regular sshd.
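
For reference, the cgroup a process belongs to can be read from
/proc/<pid>/cgroup. The following is only a minimal sketch, assuming a
systemd system; cgroup_unit is a helper name made up for this example and
is not part of needrestart or systemd:

```shell
# Print the last cgroup path component (the systemd unit or scope name)
# for each line of /proc/<pid>/cgroup-style input read from stdin.
# Hypothetical helper for illustration only.
cgroup_unit() {
    while IFS= read -r line; do
        # Strip the leading "hierarchy-ID:controller-list:" fields.
        path=${line#*:*:}
        # Keep only what follows the last "/".
        printf '%s\n' "${path##*/}"
    done
}

# Usage (not run here): show the unit or scope of the current shell.
#   cgroup_unit < /proc/self/cgroup
```

For a per-client sshd process this would be expected to print a
session-NN.scope name rather than a service name, which is the situation
described above.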

needrestart (invoked with "-vlp" here) detects a process with outdated libs:

[main] #2111961 uses deleted /lib/x86_64-linux-gnu/libnss_files-2.32.so
[main] #2111961 is a child of #2111904

Then it figures out the binary and the cgroup:

[main] #2111961 exe => /usr/sbin/sshd
[main] trying systemctl status

cgroup detection didn't work, so:

[main] #2111961 running /etc/needrestart/hook.d/10-dpkg
[main] #2111961 package: openssh-server
[main] #2111961 running /etc/needrestart/hook.d/20-rpm
[main] #2111961 running /etc/needrestart/hook.d/90-none

/etc/needrestart/hook.d/10-dpkg also finds /etc/init.d/ssh and we end up with:

Services:
[…]
- spamassassin.service
- ssh
- systemd-journald.service
[…]

Note the missing ".service". Then you have it invoke "systemctl restart
ssh.service" and voilà.

A workaround might be masking ssh.service.
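
A rough sketch of that workaround, assuming the service should be stopped
before it is masked and that ssh.socket is the unit meant to keep
listening. ssh_socket_only and DRY_RUN are names made up for this example;
only the systemctl subcommands themselves are real:

```shell
# Switch sshd to socket activation and mask ssh.service so that it can no
# longer be started or restarted by accident.
# Set DRY_RUN=1 to print the commands instead of running them.
ssh_socket_only() {
    # Expands to "echo" when DRY_RUN is set and non-empty, else to nothing.
    run=${DRY_RUN:+echo}
    $run systemctl stop ssh.service &&
    $run systemctl mask ssh.service &&
    $run systemctl enable --now ssh.socket
}

# Usage (not run here):
#   DRY_RUN=1 ssh_socket_only   # only print what would be done
#   ssh_socket_only             # apply, as root
```

"systemctl unmask ssh.service" reverts the masking if the regular service
is wanted again.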


Regards
Timo

Marc Haber

Dec 19, 2021, 10:20:04 AM

Hi Timo,

On Wed, Dec 08, 2021 at 04:01:30PM +0100, Timo Weingärtner wrote:
> 08.12.21 13:31 Marc Haber:
> > I am running a number of test systems with ssh as a socket-activated
> > service. Sometimes, after an update, I find myself without ssh access to
> > those systems (connection refused). After a console login and systemctl
> > restart ssh.socket, things are fine again.
> >
> > I THINK this might be connected to needrestart. Today, a libc6 update
> > marked the running ssh daemon (that I was using for the update) as using
> > obsolete libraries, which resulted in the following console output:
>
> To me it looks like a problem in needrestart. The (forked off) sshd process
> handling your client connection belongs to cgroup session-NN.scope, no matter
> if it was started by systemd socket activation or regular sshd.

I concur with your analysis. So we need a bug report against needrestart
with the title "misdetects ssh as started from ssh.service if it's
actually ssh.socket or ssh@.service"?

> A workaround might be masking ssh.service.

That seems to do it for me; this hasn't happened on my test systems
since I masked ssh.service. I do consider this a valid workaround (but
not a solution) for the time being.

ssh maintainer, I think this warrants at least some documentation, for
example in /usr/share/doc/openssh-server/README.Debian.gz, as the way
documented there just suggests disabling ssh.service and not masking it.

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Email address in header
Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421

Timo Weingärtner

Dec 20, 2021, 8:40:04 AM

Hello Marc Haber,

19.12.21 16:15 Marc Haber:
> On Wed, Dec 08, 2021 at 04:01:30PM +0100, Timo Weingärtner wrote:
> > 08.12.21 13:31 Marc Haber:
> > > I am running a number of test systems with ssh as a socket-activated
> > > service. Sometimes, after an update, I find myself without ssh access to
> > > those systems (connection refused). After a console login and systemctl
> > > restart ssh.socket, things are fine again.
> > >
> > > I THINK this might be connected to needrestart. Today, a libc6 update
> > > marked the running ssh daemon (that I was using for the update) as using
> > > obsolete libraries, which resulted in the following console output:
> >
> > To me it looks like a problem in needrestart. The (forked off) sshd process
> > handling your client connection belongs to cgroup session-NN.scope, no matter
> > if it was started by systemd socket activation or regular sshd.
>
> I concur with your analysis. So we need a bug report against needrestart
> with the title "misdetects ssh as started from ssh.service if it's
> actually ssh.socket or ssh@.service"?

ssh.socket doesn't contain processes.
ssh@<connected_socket>.service would AFAIR be detected if libpam-systemd is
not installed or if the connection is not yet complete. At least I remember
(some years back) needrestart showing me ssh@<connected_socket>.service ticked
by default, sawing off the branch I was sitting on when I blindly nodded it
through.

We should be more specific here: it's about the per-client process which
should not get restarted by default.
Even when ssh.service is running it misdetects per-client processes, but in
that case it is usually quite harmless.

> > A workaround might be masking ssh.service.
>
> That seems to do it for me; this hasn't happened on my test systems
> since I masked ssh.service. I do consider this a valid workaround (but
> not a solution) for the time being.
>
> ssh maintainer, I think this warrants at least some documentation, for
> example in /usr/share/doc/openssh-server/README.Debian.gz, as the way
> documented there just suggests disabling ssh.service and not masking it.

Masking ssh.service also helps with people (possibly even including yourself)
doing "systemctl restart ssh" after editing sshd_config.


Regards
Timo

Marc Haber

Dec 24, 2021, 6:30:05 AM

So we agree here that it's mainly a documentation issue for ssh, so that
it should be recommended to actually mask ssh.service if socket
activation is used, right?

Timo Weingärtner

Dec 24, 2021, 8:30:03 AM

Hello Marc Haber,

24.12.21 12:22 Marc Haber:
> So we agree here that it's mainly a documentation issue for ssh, so that
> it should be recommended to actually mask ssh.service if socket
> activation is used, right?

For the bug on openssh: yes.

Documentation could look like:
If you decide to use socket activation consider masking ssh.service to avoid
accidentally doing the wrong thing with "service ssh restart" or equivalent.

For needrestart we have the problem that heuristics designed for
non-systemd systems are also being used on systemd systems.


Regards
Timo

Marc Haber

Dec 24, 2021, 4:40:04 PM

On Fri, Dec 24, 2021 at 02:21:16PM +0100, Timo Weingärtner wrote:
> For needrestart we have the problem that heuristics designed for
> non-systemd systems are also being used on systemd systems.

I have filed Bug#1002591 for needrestart.

Colin Watson

Dec 24, 2021, 6:10:04 PM

On Fri, Dec 24, 2021 at 02:21:16PM +0100, Timo Weingärtner wrote:
> 24.12.21 12:22 Marc Haber:
> > So we agree here that it's mainly a documentation issue for ssh, so that
> > it should be recommended to actually mask ssh.service if socket
> > activation is used, right?
>
> For the bug on openssh: yes.
>
> Documentation could look like:
> If you decide to use socket activation consider masking ssh.service to avoid
> accidentally doing the wrong thing with "service ssh restart" or equivalent.

How does this patch look?

diff --git a/debian/README.Debian b/debian/README.Debian
index dbe6c2958..0851e38e3 100644
--- a/debian/README.Debian
+++ b/debian/README.Debian
@@ -193,9 +193,12 @@ you can run:

To make this permanent:

- systemctl disable ssh.service
+ systemctl mask ssh.service
systemctl enable ssh.socket

+("systemctl disable ssh.service" would also work, but masking avoids
+accidentally starting the service manually.)
+
This may be appropriate in environments where minimal footprint is critical
(e.g. cloud guests). Be aware that this bypasses MaxStartups, and systemd's
MaxConnections cannot quite replace this as it cannot distinguish between

--
Colin Watson (he/him) [cjwa...@debian.org]

Marc Haber

Dec 25, 2021, 2:30:04 AM

On Fri, Dec 24, 2021 at 11:04:20PM +0000, Colin Watson wrote:
> diff --git a/debian/README.Debian b/debian/README.Debian
> index dbe6c2958..0851e38e3 100644
> --- a/debian/README.Debian
> +++ b/debian/README.Debian
> @@ -193,9 +193,12 @@ you can run:
>
> To make this permanent:
>
> - systemctl disable ssh.service
> + systemctl mask ssh.service
> systemctl enable ssh.socket

I think the service needs to be stopped (and disabled?) before masking
it.

I would also mention that there might be cases where logins are no longer
possible after library updates combined with needrestart. This is a serious
situation.

Colin Watson

Dec 28, 2021, 6:00:06 PM

On Sat, Dec 25, 2021 at 08:18:11AM +0100, Marc Haber wrote:
> On Fri, Dec 24, 2021 at 11:04:20PM +0000, Colin Watson wrote:
> > diff --git a/debian/README.Debian b/debian/README.Debian
> > index dbe6c2958..0851e38e3 100644
> > --- a/debian/README.Debian
> > +++ b/debian/README.Debian
> > @@ -193,9 +193,12 @@ you can run:
> >
> > To make this permanent:
> >
> > - systemctl disable ssh.service
> > + systemctl mask ssh.service
> > systemctl enable ssh.socket
>
> I think the service needs to be stopped (and disabled?) before masking
> it.

Stopping is already in the previous paragraph just outside the context
of the patch I posted, but I've adjusted the linking sentence to make it
clearer that you need to do both.

> I would also mention that there might be cases where logins are no longer
> possible after library updates combined with needrestart. This is a serious
> situation.

Thanks, I've tweaked the parenthesis a bit more with that in mind.

Marc Haber

Dec 29, 2021, 1:50:04 AM

Hi Colin,

thank you for addressing this. I appreciate that.
Technically, with socket activation, there is no "running sshd" while
nobody is logged in, but I can still log in. The problem is that when this
issue appears, I can no longer log in because nothing, not even systemd,
is listening on port 22.
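
One way to check for this condition from the console is to look at the
listening TCP sockets. A minimal sketch, assuming iproute2's ss is
installed; port_listening is a hypothetical helper written for this
example, and it only parses ss output fed to it on stdin:

```shell
# Succeed if any line of `ss -Htln` output read from stdin shows a TCP
# listener on the given port (default: 22). The fourth column of that
# output is "Local Address:Port".
port_listening() {
    port=${1:-22}
    while read -r _state _recvq _sendq local _rest; do
        # Compare whatever follows the last ":" with the wanted port.
        [ "${local##*:}" = "$port" ] && return 0
    done
    return 1
}

# Usage (not run here): reopen the socket if nothing listens on port 22.
#   ss -Htln | port_listening 22 || systemctl restart ssh.socket
```

With socket activation, the listener found this way belongs to systemd
itself (via ssh.socket) rather than to an sshd process.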

Colin Watson

Dec 29, 2021, 7:10:03 AM

On Wed, Dec 29, 2021 at 07:45:11AM +0100, Marc Haber wrote:
> On Tue, Dec 28, 2021 at 10:47:54PM +0000, Colin Watson wrote:
> > On Sat, Dec 25, 2021 at 08:18:11AM +0100, Marc Haber wrote:
> > > I would also mention that there might be cases where logins are no longer
> > > possible after library updates combined with needrestart. This is a serious
> > > situation.
> >
> > Thanks, I've tweaked the parenthesis a bit more with that in mind.
>
> Technically, with socket activation, there is no "running sshd" while
> nobody is logged in, but I can still log in. The problem is that when this
> issue appears, I can no longer log in because nothing, not even systemd,
> is listening on port 22.

Fair enough - I've adjusted the documentation a bit further to make that
clear.