
Bug#793116: systemd-journald exiting with SIGUSR1


Daniel Povey

Oct 15, 2016, 9:00:02 PM


We are also affected by this bug, and it happens quite frequently.
This is with version 215-17+deb8u5.

root@a12:~# systemctl status systemd-journald
* systemd-journald.service - Journal Service
   Loaded: loaded (/lib/systemd/system/systemd-journald.service; static)
   Active: failed (Result: start-limit) since Sat 2016-10-15 12:01:57 EDT; 8h ago
     Docs: man:systemd-journald.service(8)
           man:journald.conf(5)
  Process: 51561 ExecStart=/lib/systemd/systemd-journald (code=killed, signal=USR1)
 Main PID: 51561 (code=killed, signal=USR1)

Daniel Povey

Oct 15, 2016, 10:00:03 PM

Another observation about this bug, which might be helpful.

If the signal is sent to systemd-journald via
/bin/systemctl kill --kill-who=main --signal=SIGUSR1 systemd-journald.service
then messages like the following show up in the kernel log, as shown by `dmesg -T`:

[Sat Oct 15 21:02:35 2016] systemd-journald[26517]: Received request to flush runtime journal from PID 1

but they don't show up in the output of `journalctl -r`.
In /etc/systemd/journald.conf, it says:

#MaxLevelStore=debug
#MaxLevelSyslog=debug

so I would have thought the same messages would go to both places.
I don't know if I'm misunderstanding something here.

Dan





On Sat, Oct 15, 2016 at 9:07 PM, Daniel Povey <dpo...@gmail.com> wrote:
BTW, I have attached the output of `systemd-analyze dump` as dump.txt.
It would be great if the Debian maintainers could help us look into this.
Lennart has a policy that his team will only look into bug reports against the latest two versions of systemd, and obviously we are well behind that.

Dan

Daniel Povey

Oct 21, 2016, 2:50:02 PM

To follow up: I believe I have found the cause of this bug, and I have a fix that I am testing.

In the output of `systemctl status systemd-journald` it says:
   Active: failed (Result: start-limit)
This comes from the systemd mechanism that stops cycles in which a service keeps restarting. It is controlled by these settings (StartLimitIntervalSec= is spelled StartLimitInterval= in older releases):
StartLimitIntervalSec=   [defaults to DefaultStartLimitIntervalSec=10s]
StartLimitBurst=         [defaults to DefaultStartLimitBurst=5]

so if a service is started more than 5 times within 10 seconds, it is not started any more, to avoid wasting resources.
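To make the start-limit rule concrete, here is a simplified Python model of it. It is an illustrative sketch, not systemd's code: the class `StartLimit` and its method `try_start` are made up, and the model assumes a fixed window that opens at the first start and lasts `interval` seconds, using the default values quoted above (10 s, burst of 5).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLimit:
    """Toy model of a unit's start rate limit (hypothetical, not systemd code)."""

    interval: float = 10.0  # seconds; mirrors the 10 s default quoted above
    burst: int = 5          # mirrors the default burst of 5
    window_start: Optional[float] = None
    starts_in_window: int = 0

    def try_start(self, now: float) -> bool:
        """Return True if a start at time `now` is allowed, False if refused."""
        # Open a fresh window if there is none yet or the current one expired.
        if self.window_start is None or now > self.window_start + self.interval:
            self.window_start = now
            self.starts_in_window = 0
        if self.starts_in_window < self.burst:
            self.starts_in_window += 1
            return True
        # Refused: the situation that status reports as "Result: start-limit".
        return False


# A crash/restart loop with one start every 0.5 s: the sixth start is refused.
limit = StartLimit()
results = [limit.try_start(i * 0.5) for i in range(7)]
print(results)  # [True, True, True, True, True, False, False]

# Once the 10 s window has passed, starts are allowed again.
print(limit.try_start(11.0))  # True
```

In real systemd a refused start leaves the unit failed rather than retrying later, which is what the status output above shows; the model only captures the counting rule.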
What seems to happen is this. systemd-journal-flush.service sends SIGUSR1 to systemd-journald. If systemd-journald is not ready to handle the signal for some reason (e.g. it has just been started, or it is hung), the signal kills it, since the default action for SIGUSR1 is to terminate the process; this matches the `code=killed, signal=USR1` in the status output above. systemd-journald is then restarted, and systemd-journal-flush.service is immediately restarted as well. Again, systemd-journald has not yet had time to install its signal handler, and the cycle repeats 5 times until systemd-journald hits StartLimitBurst and dies.

I verified using `lastcomm` (we have process accounting enabled) that this did in fact happen.
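The core of this race can be reproduced outside systemd. The Python sketch below is a toy stand-in for journald, not journald itself; the `CHILD` script and the `send_usr1` helper are hypothetical, and it assumes Linux or another POSIX system. The child does some startup work before installing a SIGUSR1 handler, so a SIGUSR1 that arrives too early kills it (like the `code=killed, signal=USR1` status above), while one that arrives after the handler is installed is handled normally.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a daemon: it does some startup work *before*
# installing its SIGUSR1 handler, then waits for a "flush" request.
CHILD = r"""
import signal, sys, time
time.sleep(0.5)  # startup work; SIGUSR1 still has its default action here
flushed = []
signal.signal(signal.SIGUSR1, lambda signum, frame: flushed.append(signum))
print("ready", flush=True)  # handler is installed from this point on
deadline = time.monotonic() + 5
while not flushed and time.monotonic() < deadline:
    time.sleep(0.01)
sys.exit(0 if flushed else 1)
"""


def send_usr1(wait_for_ready: bool) -> int:
    """Start the child, send it SIGUSR1, and return its exit status.

    A negative value -N means the child was killed by signal N.
    """
    with subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    ) as child:
        if wait_for_ready:
            child.stdout.readline()  # block until the handler is installed
        child.send_signal(signal.SIGUSR1)
        child.wait(timeout=10)
    return child.returncode


if __name__ == "__main__":
    print(send_usr1(wait_for_ready=False))  # killed: equals -signal.SIGUSR1
    print(send_usr1(wait_for_ready=True))   # handled: 0
```

The only difference between the two runs is whether the signal arrives before or after the handler exists, which is the timing problem described above.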

I am trying to fix this by introducing a 10-second restart delay into the systemd-journal-flush service. See the config file below; the only new setting compared with the defaults in /lib/systemd/system/systemd-journal-flush.service is RestartSec=10.

Dan



======
cat /etc/systemd/system/systemd-journal-flush.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Trigger Flushing of Journal to Persistent Storage
Documentation=man:systemd-journald.service(8) man:journald.conf(5)
DefaultDependencies=no
Requires=systemd-journald.service
After=systemd-journald.service local-fs.target remote-fs.target
Before=systemd-user-sessions.service
# Don't restart this service too often, or we can get into a cycle where,
# if this service runs just after systemd-journald has started,
# systemd-journald fails to handle the signal, causing it to be restarted
# and this service to be immediately restarted also, until we hit
# the limit of StartLimitBurst=5 restarts and systemd-journald
# dies.
RestartSec=10

[Service]
ExecStart=/bin/systemctl kill --kill-who=main --signal=SIGUSR1 systemd-journald.service
Type=oneshot

Daniel Povey

Oct 28, 2016, 1:00:03 PM

To follow up: this did not fix the problem.  