service with Restart=always not always restarting


Gerard Meijer

Dec 10, 2014, 8:40:44 AM
to coreo...@googlegroups.com
I'm doing a lot of testing with some test services. This morning I had a service that seemed to work fine: when I stopped the Docker container, systemd restarted it beautifully. But now it has entered the failed state and systemd does not try to restart it. My question is why.

I have this in my unit file:

Restart=always
RestartSec=10

Journalctl for the unit says:

Dec 10 11:01:58 xxx systemd[1]: doa3_test.service: main process exited, code=exited, status=143/n/a
Dec 10 11:01:58 xxx systemd[1]: Unit doa3_test.service entered failed state.
Dec 10 11:02:08 xxx systemd[1]: doa3_test.service holdoff time over, scheduling restart.
Dec 10 11:02:08 xxx systemd[1]: doa3_test.service failed to schedule restart job: Transaction is destructive.
Dec 10 11:02:08 xxx systemd[1]: Unit doa3_test.service entered failed state.

So the service exits (no idea why, but that's beside the point for now), systemd tries to restart it, reports that the transaction is destructive, and leaves the service in the failed state.

Systemctl:

● doa3_test.service - Test
   Loaded: loaded (/run/fleet/units/doa3_test.service; linked-runtime)
   Active: failed (Result: resources) since Wed 2014-12-10 11:02:08 UTC; 1h 38min ago
  Process: 18838 ExecStop=/usr/bin/docker stop test (code=exited, status=0/SUCCESS)
 Main PID: 914 (code=exited, status=143)

Dec 10 11:01:58 xxx systemd[1]: Unit doa3_test.service entered failed state.
Dec 10 11:02:08 xxx systemd[1]: doa3_test.service holdoff time over, scheduling restart.
Dec 10 11:02:08 xxx systemd[1]: doa3_test.service failed to schedule restart job: Transaction is destructive.
Dec 10 11:02:08 xxx systemd[1]: Unit doa3_test.service entered failed state.

Does anybody have any idea?

Gerard Meijer

Dec 10, 2014, 8:43:04 AM
to coreo...@googlegroups.com
And to add:

sudo systemctl start doa3_test

starts up the service fine again, without errors.

Gerard Meijer

Dec 11, 2014, 10:39:03 AM
to coreo...@googlegroups.com
I found out a little more. I keep having this problem, and today it happened again. I checked the fleet log and found the following at the same moment the main unit was stopped and failed to restart:

Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: ERROR engine.go:135: Unable to determine current lessee: timeout reached
Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: WARN job.go:253: No Unit found in Registry for Job(doa3_test_restart_watcher.service)
Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: ERROR job.go:95: Failed to parse Unit from etcd: unable to parse Unit in Registry at key
Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO client.go:278: Failed getting response from https://[etcd-server]/:
Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO client.go:278: Failed getting response from https://[etcd-server]/:
Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO manager.go:89: Triggered systemd unit doa3_test_restart_watcher.service stop: job=5
Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO manager.go:231: Removing systemd unit doa3_test_restart_watcher.service
Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO manager.go:142: Instructing systemd to reload units
Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO reconcile.go:274: AgentReconciler completed task: type=UnloadUnit job=doa3_test_res
Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO manager.go:218: Writing systemd unit doa3_test_restart_watcher.service (996b)
Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO manager.go:142: Instructing systemd to reload units
Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO reconcile.go:274: AgentReconciler completed task: type=LoadUnit job=doa3_test_resta
Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO manager.go:78: Triggered systemd unit doa3_test_restart_watcher.service start: job=
Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO reconcile.go:274: AgentReconciler completed task: type=StartUnit job=doa3_test_rest

The doa3_test_restart_watcher is a sidekick service to doa3_test. doa3_test has a line "Wants=doa3_test_restart_watcher.service" and doa3_test_restart_watcher has a line "BindsTo=doa3.test.service".

So it seems that fleet itself is the reason systemd is unable to restart the job. When the watcher unit stops, it also stops the doa3_test service. systemd then tries to start doa3_test again, but cannot fulfill the "Wants=doa3_test_restart_watcher.service" line, since that unit was removed (according to the fleet logs).
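
For reference, the relationship between the two units might look roughly like the sketch below. Only the Description=, Restart=, RestartSec=, ExecStop=, Wants= and BindsTo= lines come from this thread. The ExecStart= lines are hypothetical placeholders, and the BindsTo= target is written as doa3_test.service on the assumption that "doa3.test.service" above is a typo.

```ini
# ---- doa3_test.service (main unit) ----
[Unit]
Description=Test
# Weak dependency: pulls in the watcher when this unit is started
Wants=doa3_test_restart_watcher.service

[Service]
# Hypothetical placeholder; the real ExecStart= is not shown in this thread
ExecStart=/usr/bin/docker run --name test example/image
ExecStop=/usr/bin/docker stop test
Restart=always
RestartSec=10

# ---- doa3_test_restart_watcher.service (sidekick unit) ----
[Unit]
# Assumed target name; the watcher is stopped whenever doa3_test.service stops
BindsTo=doa3_test.service

[Service]
# Hypothetical placeholder command
ExecStart=/bin/sh -c "exec sleep infinity"
```

These are two separate unit files shown in one block for brevity.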

So now the question is, what do the lines from 14:02:17 mean exactly and how can we make sure they don't happen?

Ross Kukulinski

Dec 11, 2014, 2:55:23 PM
to coreo...@googlegroups.com
Hi Gerard,

I can't help with the fleetd warnings/errors, but I was wondering why you are using 'Wants=doa3_test_restart_watcher.service' instead of 'Requires=doa3_test_restart_watcher.service'?

With Requires=, all listed required dependencies are started when this unit is activated. Wants= is the weaker form: it also pulls in the listed units, but a unit that fails to start or cannot be found does not affect this unit.
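
As a rough sketch of that suggestion, the [Unit] section of doa3_test.service would change as shown below. The Description= value is taken from the systemctl output earlier in the thread. Whether this change resolves the failed restart is not established here. Note that neither Wants= nor Requires= implies start ordering; that needs a separate After= or Before= line.

```ini
[Unit]
Description=Test
# Before: Wants=doa3_test_restart_watcher.service
# Requires= is the stronger form: the watcher is activated together
# with this unit, and problems with the watcher are no longer ignored.
Requires=doa3_test_restart_watcher.service
```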


Best regards,
Ross
_____

Ross Kukulinski
Yodlr Co-Founder / CEO