service with Restart=always not always restarting

Gerard Meijer

Dec 10, 2014, 8:40:44 AM

to coreo...@googlegroups.com

I've been testing a lot with some test services. This morning I had a service that seemed to work fine: when I stop the Docker container, systemd restarts it beautifully, etc. But now it has entered the failed state and systemd does not try to restart it. My question is why.

I have this in my unit file:

Restart=always

RestartSec=10

Journalctl for the unit says:

Dec 10 11:01:58 xxx systemd[1]: doa3_test.service: main process exited, code=exited, status=143/n/a

Dec 10 11:01:58 xxx systemd[1]: Unit doa3_test.service entered failed state.

Dec 10 11:02:08 xxx systemd[1]: doa3_test.service holdoff time over, scheduling restart.

Dec 10 11:02:08 xxx systemd[1]: doa3_test.service failed to schedule restart job: Transaction is destructive.

Dec 10 11:02:08 xxx systemd[1]: Unit doa3_test.service entered failed state.

So the service exits (no idea why, but that's beside the point for now), systemd tries to restart it, reports that the transaction is destructive, and leaves the service in the failed state.

Systemctl:

● doa3_test.service - Test

Loaded: loaded (/run/fleet/units/doa3_test.service; linked-runtime)

Active: failed (Result: resources) since Wed 2014-12-10 11:02:08 UTC; 1h 38min ago

Process: 18838 ExecStop=/usr/bin/docker stop test (code=exited, status=0/SUCCESS)

Main PID: 914 (code=exited, status=143)

Dec 10 11:01:58 xxx systemd[1]: Unit doa3_test.service entered failed state.

Dec 10 11:02:08 xxx systemd[1]: doa3_test.service holdoff time over, scheduling restart.

Dec 10 11:02:08 xxx systemd[1]: doa3_test.service failed to schedule restart job: Transaction is destructive.

Dec 10 11:02:08 xxx systemd[1]: Unit doa3_test.service entered failed state.

Anybody any idea?

Gerard Meijer

Dec 10, 2014, 8:43:04 AM

to coreo...@googlegroups.com

And to add:

sudo systemctl start doa3_test

starts up the service fine again, without errors.

Gerard Meijer

Dec 11, 2014, 10:39:03 AM

to coreo...@googlegroups.com

I found out a little more. I keep having this problem, and today it happened again. I checked the fleet log and found the following at the exact time the main unit gets stopped and fails to restart:

Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: ERROR engine.go:135: Unable to determine current lessee: timeout reached

Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: WARN job.go:253: No Unit found in Registry for Job(doa3_test_restart_watcher.service)

Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: ERROR job.go:95: Failed to parse Unit from etcd: unable to parse Unit in Registry at key

Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO client.go:278: Failed getting response from https://[etcd-server]/:

Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO manager.go:89: Triggered systemd unit doa3_test_restart_watcher.service stop: job=5

Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO manager.go:231: Removing systemd unit doa3_test_restart_watcher.service

Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO manager.go:142: Instructing systemd to reload units

Dec 11 14:02:17 doa3wrkprd001 fleetd[605]: INFO reconcile.go:274: AgentReconciler completed task: type=UnloadUnit job=doa3_test_res

Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO manager.go:218: Writing systemd unit doa3_test_restart_watcher.service (996b)

Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO manager.go:142: Instructing systemd to reload units

Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO reconcile.go:274: AgentReconciler completed task: type=LoadUnit job=doa3_test_resta

Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO manager.go:78: Triggered systemd unit doa3_test_restart_watcher.service start: job=

Dec 11 14:02:20 doa3wrkprd001 fleetd[605]: INFO reconcile.go:274: AgentReconciler completed task: type=StartUnit job=doa3_test_rest

The doa3_test_restart_watcher is a sidekick service to doa3_test. doa3_test has a line "Wants=doa3_test_restart_watcher.service" and doa3_test_restart_watcher has a line "BindsTo=doa3.test.service".
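For readers following along, the relationship described above corresponds roughly to the unit fragments below. This is only a sketch assembled from what is quoted in this thread: Wants=, BindsTo=, Restart=, RestartSec=, Description= and ExecStop= appear in the posts or in the systemctl output, while the ExecStart= lines were never shown and are left out, so the fragments are not complete units. The BindsTo= target is written as doa3_test.service on the assumption that "doa3.test.service" in the post is a typo.

```ini
# doa3_test.service (fragment; ExecStart= is not shown in the thread)
[Unit]
Description=Test
# Weak dependency on the sidekick unit
Wants=doa3_test_restart_watcher.service

[Service]
ExecStop=/usr/bin/docker stop test
Restart=always
RestartSec=10

# doa3_test_restart_watcher.service (fragment; [Service] section not shown in the thread)
[Unit]
# Quoted in the post as "doa3.test.service"; doa3_test.service is assumed here
BindsTo=doa3_test.service
```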

So it seems that fleet itself is the reason systemd is unable to restart the job. This specific unit stops the doa3_test service when it stops itself. systemd then tries to start doa3_test again, but it cannot fulfill the "Wants=doa3_test_restart_watcher.service" line, because that unit has been removed (according to the fleet logs).

So now the question is: what exactly do the lines from 14:02:17 mean, and how can we make sure they don't happen?

Ross Kukulinski

Dec 11, 2014, 2:55:23 PM

to coreo...@googlegroups.com

Hi Gerard,

I can't help with the fleetd warnings/errors, but I was wondering why you are using 'Wants=doa3_test_restart_watcher.service' instead of 'Requires=doa3_test_restart_watcher.service'?

With Requires=, all listed required dependencies are started when this unit is activated. Wants= is the weaker variant: it also pulls in the listed units, but this unit is still started if they fail to start.

See: http://www.freedesktop.org/software/systemd/man/systemd.unit.html#%5BUnit%5D%20Section%20Options
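As a rough sketch of the change being asked about, the [Unit] section of doa3_test.service might look like this with Requires= in place of Wants=. Only the unit names and the Description= come from this thread; whether the stronger dependency actually avoids the "Transaction is destructive" error has not been tested here.

```ini
# doa3_test.service, [Unit] section only (sketch, not a complete unit)
[Unit]
Description=Test
# Strong dependency: activating this unit also activates the watcher
Requires=doa3_test_restart_watcher.service
```

No After= line is added in this sketch, since the ordering between the two units is not described in the thread.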

Best regards,

Ross

_____

Ross Kukulinski

Yodlr Co-Founder / CEO

https://getyodlr.com
