fleet with [service] Restart=always not restarting

Diego Medina

Sep 30, 2014, 10:56:50 AM

to coreo...@googlegroups.com

Hi,

I got my service (a simple Go REST service) running inside a container on CoreOS. I'd like the app/service to restart inside the container if there is some kind of program failure, so I use the Restart=always line in the unit file, but the service stays in failed status.
Some details:

$ fleetctl cat app@8080
[Unit]
Description=Go Application that talks to etcd
After=etcd.service
After=docker.service
Requires=app-discovery@%i.service

[Service]
TimeoutStartSec=0
KillMode=none
Restart=always
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill app%i
ExecStartPre=-/usr/bin/docker rm app%i
ExecStartPre=/usr/bin/docker pull fmpwizard/coreosdemo
ExecStart=/usr/bin/docker run --name app%i -p ${COREOS_PUBLIC_IPV4}:%i:8080 fmpwizard/coreosdemo
ExecStop=/usr/bin/docker stop app%i

[X-Fleet]
X-Conflicts=app@*.service
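For readers new to template units: systemd treats app@.service as a template, and in an instance such as app@8080 every %i in the unit is replaced with the instance name, so the container is named app8080 and host port 8080 on the public IP is mapped to the container's port 8080. The Go sketch below is a hypothetical, simplified model of that substitution (it handles only %i and %%, not systemd's other specifiers or its escaping rules), plus an approximation of how the X-Conflicts glob compares against unit names, assuming Go's path.Match semantics, which may differ from fleet's own matcher. expandInstance, instanceName and conflicts are illustrative names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandInstance is a simplified, hypothetical model of systemd specifier
// expansion: it replaces %i with the instance name and %% with a literal %.
// ${COREOS_PUBLIC_IPV4} is left untouched; systemd fills that in from
// EnvironmentFile= when it runs the command.
func expandInstance(line, instance string) string {
	var b strings.Builder
	for i := 0; i < len(line); i++ {
		if line[i] == '%' && i+1 < len(line) {
			switch line[i+1] {
			case 'i':
				b.WriteString(instance)
				i++
				continue
			case '%':
				b.WriteByte('%')
				i++
				continue
			}
		}
		b.WriteByte(line[i])
	}
	return b.String()
}

// instanceName turns a template file name such as "app@.service" into the
// unit name for one instance.
func instanceName(template, instance string) string {
	return strings.TrimSuffix(template, ".service") + instance + ".service"
}

// conflicts approximates the X-Conflicts glob check using path.Match
// (an assumption; fleet's real matcher may behave differently).
func conflicts(pattern, unit string) bool {
	ok, err := path.Match(pattern, unit)
	return err == nil && ok
}

func main() {
	run := "/usr/bin/docker run --name app%i -p ${COREOS_PUBLIC_IPV4}:%i:8080 fmpwizard/coreosdemo"
	// /usr/bin/docker run --name app8080 -p ${COREOS_PUBLIC_IPV4}:8080:8080 fmpwizard/coreosdemo
	fmt.Println(expandInstance(run, "8080"))
	// true: a second instance would conflict with the first on the same machine
	fmt.Println(conflicts("app@*.service", instanceName("app@.service", "9090")))
}
```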

==============================

Current failure status:

$ fleetctl status app@8080
● a...@8080.service - Go Application that talks to etcd
   Loaded: loaded (/run/fleet/units/a...@8080.service; linked-runtime)
   Active: failed (Result: exit-code) since Tue 2014-09-30 14:33:19 UTC; 14min ago
  Process: 7154 ExecStop=/usr/bin/docker stop app%i (code=exited, status=0/SUCCESS)
  Process: 776 ExecStart=/usr/bin/docker run --name app%i -p ${COREOS_PUBLIC_IPV4}:%i:8080 fmpwizard/coreosdemo (code=exited, status=1/FAILURE)
  Process: 764 ExecStartPre=/usr/bin/docker pull fmpwizard/coreosdemo (code=exited, status=0/SUCCESS)
  Process: 753 ExecStartPre=/usr/bin/docker rm app%i (code=exited, status=1/FAILURE)
  Process: 686 ExecStartPre=/usr/bin/docker kill app%i (code=exited, status=1/FAILURE)
 Main PID: 776 (code=exited, status=1/FAILURE)

Sep 30 14:03:57 two docker[776]: 2014/09/30 14:03:57 here ut comes value=3969970466925200696
Sep 30 14:03:57 two docker[776]: 2014/09/30 14:03:57 here ut comes value=3349133614510369083
Sep 30 14:03:57 two docker[776]: 2014/09/30 14:03:57 here ut comes value=8554272536372678354
Sep 30 14:03:57 two docker[776]: 2014/09/30 14:03:57 here ut comes value=5689556326700383242
Sep 30 14:33:19 two docker[776]: 2014/09/30 14:33:19 Good bye!
Sep 30 14:33:19 two systemd[1]: a...@8080.service: main process exited, code=exited, status=1/FAILURE
Sep 30 14:33:19 two systemd[1]: Stopping Go Application that talks to etcd...
Sep 30 14:33:19 two docker[7154]: app8080
Sep 30 14:33:19 two systemd[1]: Stopped Go Application that talks to etcd.
Sep 30 14:33:19 two systemd[1]: Unit a...@8080.service entered failed state.
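The status output shows docker kill and docker rm exiting with status 1 while the unit still went on to pull and run the image. That is the documented effect of the "-" prefix on the ExecStartPre=-/usr/bin/docker kill and rm lines: a failing exit status is recorded but treated as success. The Go sketch below models only that rule; execResult and startPreOutcome are hypothetical names, and no commands are actually executed.

```go
package main

import (
	"fmt"
	"strings"
)

// execResult is a hypothetical record of one Exec* line and its exit status.
type execResult struct {
	Line string // command as written in the unit, possibly prefixed with "-"
	Exit int    // exit status the command returned
}

// startPreOutcome models the "-" prefix: a non-zero exit from a "-"-prefixed
// command is ignored, while a non-zero exit from any other command aborts the
// start sequence. It returns how many commands ran and whether all succeeded.
func startPreOutcome(steps []execResult) (ran int, ok bool) {
	for _, s := range steps {
		ran++
		ignoreFailure := strings.HasPrefix(s.Line, "-")
		if s.Exit != 0 && !ignoreFailure {
			return ran, false
		}
	}
	return ran, true
}

func main() {
	// Exit statuses copied from the status output above.
	steps := []execResult{
		{"-/usr/bin/docker kill app%i", 1},
		{"-/usr/bin/docker rm app%i", 1},
		{"/usr/bin/docker pull fmpwizard/coreosdemo", 0},
	}
	ran, ok := startPreOutcome(steps)
	fmt.Println(ran, ok) // 3 true
}
```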

=============================

$ fleetctl list-units
UNIT                      MACHINE                  ACTIVE    SUB
app-di...@8080.service    6332e5a9.../<ip here>    inactive  dead
a...@8080.service         6332e5a9.../<ip here>    failed    failed

============================

I force the app to crash by calling (in Go code):

log.Fatal("Good bye!")
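Per the Go documentation, log.Fatal is equivalent to log.Print followed by os.Exit(1), which matches the status=1/FAILURE recorded for PID 776. To watch that behavior outside systemd, here is a minimal, hypothetical restart loop in the spirit of Restart=always; it is not how systemd implements restarts. runOnce and superviseAlways are illustrative names, and the crashing app is stood in for by a shell command that prints a message and exits with status 1.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// runOnce runs a command to completion and returns its exit status.
// A non-zero status arrives as *exec.ExitError; any other error
// (for example, the binary not existing) is returned as-is.
func runOnce(name string, args ...string) (int, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	return -1, err
}

// superviseAlways restarts the command after every exit, whatever its status,
// until it has been started maxStarts times. It returns the number of starts
// and the last exit status.
func superviseAlways(maxStarts int, delay time.Duration, name string, args ...string) (starts, lastStatus int, err error) {
	for starts < maxStarts {
		starts++
		lastStatus, err = runOnce(name, args...)
		if err != nil {
			return starts, lastStatus, err
		}
		if starts < maxStarts {
			time.Sleep(delay) // similar in spirit to RestartSec= (default 100ms)
		}
	}
	return starts, lastStatus, nil
}

func main() {
	// Stand-in for the crashing app: print to stderr and exit 1, as log.Fatal does.
	starts, status, err := superviseAlways(3, 100*time.Millisecond, "sh", "-c", "echo 'Good bye!' >&2; exit 1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("started %d times, last exit status %d\n", starts, status)
}
```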

I can't tell what I'm doing wrong here.

Thanks, and let me know if I should provide more information.

Diego

Rob Szumski

Sep 30, 2014, 12:59:05 PM

to Diego Medina, coreo...@googlegroups.com

From the systemd docs, the KillMode behavior might be affecting this (just a guess):

"If set to none, no process is killed. In this case, only the stop command will be executed on unit stop, but no process be killed otherwise"

Maybe that doesn’t trigger the restart condition?
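For reference, the systemd.kill documentation describes KillMode= in terms of which processes are signaled when the unit is stopped; whether a restart is scheduled is decided by Restart=. The Go sketch below is a simplified model of who receives the initial stop signal under each mode. initialStopTargets is a hypothetical name, PID 800 is a made-up extra process, and timing, ExecStop= and SendSIGKILL= are ignored.

```go
package main

import "fmt"

// initialStopTargets models which processes receive the initial stop signal
// (SIGTERM by default) for each KillMode= value:
//   control-group (the default): every remaining process in the unit's cgroup
//   process: only the main process
//   mixed:   only the main process (the later SIGKILL goes to the whole group)
//   none:    no process; only ExecStop= runs
func initialStopTargets(mode string, mainPID int, cgroup []int) ([]int, error) {
	switch mode {
	case "control-group", "":
		return cgroup, nil
	case "process", "mixed":
		return []int{mainPID}, nil
	case "none":
		return nil, nil
	default:
		return nil, fmt.Errorf("unknown KillMode %q", mode)
	}
}

func main() {
	// 776 is the main PID from the status output; 800 is hypothetical.
	for _, mode := range []string{"control-group", "process", "mixed", "none"} {
		pids, err := initialStopTargets(mode, 776, []int{776, 800})
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(mode, pids)
	}
}
```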

On Sep 30, 2014, at 7:56 AM, Diego Medina <fmpw...@gmail.com> wrote:

KillMode=none

Diego Medina

Sep 30, 2014, 1:03:03 PM

to Rob Szumski, coreo...@googlegroups.com

I saw that but didn't think it was related. Let me try it just in case, and I'll post back. Thanks!

--

Diego Medina
Lift/Scala consultant
di...@fmpwizard.com
http://fmpwizard.telegr.am

Diego Medina

Sep 30, 2014, 11:39:38 PM

to Rob Szumski, coreo...@googlegroups.com

Hm, this didn't make any difference: I force-crashed my app, but it was not restarted.

I was reading more on the CoreOS site and found this page:

https://coreos.com/docs/launching-containers/launching/getting-started-with-systemd/

which hints that I should put my .service file in /etc/systemd/system/

My CoreOS box doesn't have any files there, because I loaded the service file (app@.service) using fleetctl.

Following a DigitalOcean wiki page, the steps I took were pretty much:

fleetctl submit app@.service app-discovery@.service

# Load the new files
fleetctl load a...@8080.service
fleetctl load app-di...@8080.service

# Start the main service
fleetctl start a...@8080.service

Maybe this is my problem, and I'm somehow skipping a step?

Also, the Docker container that runs my sample app terminates when my app crashes; I thought fleet was supposed to restart it, somehow :)

Thanks, and I'll keep poking around to see if I can figure out what I'm missing this time.

Diego

Jonathan Boulle

Oct 2, 2014, 9:28:59 PM

to Diego Medina, Rob Szumski, coreos-user

On Tue, Sep 30, 2014 at 8:39 PM, Diego Medina <di...@fmpwizard.com> wrote:

also, the docker container that runs my sample app terminates when my app crashes, I thought fleet was supposed to restart it, somehow :)

If an application terminates abnormally, fleet (like systemd) will not restart it by default; definitely use the `Restart` policy for that. If the machine on which a service is running goes down, however, fleet will reschedule that service elsewhere.
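The restart side of this can be made concrete with the table in the systemd.service documentation, which lists which exit causes trigger a restart for each Restart= value (the default is no). The Go sketch below encodes that table; exitKind, restartFor and shouldRestart are hypothetical names, and the documented exceptions (RestartPreventExitStatus=, an explicit stop request) as well as start-rate limiting are left out. By this table, a main process that exits with status 1, as log.Fatal does, qualifies for a restart under Restart=always.

```go
package main

import "fmt"

// exitKind classifies how a service's main process ended, following the
// categories of the Restart= table in systemd.service(5).
type exitKind int

const (
	cleanExit     exitKind = iota // exit code 0, or SIGHUP/SIGINT/SIGTERM/SIGPIPE
	uncleanExit                   // non-zero exit code, e.g. log.Fatal's status 1
	uncleanSignal                 // killed by any other signal, e.g. SIGSEGV
	timeout                       // an operation timed out
	watchdog                      // the watchdog timeout fired
)

// restartFor maps each Restart= value to the exit kinds that trigger a restart.
var restartFor = map[string][]exitKind{
	"no":          {},
	"always":      {cleanExit, uncleanExit, uncleanSignal, timeout, watchdog},
	"on-success":  {cleanExit},
	"on-failure":  {uncleanExit, uncleanSignal, timeout, watchdog},
	"on-abnormal": {uncleanSignal, timeout, watchdog},
	"on-abort":    {uncleanSignal},
	"on-watchdog": {watchdog},
}

// shouldRestart reports whether the policy calls for a restart after the
// given kind of exit. Unknown policies never restart.
func shouldRestart(policy string, kind exitKind) bool {
	for _, k := range restartFor[policy] {
		if k == kind {
			return true
		}
	}
	return false
}

func main() {
	// log.Fatal exits with status 1, i.e. an unclean exit code.
	for _, p := range []string{"no", "always", "on-failure", "on-abort"} {
		fmt.Printf("Restart=%s -> restart: %v\n", p, shouldRestart(p, uncleanExit))
	}
}
```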

Did you have any luck getting this sorted? If not, could you please file an issue in the fleet project?

Diego Medina

Oct 2, 2014, 9:45:50 PM

to Jonathan Boulle, Rob Szumski, coreos-user

Hi Jonathan,

I was able to get fleet to reschedule the service onto another CoreOS server when I shut down the entire server, so moving from server A to server B worked. But restarting the container when the running service failed did not work.

I will try this one more time on a clean new set of servers and if that fails again, I'll go ahead and file a ticket.