ExecStartPre is happening before ExecStart (ExecStartPre in dropin)

Greg Fausak

Mar 9, 2015, 10:15:36 AM

to coreo...@googlegroups.com

I have been starting datadog on a 3 node coreos fleet, my service file is:

[Unit]
Description=Monitoring Service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill dd-agent
ExecStartPre=-/usr/bin/docker rm dd-agent
ExecStartPre=/usr/bin/docker pull datadog/docker-dd-agent
ExecStart=/usr/bin/bash -c \
"/usr/bin/docker run --privileged --name dd-agent -h `hostname` \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /proc/mounts:/host/proc/mounts:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e API_KEY=`etcdctl get /ddapikey` \
datadog/docker-dd-agent"
[X-Fleet]
Global=true

Note the API_KEY=`etcdctl get /ddapikey` section. In my cloud-config I create a dropin for this service:

- name: datadog.service
  drop-ins:
  - name: 50-ddapikey-config.conf
    content: |
      [Service]
      ExecStartPre=/usr/bin/etcdctl set /ddapikey MYKEY

Sometimes it starts on all three nodes. But, usually, it fails on at least one node. Here is the log on the one it failed on:

core@tt0 ~ $ systemctl status -l datadog
● datadog.service - Monitoring Service
   Loaded: loaded (/etc/systemd/system/datadog.service; linked-runtime; vendor preset: disabled)
  Drop-In: /etc/systemd/system/datadog.service.d
           └─50-ddapikey-config.conf
   Active: failed (Result: exit-code) since Mon 2015-03-09 13:31:52 UTC; 10min ago
  Process: 1800 ExecStart=/usr/bin/bash -c /usr/bin/docker run --privileged --name dd-agent -h `hostname` -v /var/run/docker.sock:/var/run/docker.sock -v /proc/mounts:/host/proc/mounts:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro -e API_KEY=`etcdctl get /ddapikey` datadog/docker-dd-agent (code=exited, status=1/FAILURE)
  Process: 1793 ExecStartPre=/usr/bin/etcdctl set /ddapikey MYKEY (code=exited, status=0/SUCCESS)
  Process: 1683 ExecStartPre=/usr/bin/docker pull datadog/docker-dd-agent (code=exited, status=0/SUCCESS)
  Process: 1673 ExecStartPre=/usr/bin/docker rm dd-agent (code=exited, status=1/FAILURE)
  Process: 1665 ExecStartPre=/usr/bin/docker kill dd-agent (code=exited, status=1/FAILURE)
 Main PID: 1800 (code=exited, status=1/FAILURE)
Mar 09 13:31:51 tt0 docker[1683]: 7bc3216dd09e: Download complete
Mar 09 13:31:51 tt0 docker[1683]: 7bc3216dd09e: Download complete
Mar 09 13:31:51 tt0 docker[1683]: Status: Downloaded newer image for datadog/docker-dd-agent:latest
Mar 09 13:31:51 tt0 etcdctl[1793]: ac578ef8dc567125c0717a4d503c3342
Mar 09 13:31:51 tt0 systemd[1]: Started Monitoring Service.
Mar 09 13:31:52 tt0 bash[1800]: Error: 100: Key not found (/ddapikey) [826]
Mar 09 13:31:52 tt0 bash[1800]: You must set API_KEY environment variable to run the Datadog Agent container
Mar 09 13:31:52 tt0 systemd[1]: datadog.service: main process exited, code=exited, status=1/FAILURE
Mar 09 13:31:52 tt0 systemd[1]: Unit datadog.service entered failed state.
Mar 09 13:31:52 tt0 systemd[1]: datadog.service failed.

So I go to that node and run systemctl stop, kill, and start on the service, which brings it up.

Obviously, the start scripts for a service are not the place to put a super secret API key. That aside, I assumed that the ExecStartPre commands run before ExecStart, and the process IDs in the status output indicate that they do. The order of the log lines, however, seems to imply that the opposite sometimes happens.

My reading of the systemd documentation is that the ExecStartPre commands are executed serially, before ExecStart. Is this true?

What is the best practice for picking up this sort of thing? Create a service that fetches the information from a secure source, and make this service depend on that one?

-g

Seán C. McCord

Mar 9, 2015, 10:26:44 AM

to Greg Fausak, coreo...@googlegroups.com

I would guess you are seeing the fact that etcd does not guarantee immediate consistency. You can check that by replacing the `etcdctl set` with a synchronous operation, like `touch /tmp/testme`, and seeing whether it works reliably.

--
You received this message because you are subscribed to the Google Groups "CoreOS User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to coreos-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Seán C. McCord

Mar 9, 2015, 10:30:18 AM

to Greg Fausak, coreo...@googlegroups.com

I should say that _by_ _default_, immediate consistency is not guaranteed.

Greg Fausak

Mar 9, 2015, 1:28:25 PM

to coreo...@googlegroups.com, lgfa...@gmail.com

I took your advice and tried writing to a file and then reading it back later. That worked, with something similar to:

echo test > /tmp/myfile

cat /tmp/myfile

However, this sequence:

etcdctl set /myfile test

etcdctl get /myfile

has the possibility of NOT returning the expected result.
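One way to see how often the read misses is to repeat the set-then-get pair in a loop and count the mismatches. The following is only a rough sketch: `count_stale_reads` and the `ETCDCTL` override variable are names made up for illustration, and it assumes `etcdctl get` prints just the stored value, as it does in the default simple output.

```shell
# Hypothetical helper (not part of etcdctl): write a fresh value, read it
# straight back, and count how many reads did not return what was just written.
# ETCDCTL may be set to another command name; it defaults to etcdctl on PATH.
count_stale_reads() {
  key=$1
  rounds=$2
  cli=${ETCDCTL:-etcdctl}
  stale=0
  i=0
  while [ "$i" -lt "$rounds" ]; do
    want="probe-$i"
    # Write a value that is unique to this round.
    "$cli" set "$key" "$want" >/dev/null
    # Read it back immediately; a missing key yields an empty string here.
    got=$("$cli" get "$key" 2>/dev/null)
    [ "$got" = "$want" ] || stale=$((stale + 1))
    i=$((i + 1))
  done
  echo "$stale"
}

# Example: count_stale_reads /myfile 100
```

A result of 0 would mean every read saw its own write; anything higher is the number of rounds where the get came back stale or empty.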

That's *real bad*, isn't it?

I read through a bunch of discussion about this. There seems to be something about adding quorum=true or consistent=yes to a curl query (instead of using etcdctl get). Curl is too ugly for me to use: you have to parse the JSON, and it makes the Exec lines ugly. You mentioned something about 'default'. Can I make etcd run 'consistently' by default? I really don't understand why you wouldn't want it to run that way!

Thanks for the enlightenment. I am really bummed about this discovery :-(

-g

Seán C. McCord

Mar 9, 2015, 2:02:28 PM

to Greg Fausak, coreo...@googlegroups.com

In 0.4.x, there is the command-line option for etcdctl get, `--consistent`, which will do what you want: `etcdctl get --consistent /myfile`.

I'm not using 2.x yet, but from a cursory glance, it looks like it may be consistent by default.
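If you want a belt-and-braces approach in the start script, you could combine the `--consistent` read with a short retry, so the container only starts once the key is actually readable. This is a rough sketch rather than anything official: `retry_until_output` is a made-up helper name, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: run a command until it exits 0 AND prints something,
# giving up after a fixed number of attempts.
# Usage: retry_until_output ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_output() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Success means a zero exit status and non-empty output.
    if out=$("$@" 2>/dev/null) && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Intended use with the key from this thread (etcdctl 0.4.x):
# API_KEY=$(retry_until_output 10 1 etcdctl get --consistent /ddapikey) || exit 1
```

With the consistent read the retry should rarely be needed, but it also covers the case where the key has not been written at all yet.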

Greg Fausak

Mar 9, 2015, 2:17:47 PM

to coreo...@googlegroups.com, lgfa...@gmail.com

Thanks!

I was trying this:

core@www0 ~ $ etcdctl --help
NAME:
etcdctl - A simple command line client for etcd.

USAGE:
etcdctl [global options] command [command options] [arguments...]

VERSION:
0.4.6

COMMANDS:
mk make a new key with a given value
mkdir make a new directory
rm remove a key
rmdir removes the key if it is an empty directory or a key-value pair
get retrieve the value of a key
ls retrieve a directory
set set the value of a key
setdir create a new or existing directory
update update an existing key with a given value
updatedir update an existing directory
watch watch a key for changes
exec-watch watch a key for changes and exec an executable
help, h Shows a list of commands or help for one command

GLOBAL OPTIONS:
--debug output cURL commands which can be used to reproduce the request
--no-sync don't synchronize cluster information before sending request
--output, -o 'simple' output response in the given format (`simple` or `json`)
--peers, -C '--peers option --peers option' a comma-delimited list of machine addresses in the cluster (default: "127.0.0.1:4001")
--version, -v print the version

--help, -h show help

What I really wanted was:

core@www0 ~ $ etcdctl get --help
NAME:
get - retrieve the value of a key

USAGE:
command get [command options] [arguments...]

DESCRIPTION:

OPTIONS:
--sort returns result in sorted order

--consistent send request to the leader, thereby guranteeing that any earlier writes will be seen by the read

I still don't get it, though. My service file uses Global=true, so I can see that the ddapikey was being written on each node. But each node wrote the same value, so an inconsistent value wouldn't hurt. I was getting no value at all on my get.