ExecStartPre is happening before ExecStart (ExecStartPre in dropin)


Greg Fausak

Mar 9, 2015, 10:15:36 AM
to coreo...@googlegroups.com
I have been starting Datadog on a 3-node CoreOS fleet. My service file is:

[Unit]
Description=Monitoring Service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill dd-agent
ExecStartPre=-/usr/bin/docker rm dd-agent
ExecStartPre=/usr/bin/docker pull datadog/docker-dd-agent
ExecStart=/usr/bin/bash -c \
"/usr/bin/docker run --privileged --name dd-agent -h `hostname` \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /proc/mounts:/host/proc/mounts:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e API_KEY=`etcdctl get /ddapikey` \
datadog/docker-dd-agent"
[X-Fleet]

Global=true


Note the API_KEY=`etcdctl get /ddapikey` part. In my cloud-config I create a drop-in for this service:

    - name: datadog.service
      drop-ins:
        - name: 50-ddapikey-config.conf
          content: |
            [Service]

            ExecStartPre=/usr/bin/etcdctl set /ddapikey MYKEY

Sometimes it starts on all three nodes, but usually it fails on at least one. Here is the status on the node where it failed:

core@tt0 ~ $ systemctl status -l datadog
datadog.service - Monitoring Service
   Loaded: loaded (/etc/systemd/system/datadog.service; linked-runtime; vendor preset: disabled)
  Drop-In: /etc/systemd/system/datadog.service.d
           └─50-ddapikey-config.conf
   Active: failed (Result: exit-code) since Mon 2015-03-09 13:31:52 UTC; 10min ago
  Process: 1800 ExecStart=/usr/bin/bash -c /usr/bin/docker run --privileged --name dd-agent -h `hostname`  -v /var/run/docker.sock:/var/run/docker.sock  -v /proc/mounts:/host/proc/mounts:ro  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro  -e API_KEY=`etcdctl get /ddapikey`  datadog/docker-dd-agent (code=exited, status=1/FAILURE)
  Process: 1793 ExecStartPre=/usr/bin/etcdctl set /ddapikey MYKEY (code=exited, status=0/SUCCESS)
  Process: 1683 ExecStartPre=/usr/bin/docker pull datadog/docker-dd-agent (code=exited, status=0/SUCCESS)
  Process: 1673 ExecStartPre=/usr/bin/docker rm dd-agent (code=exited, status=1/FAILURE)
  Process: 1665 ExecStartPre=/usr/bin/docker kill dd-agent (code=exited, status=1/FAILURE)
 Main PID: 1800 (code=exited, status=1/FAILURE)
Mar 09 13:31:51 tt0 docker[1683]: 7bc3216dd09e: Download complete
Mar 09 13:31:51 tt0 docker[1683]: 7bc3216dd09e: Download complete
Mar 09 13:31:51 tt0 docker[1683]: Status: Downloaded newer image for datadog/docker-dd-agent:latest
Mar 09 13:31:51 tt0 etcdctl[1793]: ac578ef8dc567125c0717a4d503c3342
Mar 09 13:31:51 tt0 systemd[1]: Started Monitoring Service.
Mar 09 13:31:52 tt0 bash[1800]: Error:  100: Key not found (/ddapikey) [826]
Mar 09 13:31:52 tt0 bash[1800]: You must set API_KEY environment variable to run the Datadog Agent container
Mar 09 13:31:52 tt0 systemd[1]: datadog.service: main process exited, code=exited, status=1/FAILURE
Mar 09 13:31:52 tt0 systemd[1]: Unit datadog.service entered failed state.
Mar 09 13:31:52 tt0 systemd[1]: datadog.service failed.

So I go to that node and run systemctl stop, kill, and start on the service, which brings it up.

Obviously the start scripts for the service are not the place to put a super-secret API key. That aside, I assumed that the ExecStartPre commands run before ExecStart, and the process IDs indicate that this assumption holds. The order of the logs, however, seems to imply that the opposite sometimes happens.

My reading of the systemd documentation is that the ExecStartPre commands are executed serially, before ExecStart. Is this true?
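Per the systemd.service documentation, ExecStartPre commands run serially, one after another, in the order they are listed (assignments from a drop-in are appended after the unit's own), and ExecStart runs only after all of them have finished; if a command without the "-" prefix fails, the remaining commands are skipped and the unit fails. The PIDs above (1665, 1673, 1683, 1793, 1800) fit that order. The POSIX sh function below is only a toy model of those rules, not systemd's code, and `run_start_sequence` is a hypothetical name. One real difference: with the default Type=simple, systemd reports the unit as started as soon as the ExecStart process has been forked, which is why "Started Monitoring Service." appears in the log before the bash error.

```shell
#!/bin/sh
# Toy model of the documented ExecStartPre/ExecStart sequencing rules.
# It is NOT systemd's implementation, and unlike systemd (which executes
# commands directly, without a shell) it runs each string through `sh -c`
# for convenience. It also simply waits for the last command to exit.
#
# run_start_sequence CMD... : every argument except the last stands for an
# ExecStartPre= line (in file order, drop-in lines last); the last argument
# stands for ExecStart=.
run_start_sequence() {
  local cmd ignore_failure status
  for cmd in "$@"; do
    # A leading "-" means "ignore a non-zero exit status", as in
    # ExecStartPre=-/usr/bin/docker kill dd-agent.
    ignore_failure=no
    case $cmd in
      -*) ignore_failure=yes; cmd=${cmd#-} ;;
    esac
    # Run synchronously: the next command does not start until this exits.
    status=0
    sh -c "$cmd" || status=$?
    if [ "$status" -ne 0 ] && [ "$ignore_failure" = no ]; then
      # Remaining commands are skipped and the unit is marked failed.
      echo "'$cmd' exited with status $status; unit failed" >&2
      return "$status"
    fi
  done
  return 0
}
```

For example, `run_start_sequence '-false' 'echo pre' 'echo main'` ignores the failing first command and prints `pre` and then `main`, while `run_start_sequence 'false' 'echo main'` prints nothing and returns 1.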

What is the best practice for picking up this sort of thing? Create a service that fetches the information from a secure source, and make this service depend on that one?

-g
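On the best-practice question, one option (an assumption on my part, not something proposed in the thread) is to have the start command poll for the key instead of trusting a single read. The `retry` function below is a hypothetical helper, written in POSIX sh so it also runs under /usr/bin/bash; it treats both a failed command and empty output (for example a "Key not found" error) as "not there yet".

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd or etcdctl): run a command until it
# exits 0 AND prints something, giving up after ATTEMPTS tries.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts delay i out
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # A failed command and an empty answer both count as "not there yet".
    if out=$("$@" 2>/dev/null) && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "retry: no value from '$*' after $attempts attempts" >&2
  return 1
}
```

A start script could then read the key with something like `API_KEY=$(retry 10 1 etcdctl get /ddapikey) || exit 1`, so the container is never launched with an empty key. The attempt count and delay here are arbitrary illustrative values.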





Seán C. McCord

Mar 9, 2015, 10:26:44 AM
to Greg Fausak, coreo...@googlegroups.com
I would guess you are seeing the fact that etcd does not guarantee immediate consistency. You can check that by replacing the `etcdctl set` with a synchronous operation, such as `touch /tmp/testme`, and seeing whether it works reliably.

--
You received this message because you are subscribed to the Google Groups "CoreOS User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to coreos-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Seán C. McCord

Mar 9, 2015, 10:30:18 AM
to Greg Fausak, coreo...@googlegroups.com
I should say that _by_ _default_, immediate consistency is not guaranteed.


Greg Fausak

Mar 9, 2015, 1:28:25 PM
to coreo...@googlegroups.com, lgfa...@gmail.com
I took your advice and tried writing to a file and then reading it back later. That worked, with something similar to:

echo test > /tmp/myfile
cat /tmp/myfile

However, this sequence:

etcdctl set /myfile test
etcdctl get /myfile

has the possibility of NOT returning the expected result.

That's *real bad*, isn't it?

I read through a number of discussions about this. There seems to be something about adding quorum=true or consistent=true to a curl query (instead of using etcdctl get). Curl is too ugly for me to use: you have to parse the JSON, and it makes the Exec lines ugly. You mentioned something about 'default'. Can I make etcd read consistently by default? I really don't understand why you wouldn't want it to work that way!

Thanks for the enlightenment.  I am really bummed about this discovery :-(

-g
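The write-then-read check described above can be sketched as a loop that writes a fresh value, reads it straight back, and counts the rounds where the read does not match. `kv_set`, `kv_get` and `count_stale_reads` are hypothetical names. By default the wrappers call plain `etcdctl set` and `etcdctl get` (the set output is discarded because `etcdctl set` prints the value, as the etcdctl line in the log above shows); redefine them to compare against a file or another read mode.

```shell
#!/bin/sh
# Hypothetical wrappers around the store being tested. Swap in, e.g.,
# `etcdctl get --consistent "$1"` or a file-based pair to compare behaviours.
kv_set() { etcdctl set "$1" "$2" >/dev/null; }  # etcdctl set echoes the value
kv_get() { etcdctl get "$1"; }

# count_stale_reads KEY ROUNDS : prints how many reads did not return the
# value written immediately before them.
count_stale_reads() {
  local key rounds i got stale
  key=$1
  rounds=$2
  i=1
  stale=0
  while [ "$i" -le "$rounds" ]; do
    kv_set "$key" "value-$i"
    # If the read fails (e.g. "Key not found"), count it as an empty read.
    got=$(kv_get "$key" 2>/dev/null) || got=""
    if [ "$got" != "value-$i" ]; then
      stale=$((stale + 1))
    fi
    i=$((i + 1))
  done
  echo "$stale"
}
```

Running `count_stale_reads /myfile 100` prints how many of 100 reads came back stale or empty; the number depends on the cluster, so no particular result is implied here.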

Seán C. McCord

Mar 9, 2015, 2:02:28 PM
to Greg Fausak, coreo...@googlegroups.com
In 0.4.x, `etcdctl get` has a command-line option, `--consistent`, which will do what you want: `etcdctl get --consistent /myfile`.

I'm not using 2.x yet, but from a cursory glance, it looks like it may be consistent by default.
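Applied to the original unit, the `--consistent` flag goes on the read inside ExecStart. Below is a sketch of that command as a POSIX sh function; `start_dd_agent` is a hypothetical name, the docker arguments are copied from the unit above, and the empty-key check is an addition so the container is not started without a key. It assumes etcdctl 0.4.x, where `get --consistent` exists.

```shell
#!/bin/sh
# Sketch of the ExecStart command with a consistent read of the API key.
start_dd_agent() {
  local api_key
  # --consistent sends the read to the leader, so the value written by the
  # drop-in's ExecStartPre `etcdctl set` should be visible here.
  api_key=$(etcdctl get --consistent /ddapikey) || return 1
  if [ -z "$api_key" ]; then
    echo "no value at /ddapikey; not starting dd-agent" >&2
    return 1
  fi
  # exec replaces the shell, so the docker client becomes the main process.
  exec docker run --privileged --name dd-agent -h "$(hostname)" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /proc/mounts:/host/proc/mounts:ro \
    -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
    -e API_KEY="$api_key" \
    datadog/docker-dd-agent
}
```

Saved as a script that ends with a call to `start_dd_agent`, it could be run from the unit with something like `ExecStart=/usr/bin/bash /opt/bin/start-dd-agent.sh`, where that path is hypothetical.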

Greg Fausak

Mar 9, 2015, 2:17:47 PM
to coreo...@googlegroups.com, lgfa...@gmail.com
Thanks!

I was trying this:

core@www0 ~ $ etcdctl --help
NAME:
   etcdctl - A simple command line client for etcd.

USAGE:
   etcdctl [global options] command [command options] [arguments...]

VERSION:
   0.4.6

COMMANDS:
   mk          make a new key with a given value
   mkdir       make a new directory
   rm          remove a key
   rmdir       removes the key if it is an empty directory or a key-value pair
   get         retrieve the value of a key
   ls          retrieve a directory
   set         set the value of a key
   setdir      create a new or existing directory
   update      update an existing key with a given value
   updatedir   update an existing directory
   watch       watch a key for changes
   exec-watch  watch a key for changes and exec an executable
   help, h     Shows a list of commands or help for one command
   
GLOBAL OPTIONS:
   --debug                                       output cURL commands which can be used to reproduce the request
   --no-sync                                     don't synchronize cluster information before sending request
   --output, -o 'simple'                         output response in the given format (`simple` or `json`)
   --peers, -C '--peers option --peers option'   a comma-delimited list of machine addresses in the cluster (default: "127.0.0.1:4001")
   --version, -v                                 print the version
   --help, -h                                    show help


What I really wanted was:

core@www0 ~ $ etcdctl get --help
NAME:
   get - retrieve the value of a key

USAGE:
   command get [command options] [arguments...]

DESCRIPTION:
   

OPTIONS:
   --sort        returns result in sorted order
   --consistent  send request to the leader, thereby guranteeing that any earlier writes will be seen by the read


I still don't get it, though. My service file uses Global=true, so I can see that /ddapikey was being written from each node. But each node wrote the same value, so an inconsistent value wouldn't hurt. Yet my get was returning no value at all.

Anyway, the file version works *every* time :-) Thanks for your help.

-g