etcd-member.service will not start with the ETCD_INITIAL_CLUSTER environment variable specified.

Torin Woltjer

Jul 7, 2017, 9:04:22 AM

to CoreOS User

For a few days I have been trying to work around an issue getting a small cluster up and running on CoreOS. In the cloud config we define VLANs and routing tables as systemd services, and everything seems to work correctly on that front: every machine can ping every other machine on every interface, including WAN. However, the cloud config also creates a systemd drop-in that provides all of the environment variables for the etcd-member service, and the service will not start unless one line of the drop-in (the ETCD_INITIAL_CLUSTER line) is removed or commented out. I have double- and triple-checked the usage and syntax of that setting, and nothing seems to work for me.

  - path: etc/systemd/system/etcd-member.service.d/90-etcd-cluster.conf
    permissions: 0644
    owner: root
    content: |
      [Service]
      Environment="ETCD_NAME=compute-001-n002"
      Environment="ETCD_INITIAL_CLUSTER_TOKEN=etcd-compute-001"
      Environment="ETCD_ADVERTISE_CLIENT_URLS=http://192.168.116.22:2379"
      Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.116.22:2380"
      Environment="ETCD_LISTEN_CLIENT_URLS=http://192.168.116.22:2379,http://127.0.0.1:2379"
      Environment="ETCD_LISTEN_PEER_URLS=http://192.168.116.22:2380"
      Environment="ETCD_INITIAL_CLUSTER=compute-001-n001=http://192.168.116.21:2380,compute-001-n002=http://192.168.116.22:2380,compute-001-n003=http://192.168.116.23:2380,compute-001-n004=http://192.168.116.24:2380"
      Environment="ETCD_INITIAL_CLUSTER_STATE=new"

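Because the failure hinges on that one line, an offline consistency check of the drop-in can at least rule out a syntax or naming mistake. The sketch below is a minimal illustration, not part of etcd or CoreOS: `parse_dropin` and `check_initial_cluster` are hypothetical helpers, and the parser assumes one `Environment="KEY=VALUE"` assignment per line. It mirrors the kind of check etcd makes at startup: the member named by ETCD_NAME should appear in ETCD_INITIAL_CLUSTER with the same peer URLs as ETCD_INITIAL_ADVERTISE_PEER_URLS.

```python
import re

# Assumption: one Environment="KEY=VALUE" assignment per line, as in the drop-in above.
ENV_LINE = re.compile(r'^\s*Environment="([^=]+)=(.*)"\s*$')


def parse_dropin(text: str) -> dict:
    """Collect Environment="KEY=VALUE" lines from a systemd drop-in into a dict."""
    env = {}
    for line in text.splitlines():
        m = ENV_LINE.match(line)
        if m:
            env[m.group(1)] = m.group(2)
    return env


def split_urls(value: str) -> set:
    """Turn a comma-separated URL list into a set, ignoring empty items."""
    return {u.strip() for u in value.split(",") if u.strip()}


def check_initial_cluster(env: dict) -> list:
    """Return human-readable problems; an empty list means these checks pass."""
    problems = []
    name = env.get("ETCD_NAME")
    members = {}  # member name -> set of peer URLs listed in ETCD_INITIAL_CLUSTER
    for entry in env.get("ETCD_INITIAL_CLUSTER", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        member, sep, url = entry.partition("=")
        if not sep or not url:
            problems.append(f"malformed ETCD_INITIAL_CLUSTER entry: {entry!r}")
            continue
        members.setdefault(member, set()).add(url)
    if name not in members:
        problems.append(f"ETCD_NAME {name!r} does not appear in ETCD_INITIAL_CLUSTER")
        return problems
    advertised = split_urls(env.get("ETCD_INITIAL_ADVERTISE_PEER_URLS", ""))
    if members[name] != advertised:
        problems.append(
            f"ETCD_INITIAL_CLUSTER lists {sorted(members[name])} for {name!r}, "
            f"but ETCD_INITIAL_ADVERTISE_PEER_URLS is {sorted(advertised)}"
        )
    return problems


# Values copied from node 2's drop-in above (unrelated settings omitted).
DROPIN = """
[Service]
Environment="ETCD_NAME=compute-001-n002"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.116.22:2380"
Environment="ETCD_INITIAL_CLUSTER=compute-001-n001=http://192.168.116.21:2380,compute-001-n002=http://192.168.116.22:2380,compute-001-n003=http://192.168.116.23:2380,compute-001-n004=http://192.168.116.24:2380"
Environment="ETCD_INITIAL_CLUSTER_STATE=new"
"""

print(check_initial_cluster(parse_dropin(DROPIN)))  # prints []
```

For node 2's values the check passes, which suggests the line itself is well-formed and the problem lies elsewhere, for example in whether the members can actually reach each other.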
This is part of the cloud config on node 2; all other nodes are essentially identical, and all have the same problem. When I comment out the ETCD_INITIAL_CLUSTER line, the etcd-member service starts and runs and the etcdctl commands work, but the nodes cannot talk to each other, even if I add them with etcdctl. With the line uncommented in the cloud-config, the etcd-member service does not start: systemctl status etcd-member reports "Active: activating" instead of "Active: active (running)". Without the service active, every etcdctl command returns an error; etcdctl cluster-health, for example:

$ etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured
error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
error #1: client: endpoint http://127.0.0.1:2379 exceeded header timeout
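One possible explanation for the "activating" state, offered as an assumption to verify with `systemctl cat etcd-member`: if the stock etcd-member.service is a `Type=notify` unit, systemd keeps it in "activating" until etcd reports that it is ready, and a member bootstrapping with ETCD_INITIAL_CLUSTER_STATE=new cannot become ready until a majority (quorum) of the members listed in ETCD_INITIAL_CLUSTER can reach each other. Without ETCD_INITIAL_CLUSTER, etcd presumably falls back to a cluster containing only itself, which would explain why it starts but never sees the other nodes. The quorum arithmetic, as a small Python sketch with hypothetical helper names:

```python
def quorum(n: int) -> int:
    """Majority needed by an n-member etcd cluster: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("a cluster needs at least one member")
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """How many members can be unreachable while a majority remains."""
    return n - quorum(n)


for n in (1, 3, 4, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

With four members listed, three of them must be able to reach each other on the peer port before the cluster finishes bootstrapping, and a four-member cluster tolerates no more failures than a three-member one.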

I'm definitely missing a big piece of the puzzle and wondered if anybody here knows what it is. Thanks for any help.

Rob Szumski

Jul 7, 2017, 7:49:37 PM

to Torin Woltjer, CoreOS User

Is there anything in the output from `journalctl -u etcd-member.service`? etcd is pretty good about doing a sanity check on env vars and other stuff before starting up. All of that should be logged.
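Following that suggestion, etcd's own warnings can be pulled out of the journal output. A minimal Python sketch, assuming etcd 3.x log lines look like the ones journald records from etcd-wrapper later in this thread (`<date> <time> <level letter> | <package>: <message>`, with C/E/W taken to mean critical/error/warning); the regular expression and the `problems`/`warnings_by_package` helpers are illustrative, not an etcd API:

```python
import re
from collections import Counter

# Assumed etcd 3.x layout inside each journal entry, for example:
# "... etcd-wrapper[1079]: 2017-07-02 14:45:57.397770 W | etcdserver: failed to ..."
ETCD_LOG = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]) \| (?P<pkg>[\w./-]+): (?P<msg>.*)$"
)
# Assumed level letters: C = critical, E = error, W = warning.
SEVERE = {"C", "E", "W"}


def problems(lines):
    """Yield (level, package, message) for warning-or-worse etcd log lines."""
    for line in lines:
        m = ETCD_LOG.search(line)
        if m and m.group("level") in SEVERE:
            yield m.group("level"), m.group("pkg"), m.group("msg")


def warnings_by_package(lines) -> Counter:
    """Count warning-or-worse lines per etcd package (etcdserver, rafthttp, ...)."""
    return Counter(pkg for _, pkg, _ in problems(lines))


# Two lines copied from the journal excerpt posted in this thread.
sample = [
    "Jul 02 14:45:57 compute-001-n001 etcd-wrapper[1079]: 2017-07-02 14:45:57.397770 W | etcdserver: failed to reach the peerURL(http://192.168.116.22:2380) of member bc2142d151a0641b (Get http://192.168.116.22:2380/version: dial tcp 192.168.116.22:2380: getsockopt: no route to host)",
    "Jul 02 14:46:01 compute-001-n001 etcd-wrapper[1079]: 2017-07-02 14:46:01.648028 W | rafthttp: health check for peer bc2142d151a0641b could not connect: dial tcp 192.168.116.22:2380: getsockopt: no route to host",
]

for level, pkg, msg in problems(sample):
    print(level, pkg, msg)
print(warnings_by_package(sample))
```

On a live node, the output of `journalctl -u etcd-member.service --no-pager` could be piped into a script that calls `problems(sys.stdin)`.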

- Rob

--
You received this message because you are subscribed to the Google Groups "CoreOS User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to coreos-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Torin Woltjer

Jul 10, 2017, 8:48:15 AM

to CoreOS User, macwol...@gmail.com

Jul 02 14:45:57 compute-001-n001 etcd-wrapper[1079]: 2017-07-02 14:45:57.397770 W | etcdserver: failed to reach the peerURL(http://192.168.116.22:2380) of member bc2142d151a0641b (Get http://192.168.116.22:2380/version: dial tcp 192.168.116.22:2380: getsockopt: no route to host)
Jul 02 14:45:57 compute-001-n001 etcd-wrapper[1079]: 2017-07-02 14:45:57.397812 W | etcdserver: cannot get the version of member bc2142d151a0641b (Get http://192.168.116.22:2380/version: dial tcp 192.168.116.22:2380: getsockopt: no route to host)
Jul 02 14:46:01 compute-001-n001 etcd-wrapper[1079]: 2017-07-02 14:46:01.648028 W | rafthttp: health check for peer bc2142d151a0641b could not connect: dial tcp 192.168.116.22:2380: getsockopt: no route to host
Jul 02 14:46:02 compute-001-n001 etcd-wrapper[1079]: 2017-07-02 14:46:02.598996 W | etcdserver: failed to reach the peerURL(http://192.168.116.22:2380) of member bc2142d151a0641b (Get http://192.168.116.22:2380/version: dial tcp 192.168.116.22:2380: i/o timeout)
Jul 02 14:46:02 compute-001-n001 etcd-wrapper[1079]: 2017-07-02 14:46:02.599036 W | etcdserver: cannot get the version of member bc2142d151a0641b (Get http://192.168.116.22:2380/version: dial tcp 192.168.116.22:2380: i/o timeout)

This definitely gives me something to look into: the nodes won't route to each other, yet each node can ping every other node.
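Ping only shows that ICMP gets through; etcd members talk TCP on port 2380 (clients use 2379), so a host can answer ping and still be unreachable for etcd. A "no route to host" on a TCP connect to a host that answers ping can point to a firewall rule rejecting the connection, or to traffic leaving through a different interface or route than the ping did. A minimal Python sketch that tests the actual TCP ports and tells the common failure modes apart; `probe` and `probe_all` are hypothetical helpers, and the addresses are the ones from the drop-in above:

```python
import errno
import socket

# Peer addresses taken from the drop-in above; 2379 = client port, 2380 = peer port.
PEERS = {
    "compute-001-n001": "192.168.116.21",
    "compute-001-n002": "192.168.116.22",
    "compute-001-n003": "192.168.116.23",
    "compute-001-n004": "192.168.116.24",
}


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout: packets dropped or filtered somewhere"
    except ConnectionRefusedError:
        return "refused: host reachable, but nothing listening on that port"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host: routing/ARP problem or a firewall reject"
        return f"error: {exc}"


def probe_all(peers: dict, ports=(2379, 2380)) -> dict:
    """Probe every peer on every etcd port; returns {(name, port): result}."""
    return {(name, port): probe(ip, port) for name, ip in peers.items() for port in ports}
```

Calling `probe_all(PEERS)` from each node gives a per-port picture: "refused" only means etcd is not listening there yet, whereas "no route to host" from a host that answers ping points at TCP filtering or routing rather than at etcd itself.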

Torin Woltjer

Jul 10, 2017, 9:00:38 AM

to CoreOS User

Actually, I checked all the other nodes, and every node other than node 1 gets:

Jul 03 20:52:08 compute-001-n004 systemd[1]: etcd-member.service: Main process exited, code=exited, status=1/FAILURE
Jul 03 20:52:08 compute-001-n004 systemd[1]: Failed to start etcd (System Application Container).
Jul 03 20:52:08 compute-001-n004 systemd[1]: etcd-member.service: Unit entered failed state.
Jul 03 20:52:08 compute-001-n004 systemd[1]: etcd-member.service: Failed with result 'exit-code'.
Jul 03 20:52:18 compute-001-n004 systemd[1]: etcd-member.service: Service hold-off time over, scheduling restart.
Jul 03 20:52:18 compute-001-n004 systemd[1]: Stopped etcd (System Application Container).

That might explain a lot. I'll have to figure out what's present on node 1 but not on the others.
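One way to narrow down what differs is to diff the etcd drop-ins of node 1 and a failing node: apart from the node-specific name and IP addresses, the bootstrap settings (ETCD_INITIAL_CLUSTER, ETCD_INITIAL_CLUSTER_TOKEN, ETCD_INITIAL_CLUSTER_STATE) should be identical on every member. A minimal Python sketch; `diff_nodes` is a hypothetical helper, and NODE1/NODE2 are shortened, made-up excerpts rather than the real files. Configuration is only part of the picture, though: a member that earlier bootstrapped on its own (for example while the ETCD_INITIAL_CLUSTER line was commented out) may still carry that state in its data directory, and the journal lines just before "Main process exited" are likely to say why etcd quit.

```python
import re

# Assumption: one Environment="KEY=VALUE" assignment per line.
ENV_LINE = re.compile(r'^\s*Environment="([^=]+)=(.*)"\s*$')

# These should be identical on every member bootstrapping the same cluster.
MUST_MATCH = ("ETCD_INITIAL_CLUSTER", "ETCD_INITIAL_CLUSTER_TOKEN", "ETCD_INITIAL_CLUSTER_STATE")


def parse_dropin(text: str) -> dict:
    """Collect Environment="KEY=VALUE" lines into a dict."""
    return {m.group(1): m.group(2) for m in map(ENV_LINE.match, text.splitlines()) if m}


def diff_nodes(a_text: str, b_text: str):
    """Return ({key: (value_a, value_b)} for differing keys, bootstrap keys that differ)."""
    a, b = parse_dropin(a_text), parse_dropin(b_text)
    diff = {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}
    suspicious = [k for k in MUST_MATCH if k in diff]
    return diff, suspicious


# Hypothetical, shortened excerpts of two nodes' drop-ins.
NODE1 = '''
Environment="ETCD_NAME=compute-001-n001"
Environment="ETCD_INITIAL_CLUSTER_TOKEN=etcd-compute-001"
Environment="ETCD_INITIAL_CLUSTER_STATE=new"
'''
NODE2 = '''
Environment="ETCD_NAME=compute-001-n002"
Environment="ETCD_INITIAL_CLUSTER_TOKEN=etcd-compute-001"
Environment="ETCD_INITIAL_CLUSTER_STATE=new"
'''

diff, suspicious = diff_nodes(NODE1, NODE2)
print(diff)        # only ETCD_NAME differs
print(suspicious)  # []
```

If `suspicious` is empty for every pair of nodes, the drop-ins agree on the bootstrap settings, and the difference between node 1 and the others is more likely in runtime state or connectivity.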

Aneeshwara Babu

Jun 24, 2019, 10:39:59 AM

to CoreOS User

Hi, we are trying to create an etcd cluster across three etcd machines, but the cluster does not initialize and we get the error below:

Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.7.28:2379: getsockopt: connection refused error #0: dial tcp 192.168.7.28:2379: getsockopt: connection refused
