Add member to etcd

William Richards

Dec 3, 2015, 12:21:11 PM

to Deis user list

Hi there,

I just deployed a brand new Deis 1.12.2 stateless cluster on AWS yesterday. After everything was set up we tested terminating an instance. The autoscaling group ensured that a new instance was created, as expected. The new instance joined the cluster and was visible with "fleetctl list-machines".

However, when we ran "etcdctl member list", the old terminated instance was still listed. We did some research and it appears this is normal (it lets etcd recover from network partitions). The terminated instance will never return, so we ran "etcdctl member remove {member-id}" and the member was removed as expected.

At this point we are having difficulty. How do I re-add the new member into the etcd cluster? I've read lots of documentation on this, and tried quite a few things, but nothing I've done has worked. Is there a specific procedure that anyone knows about that definitely works with how Deis uses etcd?

Any help you could provide would be very appreciated. :-)

Thanks!

Cheers,

William

Chris Armstrong

Dec 3, 2015, 12:50:14 PM

to William Richards, Deis user list

Hi William,

I'm not sure what you mean by "re-add the new member". The new host that was added by the autoscaling event should have joined the cluster and become healthy. Removing the stale host was safe.

So, you now have the same number of members as you had previously, with one removed and one added. Can you explain a bit what you're trying to achieve?

Chris

--
You received this message because you are subscribed to the Google Groups "Deis user list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to deis-users+...@googlegroups.com.
To post to this group, send email to deis-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/deis-users/9731453d-8ef1-4421-a92b-de8704558548%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Chris Armstrong

VP Engineering and Core Maintainer

Deis - Your PaaS. Your rules.

Helm - The package manager for Kubernetes

William Richards

Dec 3, 2015, 1:20:35 PM

to Chris Armstrong, Deis user list

Sorry, I think my language was confusing. The new host is healthy in every way, except that it is not listed as a member in 'etcdctl member list'. After removing the instance I terminated, I now have only 4 members listed in my 5-node cluster. Perhaps this will help explain:

core@ip-10-0-31-129 ~ $ fleetctl list-machines
MACHINE IP METADATA
10473578... 10.0.32.175 controlPlane=true,dataPlane=true,routerMesh=true
19d3db39... 10.0.32.176 controlPlane=true,dataPlane=true,routerMesh=true
a3d1e7df... 10.0.31.129 controlPlane=true,dataPlane=true,routerMesh=true
b5e5c904... 10.0.30.28 controlPlane=true,dataPlane=true,routerMesh=true
e360568f... 10.0.32.177 controlPlane=true,dataPlane=true,routerMesh=true

core@ip-10-0-31-129 ~ $ etcdctl member list
4a319eb002b016d6: name=e360568ff7be4fa1a4f7a59200c8bfb8 peerURLs=http://10.0.32.177:2380 clientURLs=http://10.0.32.177:2379
6980bf7cea9a1a6c: name=b5e5c904e84b491e8b8b9df756b3003c peerURLs=http://10.0.30.28:2380 clientURLs=http://10.0.30.28:2379
ce0e8aba70447086: name=1047357805324721b92dd637e079786c peerURLs=http://10.0.32.175:2380 clientURLs=http://10.0.32.175:2379
ee5439ee0ca9c962: name=19d3db396ea840a0949d103d66b4a933 peerURLs=http://10.0.32.176:2380 clientURLs=http://10.0.32.176:2379

Note that the host 10.0.31.129 is listed as a member in fleet, but not etcd. That is the new instance spun up by the Autoscaling Group after terminating one of the original members.
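
For anyone who wants to make the same comparison without eyeballing the two listings, here is a minimal sketch. It assumes both command outputs have been saved to files in the formats shown above; the function name `missing_etcd_members` and the file arguments are made up for illustration and are not part of fleet or etcd.

```shell
#!/bin/sh
# Print every IP that appears in saved `fleetctl list-machines` output
# but has no matching peerURLs entry in saved `etcdctl member list` output.
# $1: file with fleetctl output, $2: file with etcdctl output (hypothetical names).
missing_etcd_members() {
    fleet_file=$1
    etcd_file=$2
    # Column 2 of each fleetctl row is the IP; skip the MACHINE/IP/METADATA header.
    awk '$1 != "MACHINE" && NF >= 2 { print $2 }' "$fleet_file" |
    while read -r ip; do
        # Fixed-string match on the peer URL, including the trailing ":" so
        # that 10.0.32.17 cannot match 10.0.32.175.
        if ! grep -qF "peerURLs=http://${ip}:" "$etcd_file"; then
            echo "$ip"
        fi
    done
}
```

Usage would look something like `fleetctl list-machines > fleet.txt; etcdctl member list > etcd.txt; missing_etcd_members fleet.txt etcd.txt`. With the listings above it prints only 10.0.31.129.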

I am trying to get 10.0.31.129 to be an etcd member.

Did that make sense?

WR

William Richards

Dec 4, 2015, 4:59:34 PM

to Deis user list, carms...@engineyard.com

I've figured out how to promote an etcd2 proxy to a full member on CoreOS (Deis 1.12.2 cluster). I will document the steps here for anyone who finds this page and has the same question:

1. Find the machine ID by running "cat /etc/machine-id". This will be the name of this node in etcd2.
2. On the node you wish to add, run: "etcdctl member add {machine-id} http://{node ip}:2380"
3. Take the output of that command and, using the documentation from the link below, create a file called /run/systemd/system/etcd2.service.d/99-restore.conf. Copy the example from the link below and update it with the output of the "etcdctl member add" command from the previous step.
4. Tell systemd to reload its unit files: sudo systemctl daemon-reload
5. Restart etcd2: sudo systemctl restart etcd2
6. Follow the etcd2 logs to see what it is doing: journalctl -f -u etcd2
7. Check the health of the cluster: etcdctl cluster-health
8. Once etcd2 has joined the existing cluster as a full member, remove the 99-restore.conf file: sudo rm /run/systemd/system/etcd2.service.d/99-restore.conf
9. Tell systemd to reload its unit files again: sudo systemctl daemon-reload

Your etcd2 proxy node should now be a full member.

Cheers,

William

These instructions were mostly figured out from this page: https://coreos.com/os/docs/latest/using-environment-variables-in-systemd-units.html#etcd2.service-unit-advanced-example
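
The command steps above could be wrapped up roughly as follows. This is only a sketch: the `run` helper, the `DRY_RUN` switch, and the function names are my own inventions for illustration, and writing the 99-restore.conf drop-in still has to happen by hand between `register_member` and `restart_etcd2`.

```shell
#!/bin/sh
# Sketch of the promotion steps. Set DRY_RUN=1 to print each command
# instead of executing it (hypothetical convenience, not an etcd feature).
DROPIN=/run/systemd/system/etcd2.service.d/99-restore.conf

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Register the node with the existing cluster.
# $1: machine ID (contents of /etc/machine-id), $2: node IP.
register_member() {
    run etcdctl member add "$1" "http://$2:2380"
}

# After the drop-in file has been created: reload units and restart etcd2.
restart_etcd2() {
    run sudo systemctl daemon-reload
    run sudo systemctl restart etcd2
}

# After "etcdctl cluster-health" reports the node as a healthy member:
# remove the temporary drop-in and reload units once more.
cleanup_dropin() {
    run sudo rm "$DROPIN"
    run sudo systemctl daemon-reload
}
```

A cautious first pass would be `DRY_RUN=1 register_member "$(cat /etc/machine-id)" 10.0.31.129`, which only echoes the etcdctl command so it can be checked before running it for real.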


Matt Rardon

May 18, 2016, 2:37:57 PM

to Deis user list, carms...@engineyard.com

Just wanted to say thanks for this. Also the link has been updated to: https://coreos.com/etcd/docs/latest/etcd-live-cluster-reconfiguration.html (about a third of the way down the page)

I also had to use a slightly different ExecStartPre to point to the proper proxy folder: "ExecStartPre=/usr/bin/rm -rf /media/etcd/proxy"

Here are the contents of the file for posterity:

[Service]
# remove previously created proxy directory
ExecStartPre=/usr/bin/rm -rf /var/lib/etcd2/proxy
# NOTE: use this option if you would like to re-add broken etcd member into cluster
# Don't forget to make a backup before
#ExecStartPre=/usr/bin/rm -rf /var/lib/etcd2/member /var/lib/etcd2/proxy
# here we clean previously defined ETCD_DISCOVERY environment variable, we don't need it as we've already bootstrapped etcd cluster and ETCD_DISCOVERY conflicts with ETCD_INITIAL_CLUSTER environment variable
Environment="ETCD_DISCOVERY="
Environment="ETCD_NAME=node4"
# We use ETCD_INITIAL_CLUSTER variable value from previous step ("etcdctl member add" output)
Environment="ETCD_INITIAL_CLUSTER=node1=http://10.0.1.1:2380,node2=http://10.0.1.2:2380,node3=http://10.0.1.3:2380,node4=http://10.0.1.4:2380"
Environment="ETCD_INITIAL_CLUSTER_STATE=existing"
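
If typing the Environment lines by hand feels error-prone, something like the sketch below can build the drop-in from the "etcdctl member add" output. It assumes that output contains lines of the form ETCD_NAME="..." with double-quoted values; if your etcdctl prints them differently, the sed pattern will not match. The function name `make_dropin` is made up, and the proxy path is the one from the file above, so adjust it if your data directory differs.

```shell
#!/bin/sh
# Read `etcdctl member add` output on stdin and print a drop-in unit
# fragment on stdout. Only lines shaped like ETCD_FOO="value" are used;
# everything else (such as the "Added member ..." line) is ignored.
make_dropin() {
    echo '[Service]'
    echo '# remove previously created proxy directory'
    echo 'ExecStartPre=/usr/bin/rm -rf /var/lib/etcd2/proxy'
    echo '# clear ETCD_DISCOVERY, it conflicts with ETCD_INITIAL_CLUSTER'
    echo 'Environment="ETCD_DISCOVERY="'
    # Turn ETCD_NAME="node4" into Environment="ETCD_NAME=node4".
    sed -n 's/^\(ETCD_[A-Z_]*\)="\(.*\)"$/Environment="\1=\2"/p'
}
```

For example, `etcdctl member add "$(cat /etc/machine-id)" http://10.0.31.129:2380 | make_dropin` prints the fragment to the terminal for review; once it looks right it can be saved as /run/systemd/system/etcd2.service.d/99-restore.conf.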

