v2.0 rc1 rejoining cluster issue


Max Levine

Dec 23, 2014, 12:53:09 PM
to etcd...@googlegroups.com
Hi All,

I am using etcd v2.0 rc1. I had a 3-node cluster running, but at some point the etcd process on one of the nodes died after running out of file descriptors. I am now trying to rejoin that node to the cluster, but when I start the process I see the output below. Any idea what I am doing wrong? Why is the restarted node rejecting messages from the leader? I am using the same start command I used to bootstrap with static configuration.


failed node:
2014/12/24 01:39:56 etcdserver: loaded cluster information from store: apac=http://X.Y.Z:7001,emea=http://X.Y.Z:7001,na=http://X.Y.Z:7001
2014/12/24 01:39:57 etcdserver: restart member abb83767df531b32 in cluster e869dbcebeb3cd2c at commit index 320918
2014/12/24 01:39:57 raft: abb83767df531b32 became follower at term 410
2014/12/24 01:39:57 raft: newRaft abb83767df531b32 [peers: [a7e5adf6a57fa85,13bafdbc1b7bcb40,abb83767df531b32], term: 410, commit: 320918, lastindex: 321061, lastterm: 12]
2014/12/24 01:39:57 raft.node: abb83767df531b32 elected leader a7e5adf6a57fa85 at term 410
2014/12/24 01:39:58 raft: abb83767df531b32 [logterm: 0, index: 332553] rejected msgApp [logterm: 393, index: 332553] from a7e5adf6a57fa85
2014/12/24 01:39:59 raft: abb83767df531b32 [logterm: 0, index: 332553] rejected msgApp [logterm: 393, index: 332553] from a7e5adf6a57fa85
.... (these rejected messages just keep going)




on the etcd leader:
2014/12/23 17:39:57 sender: the connection with abb83767df531b32 becomes active
2014/12/23 17:39:58 raft: a7e5adf6a57fa85 received msgApp rejection from abb83767df531b32 for index 332553
2014/12/23 17:39:59 raft: a7e5adf6a57fa85 received msgApp rejection from abb83767df531b32 for index 332553
2014/12/23 17:39:59 raft: a7e5adf6a57fa85 received msgApp rejection from abb83767df531b32 for index 332553
.... (these rejected messages just keep going)



Xiang Li

Dec 23, 2014, 1:15:27 PM
to Max Levine, etcd...@googlegroups.com
Hi Max,

It looks like member `abb83767df531b32` lost some on-disk entries, so it cannot sync with the current leader.

There might be a bug in our wal pkg: etcd does not load all the entries from disk.
There might be a bug where etcd does not store the last few entries to disk.
Or the OS might not have written the data to disk even though etcd called fsync. (Can you confirm this?)
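
The missing-entries reading can be checked against the log lines posted from the failed node. The sketch below only re-parses two of those lines; it assumes that `lastindex` in the `newRaft` line is the last entry the follower has on disk and that `index` in the rejection line is the entry the leader is asking it to match.

```shell
# Log lines copied from the failed node's output earlier in this thread.
log='2014/12/24 01:39:57 raft: newRaft abb83767df531b32 [peers: [a7e5adf6a57fa85,13bafdbc1b7bcb40,abb83767df531b32], term: 410, commit: 320918, lastindex: 321061, lastterm: 12]
2014/12/24 01:39:58 raft: abb83767df531b32 [logterm: 0, index: 332553] rejected msgApp [logterm: 393, index: 332553] from a7e5adf6a57fa85'

# Last index reported by the follower when it restarted (newRaft line).
last_index=$(printf '%s\n' "$log" | sed -n 's/.*lastindex: \([0-9][0-9]*\).*/\1/p' | head -n 1)

# Index carried by the leader's msgApp that the follower rejected.
probe_index=$(printf '%s\n' "$log" | sed -n 's/.*rejected msgApp \[logterm: [0-9]*, index: \([0-9][0-9]*\)].*/\1/p' | head -n 1)

echo "follower last index: $last_index"              # 321061
echo "leader msgApp index: $probe_index"             # 332553
echo "entries in between:  $((probe_index - last_index))"   # 11492
```

If those assumptions hold, the leader is asking the follower to match an entry well past the end of the follower's log, which would explain the `logterm: 0` on the follower's side of each rejection.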

We will look into this problem soon. You can work around the current issue by either:
1. removing the member (and its data-dir) and re-adding it via dynamic reconfiguration, or
2. killing the current leader and restarting it.
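
The remove and re-add step can be sketched against the `/v2/members` endpoint mentioned later in this thread. This is a minimal sketch, not an official procedure: the `ENDPOINT` address, the helper function names, and the peer URL in the usage note are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: ENDPOINT is an assumed client URL of any healthy member.
ENDPOINT="${ENDPOINT:-http://10.0.1.10:2379}"

# Build the JSON body for POST /v2/members from a single peer URL.
member_body() {
  printf '{"peerURLs":["%s"]}' "$1"
}

# Remove a member by its ID (as listed by GET /v2/members).
remove_member() {
  curl -fsS -X DELETE "$ENDPOINT/v2/members/$1"
}

# Register a new member by peer URL; the response contains its new ID.
add_member() {
  curl -fsS -X POST "$ENDPOINT/v2/members" \
    -H 'Content-Type: application/json' \
    -d "$(member_body "$1")"
}
```

Usage would look like `remove_member abb83767df531b32`, then deleting the old data-dir on that host, then `add_member http://10.0.1.13:2380` (a placeholder peer URL) before starting the new etcd process.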

Thanks,
Xiang


Max Levine

Dec 23, 2014, 3:43:18 PM
to etcd...@googlegroups.com, max...@gmail.com, etcd...@googlegroups.com

Hi,

It looks like deleting the member and data-dir does not help:

2014/12/24 04:40:24 etcdserver: start member abb83767df531b32 in cluster e869dbcebeb3cd2c
2014/12/24 04:40:24 raft: abb83767df531b32 became follower at term 0
2014/12/24 04:40:24 raft: newRaft abb83767df531b32 [peers: [], term: 0, commit: 0, lastindex: 0, lastterm: 0]
2014/12/24 04:40:24 raft: abb83767df531b32 became follower at term 1
2014/12/24 04:40:24 etcdserver: added member a7e5adf6a57fa85 [http://X:7001] to cluster e869dbcebeb3cd2c
2014/12/24 04:40:24 etcdserver: added member 13bafdbc1b7bcb40 [http://X:7001] to cluster e869dbcebeb3cd2c
2014/12/24 04:40:24 etcdserver: added local member abb83767df531b32 [http://1X:7001] to cluster e869dbcebeb3cd2c
2014/12/24 04:40:25 raft: abb83767df531b32 is starting a new election at term 1
2014/12/24 04:40:25 raft: abb83767df531b32 became candidate at term 2
2014/12/24 04:40:25 raft: abb83767df531b32 received vote from abb83767df531b32 at term 2
2014/12/24 04:40:25 raft: abb83767df531b32 [logterm: 1, index: 3] sent vote request to a7e5adf6a57fa85 at term 2
2014/12/24 04:40:25 raft: abb83767df531b32 [logterm: 1, index: 3] sent vote request to 13bafdbc1b7bcb40 at term 2
2014/12/24 04:40:26 rafthttp: this member has been permanently removed from the cluster
2014/12/24 04:40:26 rafthttp: the data-dir used by this member must be removed so that this host can be re-added with a new member ID
2014/12/24 04:40:26 rafthttp: this member has been permanently removed from the cluster
2014/12/24 04:40:26 rafthttp: the data-dir used by this member must be removed so that this host can be re-added with a new member ID

Xiang Li

Dec 23, 2014, 3:53:58 PM
to Max Levine, etcd...@googlegroups.com
Hi Max,

After deleting this member, you need to follow the doc to re-add it via the reconfiguration API.

Thanks,
Xiang 

Max Levine

Dec 23, 2014, 4:26:36 PM
to etcd...@googlegroups.com, max...@gmail.com, etcd...@googlegroups.com
Hi,

Yes, I did that and removed the data-dir as well. Here is what /v2/members shows, but the failed node still picks up the old ID somehow:

{
  "members": [
    {
      "id": "a7e5adf6a57fa85",
      "name": "emea",
      "peerURLs": [
        "http://X.Y.Z:7001"
      ],
      "clientURLs": [
        "http://X.Y.Z:4001"
      ]
    },
    {
      "id": "13bafdbc1b7bcb40",
      "name": "na",
      "peerURLs": [
        "http://X.Y.Z:7001"
      ],
      "clientURLs": [
        "http://X.Y.Z:4001"
      ]
    },
    {
      "id": "59789ab5bb7fbe18",
      "name": "",
      "peerURLs": [
        "http://X.Y.Z:7001"
      ],
      "clientURLs": []
    }
  ]
}

Steven Schlansker

Dec 23, 2014, 4:29:20 PM
to Max Levine, etcd...@googlegroups.com
I can confirm the same problem: I remove a member via the /v2/members API and delete its data-dir, but somehow it still gets the old ID and is refused when rejoining the cluster.

Xiang Li

Dec 23, 2014, 4:29:43 PM
to Max Levine, etcd...@googlegroups.com
After you have added the new member to the cluster, you should not reuse the previous command-line args; you need to set the cluster state to existing (`-initial-cluster-state=existing`).


Now start the new etcd process with the relevant flags for the new member:

```
$ export ETCD_NAME="infra3"
$ export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=existing
$ etcd -listen-client-urls http://10.0.1.13:2379 -advertise-client-urls http://10.0.1.13:2379 -listen-peer-urls http://10.0.1.13:2380 -initial-advertise-peer-urls http://10.0.1.13:2380
```

Thanks!

Xiang Li

Dec 23, 2014, 4:30:50 PM
to Steven Schlansker, Max Levine, etcd...@googlegroups.com
See my previous reply.

Max Levine

Dec 23, 2014, 4:39:09 PM
to etcd...@googlegroups.com, max...@gmail.com
OK, that worked to pick up the new ID, but I am still getting some errors (see below). Also, I am wondering how a different command line with "-initial-cluster-state=existing" works if I want to automate joining the cluster after a reboot?

2014/12/24 05:36:35 etcdserver: start member b35f12b3bd3a8979 in cluster e869dbcebeb3cd2c
2014/12/24 05:36:35 raft: b35f12b3bd3a8979 became follower at term 0
2014/12/24 05:36:35 raft: newRaft b35f12b3bd3a8979 [peers: [], term: 0, commit: 0, lastindex: 0, lastterm: 0]
2014/12/24 05:36:35 raft: b35f12b3bd3a8979 became follower at term 1
2014/12/24 05:36:35 raft: b35f12b3bd3a8979 [term: 1] received a MsgHeartbeat message with higher term from a7e5adf6a57fa85 [term: 410]
2014/12/24 05:36:35 raft: b35f12b3bd3a8979 became follower at term 410
2014/12/24 05:36:35 raft.node: b35f12b3bd3a8979 elected leader a7e5adf6a57fa85 at term 410
2014/12/24 05:36:36 raft: b35f12b3bd3a8979 [logterm: 0, index: 364646] rejected msgApp [logterm: 410, index: 364646] from a7e5adf6a57fa85
2014/12/24 05:36:36 raft: b35f12b3bd3a8979 [logterm: 0, index: 364646] rejected msgApp [logterm: 410, index: 364646] from a7e5adf6a57fa85
2014/12/24 05:36:36 raft: b35f12b3bd3a8979 [logterm: 0, index: 364646] rejected msgApp [logterm: 410, index: 364646] from a7e5adf6a57fa85
2014/12/24 05:36:36 raft: b35f12b3bd3a8979 [logterm: 0, index: 364646] rejected msgApp [logterm: 410, index: 364646] from a7e5adf6a57fa85
2014/12/24 05:36:36 raft: b35f12b3bd3a8979 [logterm: 0, index: 364645] rejected msgApp [logterm: 410, index: 364645] from a7e5adf6a57fa85

Xiang Li

Dec 23, 2014, 4:55:39 PM
to Max Levine, etcd...@googlegroups.com
Hi Max,

Thanks for reporting. I will try to reproduce the issue and fix it.


> Also I am wondering how different command line with "-initial-cluster-state=existing" works if I want to automate joining the cluster after a reboot?

1. How often do you need to remove/re-add a member? When do you want to do this?
2. etcd serves as the source of truth for the entire cluster. Any critical etcd-related change should normally be made manually and carefully. We intentionally separate the reconfiguration process into several stages and ask for human involvement.

Thanks,
Xiang

Max Levine

Dec 23, 2014, 5:51:03 PM
to Xiang Li, etcd...@googlegroups.com
I was thinking of putting etcd under something like upstart, so that if the process dies or the node is rebooted it will be respawned.
I don't think there should be any need for human intervention in case of a node reboot... why can't the rebooted node automatically rejoin the cluster?

Xiang Li

Dec 23, 2014, 6:07:24 PM
to Max Levine, etcd...@googlegroups.com
Hi Max,

That is different. Reboot is OK. A rebooted member does not need to *join*; it only needs to catch up with the cluster.
You only need to change the cmd line args if the configuration has changed.
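
For the supervisor case, a start wrapper might look like the sketch below. It is a hedged illustration rather than a recommended setup: the member name, URLs, and data-dir path are placeholders, and it relies on the documented behaviour that the `-initial-*` flags are only consulted when a member bootstraps and are ignored when an existing member restarts with its data-dir intact.

```shell
#!/usr/bin/env bash
# Hypothetical start wrapper for a supervisor such as upstart.
# Every value below is an assumed placeholder, not taken from the cluster in this thread.
NAME="infra3"
DATA_DIR="/var/lib/etcd/infra3.etcd"
CLIENT_URL="http://10.0.1.13:2379"
PEER_URL="http://10.0.1.13:2380"
INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"

# Print the etcd arguments one per line so they are easy to inspect.
etcd_args() {
  printf '%s\n' \
    -name "$NAME" \
    -data-dir "$DATA_DIR" \
    -listen-client-urls "$CLIENT_URL" \
    -advertise-client-urls "$CLIENT_URL" \
    -listen-peer-urls "$PEER_URL" \
    -initial-advertise-peer-urls "$PEER_URL" \
    -initial-cluster "$INITIAL_CLUSTER" \
    -initial-cluster-state existing
}

# The supervisor's exec line would then be, for example:
#   exec etcd $(etcd_args)
```

The same arguments are used on every respawn; they would only need editing after a membership change, and the data-dir must survive the reboot for the member to keep its ID.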

Thanks,
Xiang