Temporary Split Brain and non-removal of node


Mark Tinsley

Nov 27, 2013, 4:30:13 AM
to elasticsear...@googlegroups.com
Hi all,

I have been investigating elasticsearch-zookeeper; this is a follow-on investigation from a post I made on the Elasticsearch forum.

So:

Given I have a cluster of three Elasticsearch (ES) nodes, called ES1, ES2 and ES3, and a ZooKeeper cluster called ZC.

Issue 1:

ES1 is the current master.

I drop the connection between ZC and ES1 by running iptables -A INPUT -s <zookeeper cluster ip addresses> -j DROP and iptables -A OUTPUT -d <zookeeper cluster ip addresses> -j DROP on ES1.
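
To make the partition/heal cycle concrete, here is a sketch of the commands as a script (the addresses are hypothetical placeholders for the ZC nodes; the same pattern, pointed at ES2's address, is what I use for Issue 2 below):

    # Run on ES1. Addresses are placeholders for the ZooKeeper cluster nodes.
    ZC_IPS="10.0.0.11 10.0.0.12 10.0.0.13"

    # Partition: drop all traffic to and from the ZooKeeper cluster.
    for ip in $ZC_IPS; do
        iptables -A INPUT  -s "$ip" -j DROP
        iptables -A OUTPUT -d "$ip" -j DROP
    done

    # ...observe the cluster state on each ES node while partitioned...

    # Heal: flush all rules again (see NOTE below).
    iptables -F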

ES1 retains the original cluster state, while ES2 and ES3 eventually elect a new master. We now have two masters.

The issue is that while the connection between ZC and ES1 is down there is a chance of losing data or serving stale data. Even after the split brain heals, the data would be in an inconsistent state due to any updates that happened on both masters.

Issue 2:

ES1 is the current master.

I drop the connection between ES1 and ES2 by running iptables -A INPUT -s <ES2 ip address> -j DROP and iptables -A OUTPUT -d <ES2 ip address> -j DROP on ES1.

ES1, ES2 and ES3 see no cluster state change.

I would expect ES2 to be removed from the cluster; otherwise some document updates/inserts/searches would fail.


NOTE: In both of the above issues, when I re-establish the connection by flushing the iptables rules with iptables -F, the cluster state is corrected and only one master remains.

Any help would be greatly appreciated,

Kind regards,

Mark Tinsley

Igor Motov

Nov 27, 2013, 7:30:56 AM
to elasticsear...@googlegroups.com
Issue 1:

Did you actually observe "losing data or serving old data", or is this a theoretical discussion? Which client did you use, REST or Java? Did the REST client have sniffing enabled? For how long was the connection between EC

Igor Motov

Nov 27, 2013, 7:35:33 AM
to elasticsear...@googlegroups.com
Sorry, hit the send button too soon.

Issue 1:

Did you actually observe "losing data or serving old data", or is this a theoretical discussion? Which client did you use, REST or Java? Did the REST client have sniffing enabled? For how long was the connection between ES1 and ZK out?

Issue 2:

Detecting connection loss between two nodes is not supported. All fault detection is done through ZK. Did you observe data loss or inconsistency after restoring the connection? If yes, do you have a reproducible test case that you can share?

Mark Tinsley

Nov 27, 2013, 8:47:27 AM
to elasticsear...@googlegroups.com
Hi Igor,

Issue 1
Losing data is theoretical; I will look into making a reproducible test to prove/disprove this. I have just tried to reproduce data loss, but was unable to create what I thought would happen. This could just be down to my basic understanding of how sharding/replication works in Elasticsearch; I will take another look and see if I can replicate it.

I used the REST client via curl to ask each node what the cluster state was. The nodes were set up with the following properties:

cluster.name: zookeeper
discovery:
    type: com.sonian.elasticsearch.zookeeper.discovery.ZooKeeperDiscoveryModule
sonian.elasticsearch.zookeeper:
    settings.enabled: true
    client.host: <zcipaddress1>:2181,<zcipaddress2>:2181,<zcipaddress3>:2181
    discovery.state_publishing.enabled: true

Sorry, I don't think I was clear enough on the split brain: it only occurs while the connection is down (see NOTE above).

Issue 2
Again, losing data is theoretical; I will look into making a reproducible test to prove/disprove this.

Also, I forgot to mention that I was using version 0.90.2 of Elasticsearch and 0.90.0 of the elasticsearch-zookeeper plugin.

Thanks,

Mark

Mark Tinsley

Nov 28, 2013, 6:19:03 AM
to elasticsear...@googlegroups.com
Hi Igor,

So, to reproduce lost data for issue 1, I do the following:

Given a setup of three nodes ES1, ES2 and ES3, where ES1 is the current master, and a ZooKeeper cluster ZC.

Add 10 documents into Elasticsearch. I did this by running: for i in {1..10}; do curl -X POST localhost:9200/test/test/$i -d '{"data":"version1"}'; done
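
(As a sanity check that the initial writes landed, something along these lines works, using the standard _refresh and _count APIs:)

    # Optional: force a refresh, then count the documents just indexed.
    curl -s -X POST 'localhost:9200/test/_refresh'
    curl -s 'localhost:9200/test/_count?q=data:version1'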

Remove the connection between ES1 and ZC. I have done this by running iptables -A INPUT -s <zookeeper cluster ip addresses> -j DROP and iptables -A OUTPUT -d <zookeeper cluster ip addresses> -j DROP on ES1.

Wait about 5 minutes for the next master to be elected. I have been checking this by querying curl localhost:9200/_cluster/state on each node; for me, ES2 was elected master.
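
(To make that check less manual, a loop along these lines asks each node which master it currently sees; the hostnames es1/es2/es3 are placeholders, and the master_node field of the cluster state response holds the node id each node believes is master:)

    for host in es1 es2 es3; do
        echo -n "$host sees master: "
        curl -s "http://$host:9200/_cluster/state?filter_metadata=true&filter_routing_table=true" | python -c 'import json,sys; s=json.load(sys.stdin); print(s["nodes"][s["master_node"]]["name"])'
    done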

Now, on ES2, add a new version of the 10 documents. I ran: for i in {1..10}; do curl -X POST localhost:9200/test/test/$i -d '{"data":"version2"}'; done

Then, on ES1, add a new version of the 10 documents. I ran: for i in {1..10}; do curl -X POST localhost:9200/test/test/$i -d '{"data":"version3"}'; done

You get inconsistent reads from ES1 if you query the 10 documents: sometimes they state version 1, the next time version 2, and some state version 3. From ES2, not all documents had been updated to version 3, but the reads were, for me, consistent.
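
(For reference, a read-back loop along these lines shows the inconsistency; run it against each node in turn, with the hostname as a placeholder:)

    # Print the version string of each of the 10 documents as one node sees them.
    HOST=${1:-localhost}
    for i in {1..10}; do
        echo -n "doc $i: "
        curl -s "http://$HOST:9200/test/test/$i" | python -c 'import json,sys; print(json.load(sys.stdin)["_source"]["data"])'
    done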

Now remove the split brain by running iptables -F on ES1.

When you read the 10 documents, some state version 3 and some state version 2. This shows that some of the data written to the old cluster master has been lost.

I'm still working on a test for data loss due to issue 2.

Cheers,

Mark

Igor Motov

Nov 28, 2013, 11:13:55 PM
to elasticsear...@googlegroups.com
Yes, in this scenario it makes sense. You were indexing directly into the old master while it wasn't part of the cluster, because your client (curl) wasn't aware of the cluster state. You can avoid this situation by using a cluster-state-aware client (the node client) or by indexing into a non-master node. If you want to use curl, a dedicated client node scheme should work.
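
For example, a dedicated client node is simply a node that holds no data and is never master-eligible, so curl pointed at it always goes through a node that follows the elected master's cluster state. A sketch of the relevant 0.90-era elasticsearch.yml settings (combined with the same ZooKeeper discovery settings quoted earlier in the thread):

    # Sketch of a dedicated client node config: it joins the cluster
    # and routes requests, but holds no data and can never be elected master.
    cluster.name: zookeeper
    node.master: false
    node.data: false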

meghdoot.b...@gmail.com

Jun 19, 2014, 2:52:23 PM
to elasticsear...@googlegroups.com
Hi Igor,
We are looking at using this as the client.
Can you provide your feedback: will this client necessarily be cluster-state aware? ES1 should be taken down from ZooKeeper within 60 seconds (the disconnect timeout). Assuming a new master is not elected (which it should not be) within those 60 seconds, it feels we should be all right if this client is populated with node endpoints taken from the current list in ZooKeeper.