Hey, I just ran into this bug today with ZooKeeper 3.4.6:
https://issues.apache.org/jira/browse/ZOOKEEPER-885#comment-13749751
The short version is that a brief I/O stall on one of our ZooKeeper nodes disconnected several of our clients, some of which were our Elasticsearch nodes. Those nodes took longer than their session timeout to reconnect, so our Elasticsearch farm started reallocating shards across the cluster. The reallocation generated a large amount of I/O and triggered several alerts for a while.
We will obviously tune our ES farm so that reallocation is less disruptive when it does happen, but could you add a setting to the ZooKeeper plugin that controls how long the ES farm waits before reallocating shards? I'd love to have it wait 5 minutes for a node to come back before deciding that all of its shards need to be reallocated.