That's good to know about the quorum. Does the quorum size still have to be set by the user though? I suppose it does since the cluster membership is dynamic.
If that's the case, perhaps some docs on safe locking would be useful to users.
In a three-node cluster - nodes A, B, and C - if a split occurs between nodes A and C, but not between A and B or between C and B, both A and C will still be able to see a majority of the cluster. In that event, if an HA verticle were deployed on node A and such an "intersecting partition" occurred between A and C only, it seems node C would redeploy the HA verticle from node A, while node A would not undeploy it, since both nodes can still see a majority of the cluster. Then again, this all depends on how Hazelcast detects failures.
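Just to make the scenario concrete, here's a little sketch of the quorum math. The node names, link model, and majority check are all made up for illustration - this is not Vert.x or Hazelcast code, just the counting argument:

```python
# Hypothetical sketch of the "intersecting partition" above.
# The A<->C link is down, but both A<->B and C<->B are still up.

nodes = {"A", "B", "C"}
links = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")}

def visible(node):
    """Members a node can reach, including itself."""
    return {node} | {other for other in nodes if (node, other) in links}

def sees_majority(node):
    return len(visible(node)) > len(nodes) // 2

# A sees {A, B} and C sees {B, C}: 2 of 3 each, so BOTH still
# believe they hold a majority even though they can't see each other.
for n in sorted(nodes):
    print(n, sorted(visible(n)), sees_majority(n))
```

So a naive "can I see a majority?" check on each node isn't enough on its own; both sides of the broken link pass it.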
But I digress. I just can't help myself :-)
Obviously this level of coordination is a problem for consensus, which is a bit of overkill for most basic Vert.x use cases. But that's why I'm interested in making the ZooKeeper cluster manager available.
Some day I will bring my Raft implementation to Vert.x. ZooKeeper will be a great option, but it's reliant upon an external cluster. The best aspect of Vert.x's use of Hazelcast is obviously its dynamism, while the issue with embedding Raft in a system like Vert.x is that consensus clusters necessarily have strict membership. I did some work to make Raft's strict membership a bit more palatable by implementing a gossip-based membership protocol for dynamic cluster members in Copycat.
Speaking of which, it may be possible for Hazelcast to prevent the partition scenario I described above, but the Hazelcast documentation isn't entirely clear to me. It seems that the oldest member is responsible for maintaining the membership list, so if the oldest member is node A, that's still an issue.
Something I did in Copycat to prevent this type of scenario is probing. The first time a node fails to contact another member, that member is marked suspicious in the node's membership list. When a membership list containing the suspicious member is gossiped to another node, that node too will attempt to probe the suspicious member. As this process continues, the gossiped membership list carries a vector clock, and the suspicious member's entry carries a set (a CRDT) of the nodes that have probed it and failed. Once that set has been merged across some satisfactory percentage of the cluster, the suspicious member is marked dead.
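The core of that idea is just a grow-only set that merges by union as it gossips around. Here's a rough sketch - not Copycat's actual code, and the 60% death threshold is an arbitrary number I picked for the example:

```python
# Illustrative sketch of the probing idea above (not Copycat's real code).
# Each node keeps, per suspicious member, a grow-only set of the nodes
# whose probes of that member failed. Gossiped copies merge by set
# union, which is what makes it a G-Set CRDT: merges commute, so nodes
# converge regardless of gossip order.

CLUSTER = {"A", "B", "C", "D", "E"}
DEATH_THRESHOLD = 0.6  # fraction of failed probers needed (assumed value)

def merge(local, remote):
    """Merge two suspicion sets for the same member (G-Set union)."""
    return local | remote

def is_dead(suspicion_set, cluster):
    return len(suspicion_set) / len(cluster) >= DEATH_THRESHOLD

# Node A's probe of some member X fails; A gossips its list to B, whose
# own probe of X also fails, and so on around the cluster.
suspects_on_a = {"A"}
suspects_on_b = merge({"B"}, suspects_on_a)   # {"A", "B"}
suspects_on_c = merge({"C"}, suspects_on_b)   # {"A", "B", "C"}

print(is_dead(suspects_on_b, CLUSTER))  # False: 2/5 of the cluster
print(is_dead(suspects_on_c, CLUSTER))  # True:  3/5 reaches the threshold
```

The nice property is that a single node with a broken link can't unilaterally declare another node dead - it takes agreement from enough independent probers, which is exactly what the A/C intersecting-partition case needs.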
Of course, unlike Hazelcast, which needs to replicate data lost on failed nodes, Copycat has the luxury of taking its time detecting failures outside the consensus algorithm, since such failures are inconsequential to user state.
K now I went way off topic... Better shut up :-)