A quick update on what I'm working on in jgroups-raft.
jgroups-raft has always been a prototype, and its code quality has never
been on par with JGroups itself.
I've started to change this and bring its
robustness/correctness/performance up to speed. Things I've changed so far:
* Single threaded execution of RAFT; this not only eliminates issues
caused by threading, but also allows me to remove synchronization around
all shared data structures.
* As a nice side effect, this increased performance (and made the code
less complex)!
* I added a synchronous test execution framework; very nice as it allows
me to get rid of threading/timing issues (no sleeps!) and all execution
is synchronous (immediate). This is also a great asset for debugging, as
it allows me to single-step through an entire scenario.
* I've added unit tests to test all possible scenarios in append. Next
up: snapshotting of logs, feeding them to the follower(s), and leader
election.
* Numbers: Pedro has shown 17,000 increments/sec of a shared counter in
a 3-node cluster (local box); I've tested 3 local nodes on my older
Mac mini and got ~7000 increments/sec/node.
* On my side, barring leader election (but including snapshotting), I've
been able to run a distributed counter in a 3-node scenario, with
300 threads incrementing it for 10 minutes: all 3 counters showed the
same final value (this is the whole point of Raft)!
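To illustrate the single-threaded model and the synchronous test mode
described above, here is a minimal sketch. It is not jgroups-raft code:
the class SingleThreadedCounter and its methods add(), start(), drain()
and runConcurrent() are hypothetical names chosen for this example.
State changes are queued as events and applied by exactly one thread, so
the counter itself needs no synchronization. A test can skip the thread
and drain the queue on its own thread instead, which makes execution
immediate and easy to single-step.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not the jgroups-raft implementation.
class SingleThreadedCounter {
    private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
    private long value; // only touched by the single thread that drains the queue

    /** Any thread may call this; the change is queued, not applied directly. */
    CompletableFuture<Long> add(long delta) {
        CompletableFuture<Long> result = new CompletableFuture<>();
        events.add(() -> { value += delta; result.complete(value); });
        return result;
    }

    /** Normal mode: one dedicated thread applies events in arrival order. */
    Thread start() {
        Thread t = new Thread(() -> {
            try {
                while (true) events.take().run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "state-machine");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Test mode (use instead of start()): apply queued events on the caller's thread. */
    int drain() {
        int applied = 0;
        for (Runnable r; (r = events.poll()) != null; applied++) r.run();
        return applied;
    }

    /** Many producer threads, one applier thread; returns the final counter value. */
    static long runConcurrent(int threads, int perThread) {
        SingleThreadedCounter c = new SingleThreadedCounter();
        c.start();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.add(1);
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        // add(0) is queued behind every increment, so its result is the final value.
        return c.add(0).join();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrent(300, 1000)); // prints 300000
    }
}
```

In a test, calling add() followed by drain() applies the event on the
test's own thread, so no sleeps or timeouts are needed to wait for the
result.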
Next up:
* Add tests for snapshotting, and make ELECTION add role changes as
events into RAFT, rather than invoking methods directly, to fit with the
sync-less and single-threaded model of RAFT.
* Change CounterPerf to use different state change events, i.e. rather
than add-1: add-random, subtract-random etc. This allows me to find
inconsistent logs more quickly.
* Run a test for 24hrs
* Provide a Jepsen test, showing that jgroups-raft provides linearizability
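One plausible reason why random deltas expose inconsistent logs sooner
than add-1 can be sketched as follows. This is an illustration under
stated assumptions, not CounterPerf itself: the class LogDivergenceSketch
and its methods replay(), corrupt() and randomDeltas() are hypothetical.
The assumed fault is a replica that loses one log entry and applies
another one twice. With add-1 only, the final counter is unchanged and
the fault goes unnoticed; with varied deltas, the final values differ
unless the two entries happen to carry the same delta.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not the jgroups-raft CounterPerf test.
class LogDivergenceSketch {
    /** Replays a log of deltas against a counter that starts at 0. */
    static long replay(long[] log) {
        long value = 0;
        for (long delta : log) value += delta;
        return value;
    }

    /** Assumed fault: entry 'lost' is overwritten by a copy of entry 'dup'. */
    static long[] corrupt(long[] log, int lost, int dup) {
        long[] copy = log.clone();
        copy[lost] = copy[dup];
        return copy;
    }

    /** Add-random / subtract-random workload: deltas in [-bound, bound], never 0. */
    static long[] randomDeltas(int n, int bound, long seed) {
        Random rnd = new Random(seed);
        long[] log = new long[n];
        for (int i = 0; i < n; i++) {
            long d = rnd.nextInt(bound) + 1; // 1..bound
            log[i] = rnd.nextBoolean() ? d : -d;
        }
        return log;
    }

    public static void main(String[] args) {
        long[] ones = new long[1000];
        Arrays.fill(ones, 1);
        long[] mixed = randomDeltas(1000, 1_000_000, 42L);

        boolean onesMasked = replay(ones) == replay(corrupt(ones, 10, 20));
        boolean mixedMasked = replay(mixed) == replay(corrupt(mixed, 10, 20));
        System.out.println("add-1 hides the fault: " + onesMasked);
        System.out.println("random deltas hide the fault: " + mixedMasked);
    }
}
```

Note that addition and subtraction commute, so comparing final values
this way does not detect entries that are merely applied in a different
order; that would need non-commutative operations or a comparison of
the logs themselves.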
--
Bela Ban |
http://www.jgroups.org