Hi, everyone.
I have been experimenting with LogCabin, and the performance numbers I measured raised some questions and thoughts.
Has anyone tested the performance of a recent version of LogCabin?
I used the Benchmark program in its repo and ran a simple test on Google Cloud,
with 3 replicas and 1 client (1 thread).
Each replica runs on an n1-standard-16 VM and the client runs on an n1-standard-4 VM.
I had the client complete 10000 requests and found that the average end-to-end latency can exceed 2 ms per request, which is much higher than the number reported in Diego's thesis (~1 ms).
I suspect that disk persistence accounts for much of the overhead, so I wrote a microbenchmark of disk writes:
for i in 1..1000:
    start = Time()
    write a string of length 100
    fsync
    end = Time()
    latency = end - start
The median latency is ~2 ms, which seems to confirm my suspicion. [I am actually using an SSD on my VM, so I am not sure why it is still this slow.]
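For anyone who wants to reproduce this, here is a minimal C++ sketch of the loop above. It assumes a POSIX system; the function names measureFsyncLatencies and median are just names I picked for illustration, and this is not code from the LogCabin repo.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <string>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Appends `payload` to the file at `path` `iterations` times, calling fsync()
// after every write, and returns the per-iteration latencies in microseconds.
// Returns an empty vector if the file cannot be opened or an I/O call fails.
// (Hypothetical helper for illustration only.)
std::vector<double> measureFsyncLatencies(const std::string& path,
                                          int iterations,
                                          const std::string& payload) {
    std::vector<double> latenciesUs;
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return latenciesUs;
    for (int i = 0; i < iterations; ++i) {
        auto start = std::chrono::steady_clock::now();
        ssize_t n = ::write(fd, payload.data(), payload.size());
        if (n != static_cast<ssize_t>(payload.size()) || ::fsync(fd) != 0) {
            latenciesUs.clear();  // report failure as "no samples"
            break;
        }
        auto end = std::chrono::steady_clock::now();
        latenciesUs.push_back(
            std::chrono::duration<double, std::micro>(end - start).count());
    }
    ::close(fd);
    return latenciesUs;
}

// Returns the median of `samples` (the upper median when the size is even).
double median(std::vector<double> samples) {
    assert(!samples.empty());
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Calling measureFsyncLatencies("bench.dat", 1000, std::string(100, 'x')) and passing the result to median() corresponds to the test described above. The file should live on the same disk that the replicas use for their logs, otherwise the numbers are not comparable.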
So my first question is: has anyone else tested LogCabin and obtained a similar result? Is it really this slow?
More importantly: does LogCabin have any optimizations for disk writes? At least according to the paper, a server must write to disk every time before replying to the AppendEntries RPC. Does that mean there must be at least one fsync call on the critical path of every request commit?
From his thesis, Diego seems to have implemented pipelining as an optimization. I am not sure whether batching is implemented in the current version of LogCabin.
Besides that, are there any other effective optimizations (e.g. in other Raft implementations) that could be applied to log persistence?
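To be concrete about what I mean by batching, here is a rough C++ sketch in which several log entries share a single fsync, so the cost of the sync is amortized over the batch. This is only my own illustration under POSIX assumptions; appendBatched is a hypothetical name, and I am not claiming this is how LogCabin or any other Raft implementation does it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Writes every entry in `entries` to `fd`, calling fsync() once per
// `batchSize` entries (plus once for a final partial batch) instead of once
// per entry. Returns the number of fsync() calls made, or -1 on an I/O error.
// (Hypothetical helper for illustration only.)
int appendBatched(int fd,
                  const std::vector<std::string>& entries,
                  std::size_t batchSize) {
    assert(batchSize > 0);
    int fsyncCalls = 0;
    std::size_t pending = 0;  // entries written since the last fsync
    for (const std::string& entry : entries) {
        ssize_t n = ::write(fd, entry.data(), entry.size());
        if (n != static_cast<ssize_t>(entry.size()))
            return -1;
        if (++pending == batchSize) {
            if (::fsync(fd) != 0)
                return -1;
            ++fsyncCalls;
            pending = 0;
        }
    }
    if (pending > 0) {  // flush the last, partially filled batch
        if (::fsync(fd) != 0)
            return -1;
        ++fsyncCalls;
    }
    return fsyncCalls;
}
```

With batchSize = 1 this degenerates to one fsync per entry, which is the behaviour I measured above. With a larger batch, 10 entries and batchSize = 4 need only 3 syncs. The trade-off is that no entry in a batch can be acknowledged until the fsync covering it has returned.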
Another open question:
If I want to compare my work against a Raft baseline, which implementation would be the most convincing choice? (Since my work is implemented in C++, I think I should prefer a C++ implementation.) Any recommendations?