Consistency bugs in MongoDB 3.6.4, fast linearizability check and new node-jepsen topology

Denis Rystsov

unread,

Jun 12, 2018, 5:37:44 PM6/12/18

to Jepsen Talk

Hi Kyle,

TL;DR The use of different server-client topology reveals more bugs; simplified linearizability check makes it's possible to test long histories, and the recent MongoDB analysis may have a mistake.

I was exploring different ways of checking linearizability, and it looks like I found a violation of linearizability in MongoDB 3.6.4 (I haven't tested 3.4.0, but there is a possibility that it's also affected).

Jepsen scripts: https://github.com/rystsov/consistency-mongodb/tree/jepsen

Compared to your analysis - http://jepsen.io/analyses/mongodb-3-4-0-rc3:

Each node hosts an application exposing HTTP key/value interface, so Jepsen is working over HTTP instead of hitting MongoDB directly. As a result, different MongoDB clients may have a different view during a network partition and may reveal more bugs.
The configuration doesn't use gen/stagger, so workers work without delays

Also, I used a different checker; the existing linearizability checker is very demanding, and validations of long histories often end with OOM exceptions which is expectable since the problem is NP-complete. But if we add additional restrictions, it's possible to shift complexity into O(x ln x) or even to O(x) where x is the length of the history.

In my case, I had a single writer per key which was writing values forming an increasing sequence of numbers. Obviously, in a linearizable system, every read must be greater or equal to the previous read. So instead of linearizability, I was checking for monotonicity: when the latter is violated the former is broken too.

And, hopefully, I was able to observe the violation (reading 65 after 73 had been already read):

{:type :invoke, :f :read, :value nil, :host "node3", :process 295, :time 33119025100}
...
{:type :ok, :f :read, :value 73, :host "node3", :process 295, :time 33372398800}
...
{:type :invoke, :f :read, :value nil, :host "node3", :process 307, :time 34403679100}
...
{:type :ok, :f :read, :value 65, :host "node3", :process 307, :time 35101938700}

I hope you'll find it interesting :)

Cheers,

Denis

Kyle Kingsbury

unread,

Jun 12, 2018, 6:02:23 PM6/12/18

to ta...@jepsen.io

On 06/12/2018 04:37 PM, Denis Rystsov wrote:
> In my case, I had a single writer per key which was writing values forming an
> increasing sequence of numbers. Obviously, in a linearizable system, every read
> must be greater or equal to the previous read.

This is true if writers are restricted to a single process, but it's not clear
to me that your test enforces this constraint. It's kind of hard to tell without
comments, but it looks like this checker doesn't actually verify that all writes
belong to the same process.

This raises the possibility that:

1. write w1 times out
2. write w2 completes successfully
3. a process observes w2
4. w1 completes
5. that process observes w1

This history is linearizable, might fail your checker, and would be generated by
a naive Jepsen test--but the stateful constructs you've introduced (e.g. the
dispatcher) and... whatever you're doing with independent/concurrent-generator
(testing nodes instead of keys?) might prevent it. Hard to tell without code
comments.

I'm happy to see that you're exploring this behavior, though! Best of luck. :)

--Kyle

Denis Rystsov

unread,

Jun 12, 2018, 9:30:36 PM6/12/18

to Jepsen Talk

Thanks! It's indeed the issue you described, and after I added a precondition the anomaly went away.

I'll continue looking into linearizability checkers, according to "Testing Shared Memories" the problem of testing linearizability with fixed number of processors (our case) isn't NP-complete and has O(x ln(x))-time complexity.

Kyle Kingsbury

unread,

Jun 14, 2018, 1:10:40 AM6/14/18

to ta...@jepsen.io

Sounds good! Keep in mind the number of processes in a typical Jepsen test is indeed finite, but may be arbitrarily high--hundreds of thousands of processes is a thing that can happen in an aggressive test. :-)

--
You received this message because you are subscribed to the Google Groups "Jepsen Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to talk+uns...@jepsen.io.
To post to this group, send email to ta...@jepsen.io.
To view this discussion on the web visit https://groups.google.com/a/jepsen.io/d/msgid/talk/77568be9-59f9-4c3e-9647-8135a397d16e%40jepsen.io.

Reply all

Reply to author

Forward