What's the right way to connect to cluster when testing?

Bin Wang

Sep 10, 2018, 5:38:07 AM

to Jepsen Talk

Hi,

I'm trying to use Jepsen, and there is something I'm confused about: how should I connect to the cluster? In the tutorial, we have 5 nodes, so there are 5 clients, each connected to one of them. But when we introduce failures like node shutdowns, some of the queries will of course fail. So is it normal to have these failures, or is it better to connect to the cluster in another way? For example, we could give all 5 nodes' addresses to the clients and let each client choose an available one. But if we let the client choose the server address, is that enough to test correctness, since some of the nodes may drop out of the test?

Thanks,

Bin Wang

Kit Patella

Sep 10, 2018, 3:11:20 PM

to ta...@jepsen.io

Hey Bin,

In most cases, we’re interested in testing the connections between DB processes, rather than between clients and the database. It’s OK, and even healthy for many consistency models, for a cluster to reject operations sent to a member that is in full or partial failure. Trying operations against unstable nodes, without Jepsen routing around them, is usually the meat of what we want to observe in a Jepsen test.

A node is a machine (or more likely, a VM) that contains one or more DB processes. When we partition the network, we cut and heal connections between nodes. Importantly, we do not sever the nodes from the machine orchestrating the test (Jepsen’s built-in nemeses take care of this for you). I believe we do not have any tests that shut down an entire node (Kyle can correct me here if I’m mistaken); instead, we interrupt or terminate the DB process on the node.

In essence, Jepsen’s partitioner nemeses do not prevent Jepsen clients from communicating with their assigned nodes. Feel free to reply with any connection errors you’d like assistance debugging.

Kit

--
You received this message because you are subscribed to the Google Groups "Jepsen Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to talk+uns...@jepsen.io.
To post to this group, send email to ta...@jepsen.io.
To view this discussion on the web visit https://groups.google.com/a/jepsen.io/d/msgid/talk/085ca4b4-d274-4357-9d45-fdd2820388b9%40jepsen.io.

Kyle Kingsbury

Sep 11, 2018, 1:44:52 AM

to ta...@jepsen.io

To add on to this--yes, clients can and should fail to execute operations when the node they're talking to is, say, crashed. That's a normal part of testing with Jepsen. You probably want this behavior instead of letting clients reconnect to other nodes because doing it this way maximizes your chance of observing consistency violations between different nodes.

--Kyle

Bin Wang

Sep 11, 2018, 3:15:16 AM

to ta...@jepsen.io

Thanks for the great answers!

I have another related question: the cluster has a master node that doesn't accept any queries. So for the client connected to the master node, what should I return from the `invoke!` method: `:ok` or `:fail`? Or is there an option to keep the client from connecting to the master node?

On Tue, Sep 11, 2018 at 1:44 PM, Kyle Kingsbury <ap...@jepsen.io> wrote:

Kyle Kingsbury

Sep 11, 2018, 3:21:03 AM

to ta...@jepsen.io

Yeah, if it didn't take place, it should be a :fail operation. You could also tell Jepsen to not schedule ops on that node, but I don't recommend it; I think you could fail to observe errors when leadership changes.

--Kyle
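
Kyle's rule of thumb maps onto Jepsen's three completion types: `:ok` when the operation definitely took place, `:fail` when it definitely did not, and `:info` when the client cannot tell (for instance, a timeout after the request was sent). The sketch below is a minimal, hypothetical illustration of that decision in plain Clojure; `complete-op` and the `outcome` map with its `:status` values are invented for this example and are not part of Jepsen's API. In a real client, this logic would live inside `invoke!`, typically by catching the database driver's exceptions.

```clojure
(defn complete-op
  "Hypothetical helper (not a Jepsen API): returns the invocation op with
  :type set to :ok, :fail, or :info according to what is known about the
  outcome. outcome is an assumed map such as {:status :applied, :value 5}."
  [op outcome]
  (case (:status outcome)
    ;; The database confirmed the operation took effect. Reads carry the
    ;; value returned by the database; writes keep the value they wrote.
    :applied  (assoc op :type :ok :value (:value outcome (:value op)))
    ;; The database definitely did not perform the operation, e.g. a master
    ;; node that refuses all queries, or a connection refused before the
    ;; request was sent.
    :rejected (assoc op :type :fail :error (:error outcome))
    ;; Anything else, such as a timeout or a connection dropped mid-request:
    ;; the operation may or may not have happened, so it is indeterminate.
    (assoc op :type :info :error (:error outcome :indeterminate))))

;; A write sent to the master node, which refuses queries; this evaluates
;; to an op whose :type is :fail.
(complete-op {:type :invoke, :f :write, :value 3}
             {:status :rejected, :error :not-leader})
```

Marking ambiguous outcomes as `:info` rather than `:fail` matters: reporting a write as failed when it may actually have been applied can make a correct database look inconsistent to the checker.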