Connections between java driver and coordinator

Jun Wu

Feb 9, 2016, 10:02:15 PM
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi there,

   I have a question about the connections between the java driver and coordinator. 

   Assume we have a 5-node cluster in a single data center: nodes 1, 2, 3, 4, and 5. Say the Java driver runs on node 1 and picks a coordinator according to its load balancing policy; assume node 2 is chosen. The replication factor is 1 and node 3 is the replica. Then, from node 1, I write data into the cluster.

   My question: how is the data sent? Does the Java driver (node 1) first send the data to the coordinator (node 2), which then sends it to the replica (node 3)? Or does the Java driver (node 1) communicate with the coordinator (node 2), learn that node 3 is the replica, and send the data to the replica itself?

    To make it simple: 
    1. java driver (node 1) -----send data to coordinator----->coordinator (node 2)------send data to replica----> replica(node 3)
    2. java driver (node 1) <-------communicate with coordinator------->coordinator (node 2)
                   |
                   |
                   |_____send data to replica_____________________> replica (node 3)


    Which one is right?

    Thanks for your help.

Jun

Olivier Michallat

Feb 10, 2016, 6:15:51 AM
to java-dri...@lists.datastax.com
The first one: the Java driver communicates with the node it has elected as coordinator, and the coordinator with the replica.

That's why the token-aware load balancing policy exists: its goal is to choose a replica as coordinator, to eliminate the extra hop.
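The extra hop can be illustrated with a toy model. This is only a sketch and does not use the driver API: `HopModel`, `replicaFor`, and `hops` are hypothetical names, and the hash-modulo replica placement is an assumed stand-in for the real token ring. Nodes are numbered from 0 here.

```java
// Toy model only: hypothetical names, not part of the DataStax Java driver.
final class HopModel {
    // Assumption: the single replica (RF = 1) is chosen by hash modulo node count,
    // standing in for the real token ring.
    static int replicaFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Network hops for one write: client -> coordinator always costs 1;
    // the coordinator -> replica forward costs 1 more unless they are the same node.
    static int hops(int coordinator, int replica) {
        return coordinator == replica ? 1 : 2;
    }

    public static void main(String[] args) {
        int nodes = 5;
        int replica = replicaFor("user:42", nodes);
        int nonReplica = (replica + 1) % nodes; // any node that does not own the key

        System.out.println("coordinator is not a replica: " + hops(nonReplica, replica) + " hops");
        System.out.println("coordinator is the replica:   " + hops(replica, replica) + " hops");
    }
}
```

In this model a token-aware choice corresponds to passing the replica itself as the coordinator, which is the case where the forward disappears.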

--

Olivier Michallat

Driver & tools engineer, DataStax

Jun Wu

Feb 10, 2016, 9:23:48 AM
to java-dri...@lists.datastax.com
Thanks for your reply Olivier.

But to be clear, I'm asking about the general case, including other load balancing policies.

I'm still unsure whether the Java driver sends the data to the coordinator, which passes it on to the replica, or whether the driver only communicates with the coordinator without sending it the data and sends the data to the replicas directly.

With other load balancing policies the replica may not be the coordinator, which means the Java driver, the coordinator, and the replica may all be different nodes.

Could you help me clarify this?

Thanks!

Jack Krupansky

Feb 10, 2016, 10:10:55 AM
to java-dri...@lists.datastax.com
The driver will only send a given request to one node. That node will in turn send the request to the replica nodes.

The load balancing policy simply chooses a different node to send the next request, but still sends that request to only one node, with replication occurring internal to the cluster.

The driver is not involved in performing replication, but is aware of the replication simply for the purpose of load balancing of requests.
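As a rough illustration of that split, here is a toy round-robin model. It is a sketch under stated assumptions, not driver code: `RoundRobinModel` and its members are hypothetical, replicas are assumed to be consecutive nodes starting at a hash-derived index, and nodes are numbered from 0.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not part of the DataStax Java driver.
final class RoundRobinModel {
    private final int nodeCount;
    private final int replicationFactor;
    private int next = 0;

    // One entry per request: the single node the client sent that request to.
    final List<Integer> contactedByClient = new ArrayList<>();
    // One entry per request: the nodes the cluster stored the data on.
    final List<List<Integer>> storedOn = new ArrayList<>();

    RoundRobinModel(int nodeCount, int replicationFactor) {
        this.nodeCount = nodeCount;
        this.replicationFactor = replicationFactor;
    }

    // Client side: pick the next node in turn and send the whole request to it alone.
    void write(String key) {
        int coordinator = next;
        next = (next + 1) % nodeCount;
        contactedByClient.add(coordinator);
        storedOn.add(replicate(key)); // happens inside the cluster, not in the client
    }

    // Cluster side. Assumption: replicas are consecutive nodes starting at hash mod nodeCount.
    private List<Integer> replicate(String key) {
        int first = Math.floorMod(key.hashCode(), nodeCount);
        List<Integer> replicas = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            replicas.add((first + i) % nodeCount);
        }
        return replicas;
    }

    public static void main(String[] args) {
        RoundRobinModel cluster = new RoundRobinModel(5, 3);
        for (String key : new String[] {"a", "b", "c"}) {
            cluster.write(key);
        }
        System.out.println("client contacted: " + cluster.contactedByClient);
        System.out.println("cluster stored on: " + cluster.storedOn);
    }
}
```

Each request appears once in `contactedByClient` no matter how many replicas end up in `storedOn`, which is the point being made: the policy only changes which single node receives the next request.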

-- Jack Krupansky

Jun Wu

Feb 10, 2016, 6:08:59 PM
to DataStax Java Driver for Apache Cassandra User Mailing List
Thank you Jack!

So does that mean the Java driver only exchanges the mutation packet with the coordinator, the coordinator responds with the location of the replica node, and then the node where the Java driver runs sends the data to the replica? Is the second one right?

1. java driver (node 1) -----send data to coordinator----->coordinator (node 2)------send data to replica----> replica(node 3)
2. java driver (node 1) <-------communicate with coordinator------->coordinator (node 2)
                   |
                   |
                   |_____send data to replica_____________________> replica (node 3)

Jack Krupansky

Feb 10, 2016, 11:35:47 PM
to java-dri...@lists.datastax.com
The Java driver is not on a node of the cluster - it is on the client system.

I'm not sure what you are trying to say with a distinction between "send data" and "communicate".

-- Jack Krupansky

Jun Wu

Feb 11, 2016, 12:38:34 PM
to DataStax Java Driver for Apache Cassandra User Mailing List
I'm wondering whether I understand it correctly.

As I understand it, a node is a machine, the driver is deployed on a node, and that node is one of the nodes in the cluster.

By "send data" I mean that, through the driver, I can write code that sends a write request to the cluster, and that request carries the data to be written.

By "communicate" I mean the TCP packets that nodes exchange so that each node learns where the other nodes are and what their IPs are.

That is why I had the two ideas above about how data moves between the Java driver, the coordinator, and the replicas. Am I right about this?

Thanks!

Jack Krupansky

Feb 11, 2016, 2:49:37 PM
to java-dri...@lists.datastax.com
No, the driver would not normally be deployed on a node in the cluster. The driver is linked to the client app on the machine which is running the application, which is usually not a machine within the Cassandra cluster itself.

For example, in this diagram all nodes of the cluster are within the circle and the application clients are boxes outside the cluster:

[diagram not preserved]

Again, the driver is a part of the application client, a linked library.


-- Jack Krupansky

Jun Wu

Feb 12, 2016, 1:35:44 AM
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi Jack,

   What you explained makes a lot of sense and I really appreciate it!

   Thanks for your kind help!

Jun