Some questions about load balancing policies in Java driver

57 views
Skip to first unread message

Jun Wu

unread,
Jan 25, 2016, 4:17:18 PM1/25/16
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi there,

    I'm doing some research in load balancing policies in Java driver, but feel confusion about some questions:

    1. For RoundRobin it's easy to understand, which does not consider the data center and in a round-robin fashion. However, for DCAwareRoundRobin, the API doc shows that:

This policy provides round-robin queries over the node of the local data center. It also includes in the query plans returned a configurable number of hosts in the remote data centers, but those are always tried after the local nodes. In other words, this policy guarantees that no host in a remote data center will be queried unless no host in the local data center can be reached.


    I'm very curious about its application. It said that it won't consider the remote data center unless all local nodes are down. If that's the case, what's the purpose of the remote data center, for safety only? For example, if i have 2 data centers with 2 nodes in each (such as "us-west" and "us-east"). I specified the replication factor to be "us-west":1 and "us-east":1, which means 1 copy of data to be sent to "us-west" and anther copy to another data center. All these nodes work fine. If I use DCAwareRoundRobin("us-west") to specify the local data center, then according to the previous explanation, it won't write data to "us-east" unless the two nodes in "us-west" are down. I cannot see any practical purpose for this, as I assume that the possibilities for all nodes to be down are very low. Also, I have specify the replication factor to be 1 in the remote data center, if I use DCAwareRoundRobin("us-west"), then it means no copy will be sent to the remote data center. Am I right on this?

    2. For situations in multiple data center, if still the similar situation, 2 data centers with 3 nodes in each. Node 1, 2, 3in one data center 1and 4, 5, 6 in data center 2. The replication factor is 2 and 2 for each data center. If I choose node 1 to be the coordinator, then when a write request comes in, it knows which nodes to write to in data center 1, based on the load balancing policy the cluster used. However, for the other data center, 2 other copies of data needed to be written. Then my question is will the data is sent directly from the coordinator, or it will send the data to one of the nodes in data center 2 (which probably is called remote coordinator) and it'll decide which nodes to be sent. Because I'm doing experiments in Amazon EC2, sending 2 copies to another data center directly and sending 1 copy to one of the nodes in data center 2 and this node send copies to 2 other copies, matter a lot.

    3. For TokenAwarePolicy(DCAwareRoundRobin), which is the default policy and according to the self-learning course, it performs best among all.  However, it seems to be a little bit hard to understand and I'm wondering is there a detailed explanation on this (why it performs best)? The following is from the API:
  • the iterator return by the newQueryPlan method will first return the LOCAL replicas for the query (based on Statement.getRoutingKey()if possible (i.e. if the querygetRoutingKey method doesn't return null and if Metadata.getReplicas(java.lang.String, java.nio.ByteBuffer) returns a non empty set of replicas for that partition key). If no local replica can be either found or successfully contacted, the rest of the query plan will fallback to one of the child policy.
      Does that means the coordinator will calculate the token of the primary key, and based on that key it'll send data directly to that node owing the token? 

      Also, what'll this be if in multiple data centers? In 2 data centers, does the 2 rings have its own token rings?  If using TokenAwarePolicy(DCAwareRoundRobin), does that means the data cannot be written to another remote data center, even if 1 and 1 replication factor having been specified?

     Sorry for the overwhelming paragraphs. Hope to get some hints/answers.

    Thanks!

Jun 


Jack Krupansky

unread,
Jan 25, 2016, 4:30:10 PM1/25/16
to java-dri...@lists.datastax.com
" I assume that the possibilities for all nodes to be down are very low"

True, and we certainly hope so when we pick a data center vendor, but the whole point is to handle data center outages, where the entire data center is unavailable, such as with a network connectivity issue or power outage or earthquake.

In truth, DCAware is more a DC Preference policy, such as for the DC that is more local or cheaper or faster or whatever factor that makes it preferred. If you truly don't have a preference for hitting only the local/preferred data center, don't use this policy and stick with the round robin policy.


-- Jack Krupansky

--
You received this message because you are subscribed to the Google Groups "DataStax Java Driver for Apache Cassandra User Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-driver-us...@lists.datastax.com.

Vishy Kasaravalli

unread,
Jan 25, 2016, 4:33:50 PM1/25/16
to java-dri...@lists.datastax.com
On Jan 25, 2016, at 1:17 PM, Jun Wu <wuxia...@hotmail.com> wrote:

Hi there,

    I'm doing some research in load balancing policies in Java driver, but feel confusion about some questions:

    1. For RoundRobin it's easy to understand, which does not consider the data center and in a round-robin fashion. However, for DCAwareRoundRobin, the API doc shows that:

This policy provides round-robin queries over the node of the local data center. It also includes in the query plans returned a configurable number of hosts in the remote data centers, but those are always tried after the local nodes. In other words, this policy guarantees that no host in a remote data center will be queried unless no host in the local data center can be reached.


    I'm very curious about its application. It said that it won't consider the remote data center unless all local nodes are down. If that's the case, what's the purpose of the remote data center, for safety only? For example, if i have 2 data centers with 2 nodes in each (such as "us-west" and "us-east"). I specified the replication factor to be "us-west":1 and "us-east":1, which means 1 copy of data to be sent to "us-west" and anther copy to another data center. All these nodes work fine. If I use DCAwareRoundRobin("us-west") to specify the local data center, then according to the previous explanation, it won't write data to "us-east" unless the two nodes in "us-west" are down. I cannot see any practical purpose for this, as I assume that the possibilities for all nodes to be down are very low. Also, I have specify the replication factor to be 1 in the remote data center, if I use DCAwareRoundRobin("us-west"), then it means no copy will be sent to the remote data center. Am I right on this?


Load Balancing Policy only chooses co-ordinators. It is cassandra’s job to replicate to other DC based on the RF set for each DC. 

    2. For situations in multiple data center, if still the similar situation, 2 data centers with 3 nodes in each. Node 1, 2, 3in one data center 1and 4, 5, 6 in data center 2. The replication factor is 2 and 2 for each data center. If I choose node 1 to be the coordinator, then when a write request comes in, it knows which nodes to write to in data center 1, based on the load balancing policy the cluster used.

LBP is a purely client side construct. Node1 on the server side writes to other nodes based on the RF specified. 

However, for the other data center, 2 other copies of data needed to be written. Then my question is will the data is sent directly from the coordinator, or it will send the data to one of the nodes in data center 2 (which probably is called remote coordinator) and it'll decide which nodes to be sent. Because I'm doing experiments in Amazon EC2, sending 2 copies to another data center directly and sending 1 copy to one of the nodes in data center 2 and this node send copies to 2 other copies, matter a lot.

This has nothing to do with LBP but data is sent to one remote co-ordinator in DC2. 


    3. For TokenAwarePolicy(DCAwareRoundRobin), which is the default policy and according to the self-learning course, it performs best among all.  However, it seems to be a little bit hard to understand and I'm wondering is there a detailed explanation on this (why it performs best)? The following is from the API:
  • the iterator return by the newQueryPlan method will first return the LOCAL replicas for the query (based on Statement.getRoutingKey()if possible (i.e. if the querygetRoutingKey method doesn't return null and if Metadata.getReplicas(java.lang.String, java.nio.ByteBuffer) returns a non empty set of replicas for that partition key). If no local replica can be either found or successfully contacted, the rest of the query plan will fallback to one of the child policy.
      Does that means the coordinator will calculate the token of the primary key, and based on that key it'll send data directly to that node owing the token? 

TokenAwarePolicy expects user to set the routing key.  The routing key is computed automatically only in specific cases described below. For all other cases, you must compute and set the routing key yourself.

• BoundStatement only when all the columns composing the partition key are bound variables of the BoundStatement.
• QueryBuilder only when when a TableMetadata is provided to the builder and and all partition key columns were set via the QueryBuilder. Example: QueryBuilder.select().from(cluster.getMetadata().getKeyspace("tk").getTable("tt")).where(QueryBuilder.eq("pk1", "k1"))


      Also, what'll this be if in multiple data centers? In 2 data centers, does the 2 rings have its own token rings?  If using TokenAwarePolicy(DCAwareRoundRobin), does that means the data cannot be written to another remote data center, even if 1 and 1 replication factor having been specified?


As stated in the javadoc, TokenAwarePolicy prefers LOCAL replicas.  It is cassandra’s job to replicate to other DC based on the RF set for each DC. 

     Sorry for the overwhelming paragraphs. Hope to get some hints/answers.

    Thanks!

Jun 



Olivier Michallat

unread,
Jan 26, 2016, 11:36:06 AM1/26/16
to java-dri...@lists.datastax.com
I've written some doc on load balancing policies. It hasn't been published yet, but you can read it on github:


Once we release 3.0 it will be in our manual.

--

Olivier Michallat

Driver & tools engineer, DataStax

Jun Wu

unread,
Jan 27, 2016, 6:26:56 PM1/27/16
to DataStax Java Driver for Apache Cassandra User Mailing List
Thank you so much for your reply.

Yes, that's reasonable backup in case of some emergent situations.

Jun Wu

unread,
Jan 27, 2016, 7:10:40 PM1/27/16
to DataStax Java Driver for Apache Cassandra User Mailing List

Thank you so much Vishy Kasar!

However, it still not that clear for me, especially the coordinator on client and server side. Could I take an example and dig deeper on this?

Let's say that we have 2 data centers: DC1 and DC2, the figure also be got from link here: https://docs.datastax.com/en/cassandra/1.2/cassandra/images/write_access_multidc_12.png
There're 10 nodes in each data center. We set the replication factor to be 3 and 3 in each data center.

Assume we have keyspace "student" and table "studentinfo", with (ind id primary key, text name) specified. 

Here we only consider write path, so we start to write query from client, and you can see that node 10 in DC1 has been chosen as coordinator. 

For example, we have 5 queries/rows to be written in this cluster.
Query 1: insert into student.studentinfo (id, name) values (1, "Allen");  
Query 2: insert into student.studentinfo (id, name) values (2, "Alex"); 
Query 3: insert into student.studentinfo (id, name) values (3, "Brandon"); 
Query 4: insert into student.studentinfo (id, name) values (4, "Jess"); 
Query 5: insert into student.studentinfo (id, name) values (5, "Ryan"); 

If we use roundrobin policy:
For query 1, the query plan will return all the hosts: 1, 2,... 10 in DC1 and 1,2, ...10 in DC2. If we assume node 1 is the first node found to be written data to, then (1, "Allen") will be written to node 1. (Here, my question is how to decide the first node to be written data to?)
For query 2, as node 1 has been chosen as the first node, then (2, "Alex") will be written to node 2 in round robin fashion.
For query 3, similarly,  (3, "Brandon") will be written to node 3.
For query 4, (4, "Jess") to node 4;
For query 5, (5, "Ryan") to node 5;

Then according to our assumption, 3 replicas should be written, then how to write data of next 2 replicas?

For DC2, if it's similar to the previous one, with the same problem.


Another question is about the remote coordinator, the previous figure shows that node 10 in DC1 will write data to node 10  in DC 2, then node 10 in DC2 will write 3 copies to 3 nodes in DC2.

But, another figure from datastax shows different method, the figure can be found here, https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html
It shows that node 10 in DC 1 will send directly 3 copies to 3 nodes in DC2, without using remote coordinator.

I'm wondering which case is true.

Thanks again!

Jun Wu

unread,
Jan 27, 2016, 7:13:15 PM1/27/16
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi Olivier,

   That's really useful information and I hope I could read it earlier.

   Anyway, I'm still learning, and you can check my response to Vishy Kasar.

   Thanks!

Vishy Kasaravalli

unread,
Jan 27, 2016, 7:24:30 PM1/27/16
to java-dri...@lists.datastax.com
On Jan 27, 2016, at 4:10 PM, Jun Wu <wuxia...@hotmail.com> wrote:
Thank you so much Vishy Kasar!

However, it still not that clear for me, especially the coordinator on client and server side. Could I take an example and dig deeper on this?

Let's say that we have 2 data centers: DC1 and DC2, the figure also be got from link here: https://docs.datastax.com/en/cassandra/1.2/cassandra/images/write_access_multidc_12.png
There're 10 nodes in each data center. We set the replication factor to be 3 and 3 in each data center.

Assume we have keyspace "student" and table "studentinfo", with (ind id primary key, text name) specified. 

Here we only consider write path, so we start to write query from client, and you can see that node 10 in DC1 has been chosen as coordinator. 

For example, we have 5 queries/rows to be written in this cluster.
Query 1: insert into student.studentinfo (id, name) values (1, "Allen");  
Query 2: insert into student.studentinfo (id, name) values (2, "Alex"); 
Query 3: insert into student.studentinfo (id, name) values (3, "Brandon"); 
Query 4: insert into student.studentinfo (id, name) values (4, "Jess"); 
Query 5: insert into student.studentinfo (id, name) values (5, "Ryan"); 

If we use roundrobin policy:
For query 1, the query plan will return all the hosts: 1, 2,... 10 in DC1 and 1,2, ...10 in DC2. If we assume node 1 is the first node found to be written data to, then (1, "Allen") will be written to node 1. (Here, my question is how to decide the first node to be written data to?)

The co-ordinator is just any node and does not have to be a replica in roundrobin policy. In this case your client chose node1-dc1 as a co-ordinator. node1-dc1 on the server side will figure out replicas and takes care of writing to them.

For query 2, as node 1 has been chosen as the first node, then (2, "Alex") will be written to node 2 in round robin fashion.
For query 3, similarly,  (3, "Brandon") will be written to node 3.
For query 4, (4, "Jess") to node 4;
For query 5, (5, "Ryan") to node 5;

Then according to our assumption, 3 replicas should be written, then how to write data of next 2 replicas?

As a client, you do not worry about that. It is cassandra server’s job to write to replicas. 

Jun Wu

unread,
Jan 27, 2016, 11:30:13 PM1/27/16
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi Vishy Kasar,

    Thanks again for your reply. I really appreciate it.

    From your explanation that the load balancing policy is only related to the choose of coordinator. I can get it now.

     However, for the replicas (replication factor), do you know how the coordinator to decide which nodes to write to? Is there any mechanism for the server side? If you know any source where I can get this information, could you let me know?

    Thanks!

Jun 

Vishy Kasaravalli

unread,
Jan 28, 2016, 12:14:34 PM1/28/16
to java-dri...@lists.datastax.com
Each cassandra node is primary responsibility for a token range. It may also have secondary responsibilities for other token ranges based on the replication factor. 

When a request arrives at co-ordinator, the primary key of the request is mapped to a token. Based on that token, co-ordinator knows who the replicas are.

Jack Krupansky

unread,
Jan 28, 2016, 3:49:10 PM1/28/16
to java-dri...@lists.datastax.com
If you are adventurous, you can read the code and comments in the calculateNaturalEndpoints method in:

That, of course, is if you are using the Network Topology Strategy, which I presume you are if using multiple DCs.

In any case, start with the doc and then ask very specific questions about what is said there:



-- Jack Krupansky

Jack Krupansky

unread,
Jan 28, 2016, 3:51:45 PM1/28/16
to java-dri...@lists.datastax.com
I forgot to add... it's starting to sound as if you are asking about Cassandra itself rather that the Java driver. If so, please pursue any questions about server behavior on the normal Cassandra user email list. This list is concerned with how to use the Java driver, not Cassandra itself.

-- Jack Krupansky

Jun Wu

unread,
Jan 29, 2016, 12:14:39 PM1/29/16
to DataStax Java Driver for Apache Cassandra User Mailing List
Thank you so much Vishy Kasar for your kind help. 

I'll try to see what cassandra will do for the replicas selection.

Thanks again!
To unsubscribe from this group and stop receiving emails from it, send an email to java-driver-user+unsub...@lists.datastax.com.

Jun Wu

unread,
Jan 29, 2016, 12:15:36 PM1/29/16
to DataStax Java Driver for Apache Cassandra User Mailing List
Thank you so much Jack, I think the code and docs should be what I want for the replica selection part.

Again, thanks for your kind help!

-- Jack Krupansky

To unsubscribe from this group and stop receiving emails from it, send an email to java-driver-user+unsub...@lists.datastax.com.

Reply all
Reply to author
Forward
0 new messages