Java driver token-awareness and gossip over a low-bandwidth connection

Mohammed Guller

Jun 14, 2016, 7:24:00 PM

to java-dri...@lists.datastax.com

We have a scenario where an application writes data to a Cassandra cluster using the DataStax Java driver over a low-bandwidth connection. In other words, the C* cluster and the Java driver are on two different networks connected by a low-bandwidth link.

A few questions related to this scenario:

1) How does the Java driver figure out which node is responsible for which token range?

2) How does the Java driver keep the token-awareness related information up-to-date?

3) How does the Java driver participate in the Gossip protocol? My understanding is that the Java driver figures out which nodes are alive or dead through the Gossip protocol.

4) How much bandwidth is consumed by the Gossip protocol?

Thanks,

Mohammed

Andrew Tolbert

Jun 14, 2016, 9:55:40 PM

to DataStax Java Driver for Apache Cassandra User Mailing List

Hi Mohammed,

1) How does the Java driver figure out which node is responsible for which token range?

The driver generates a token map for each keyspace from the tokens returned from the system.peers table for each host. Then, given a BoundStatement, it derives the partition key columns of the table being queried from the PreparedStatement, and if the partition key values are all present in the query, it can compute the token from those values. Then, given the token and the keyspace of the table, we look up the replicas matching the range in that token map using Metadata#getReplicas(keyspace, token).
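
To make the lookup step concrete, here is a minimal, self-contained sketch of a token ring. It is a simplified model and not the driver's actual implementation: the class name TokenRingSketch, its methods, the use of plain long tokens, and the SimpleStrategy-like placement (walk clockwise and take the next distinct nodes) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model of a token map. NOT the driver's real code.
class TokenRingSketch {
    // Ring token -> node that owns the range ending at that token (inclusive).
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Register the tokens a node owns (as they would be read from system.peers).
    void addNode(String node, long... tokens) {
        for (long token : tokens) {
            ring.put(token, node);
        }
    }

    // Find the first ring token >= the given token (wrapping around the ring),
    // then walk clockwise collecting distinct nodes until we have
    // replicationFactor of them. Assumes SimpleStrategy-like placement.
    List<String> replicasFor(long token, int replicationFactor) {
        Set<String> replicas = new LinkedHashSet<>();
        if (ring.isEmpty()) {
            return new ArrayList<>();
        }
        Long start = ring.ceilingKey(token);
        if (start == null) {
            start = ring.firstKey(); // token is past the last ring token: wrap
        }
        collect(ring.tailMap(start, true).values(), replicas, replicationFactor);
        collect(ring.headMap(start, false).values(), replicas, replicationFactor);
        return new ArrayList<>(replicas);
    }

    private static void collect(Iterable<String> nodes, Set<String> out, int limit) {
        for (String node : nodes) {
            if (out.size() >= limit) {
                return;
            }
            out.add(node);
        }
    }
}
```

With three nodes A, B and C owning tokens 0, 100 and 200, a token of 50 falls in B's range, so replicasFor(50, 2) yields B followed by C. A token-aware load balancing policy would try those hosts first, which saves a coordinator hop inside the cluster.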

2) How does the Java driver keep the token-awareness related information up-to-date?

The Java driver chooses one contact point and maintains a 'control connection' that subscribes to schema, host state (up, down) and topology change events (a nice feature of the native protocol). Whenever the cluster topology changes (a node being added, removed or moved) or a keyspace changes, we'll fetch the latest data from the peers and schema tables.

3) How does the Java driver participate in the Gossip protocol? My understanding is that the Java driver figures out which nodes are alive or dead through the Gossip protocol.

The driver doesn't take part in gossip. It uses the event subscription feature I described in #2; the events are pushed to the driver over the control connection. While a C* node is made aware of changes over gossip, it propagates those changes to the driver as events over the CQL native protocol.

4) How much bandwidth is consumed by the Gossip protocol?

From the ArchitectureGossip wiki page, it appears that gossip is triggered by a timer every second, and the payload doesn't look very large. However, as mentioned previously, C* nodes don't gossip directly with the Java driver; rather, they send the driver only the events it should care about. This should be pretty infrequent, as schema changes and topology changes don't happen very often, and the payload is not very large.
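
For a rough sense of scale, the sketch below encodes a single STATUS_CHANGE event frame as laid out in the native protocol specification and lets you measure its size. It assumes protocol v4 (a 9-byte frame header, response version byte 0x84, EVENT opcode 0x0C, stream id -1 for server-pushed events). The class and method names are hypothetical; this is not code from the driver.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: encodes one STATUS_CHANGE event frame (protocol v4 assumed).
class EventFrameSketch {
    static final int HEADER_LENGTH = 9;            // version, flags, stream (2), opcode, length (4)
    static final byte VERSION_RESPONSE_V4 = (byte) 0x84;
    static final byte OPCODE_EVENT = 0x0C;

    // changeType is "UP" or "DOWN"; ip is the raw address (4 bytes for IPv4, 16 for IPv6).
    static byte[] statusChangeEvent(String changeType, byte[] ip, int port) {
        byte[] eventType = "STATUS_CHANGE".getBytes(StandardCharsets.UTF_8);
        byte[] change = changeType.getBytes(StandardCharsets.UTF_8);

        // Body: [string] event type, [string] change type, [inet] node address.
        int bodyLength = (2 + eventType.length) + (2 + change.length) + (1 + ip.length + 4);

        ByteBuffer frame = ByteBuffer.allocate(HEADER_LENGTH + bodyLength); // big-endian by default
        frame.put(VERSION_RESPONSE_V4);
        frame.put((byte) 0);           // flags: no compression, no tracing
        frame.putShort((short) -1);    // stream id -1 marks a server-initiated event
        frame.put(OPCODE_EVENT);
        frame.putInt(bodyLength);

        putString(frame, eventType);
        putString(frame, change);
        frame.put((byte) ip.length);   // [inet]: one size byte, the address, then an int port
        frame.put(ip);
        frame.putInt(port);
        return frame.array();
    }

    // [string]: a 2-byte length followed by the UTF-8 bytes.
    private static void putString(ByteBuffer buf, byte[] utf8) {
        buf.putShort((short) utf8.length);
        buf.put(utf8);
    }
}
```

Under these assumptions, an "UP" event for an IPv4 node comes to 37 bytes before TCP/IP overhead: 9 for the header, 15 for the event type, 4 for the change type and 9 for the address. Schema change events carry keyspace and table names and so are somewhat larger, but they are still small.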

Thanks!
Andy

Mohammed Guller

Jun 14, 2016, 10:16:24 PM

to java-dri...@lists.datastax.com

Hi Andy,

Thanks for the detailed response. It is very helpful.

Mohammed
