I created a two-node cluster with a single keyspace (replication_factor=2) and 256 tokens; both nodes are seeds and are in the same DC.
I then started two applications that use the DataStax Java driver (com.datastax.cassandra:cassandra-driver-dse:2.1.5):
- data-loader, which populates ~200K rows into Cassandra (inserts only)
    - LoadBalancingPolicy = RoundRobinPolicy
    - ConsistencyLevel = ALL
    - connected to both nodes
- data-service, which selects data and exposes it to clients through a REST interface (selects only)
    - LoadBalancingPolicy = RoundRobinPolicy
    - connected to both nodes
The two applications were started simultaneously. data-loader began loading from the legacy database, populated the data (~200K rows) into C*, and finished after ~12 minutes. Both applications were connected to both nodes (confirmed in the logs), and the data was replicated to both nodes.
After this step the data could not be retrieved through the REST API (the select returned an empty result set from C*), although it could be found on both C* nodes using the cqlsh console. After a delay of 1-4 hours, all data became available in data-service.
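To spell out why the empty result looks wrong: with replication_factor=2 and inserts at ConsistencyLevel ALL, every acknowledged write is on both replicas, so a read at any consistency level should overlap with it. Below is a minimal sketch of that arithmetic. The ReplicaOverlap class and its methods are hypothetical names used only for illustration; this is not driver code and it does not reproduce the issue.

```java
// Illustrative arithmetic only; ReplicaOverlap is a hypothetical helper, not part of the driver.
final class ReplicaOverlap {

    // Number of replicas that must respond for a consistency level at a given replication factor.
    static int requiredReplicas(String level, int rf) {
        switch (level) {
            case "ONE":
                return 1;
            case "QUORUM":
                return rf / 2 + 1;
            case "ALL":
                return rf;
            default:
                throw new IllegalArgumentException("unsupported level: " + level);
        }
    }

    // A read is guaranteed to touch a replica holding an acknowledged write when W + R > RF.
    static boolean readSeesWrite(String writeLevel, String readLevel, int rf) {
        return requiredReplicas(writeLevel, rf) + requiredReplicas(readLevel, rf) > rf;
    }

    public static void main(String[] args) {
        int rf = 2; // replication_factor of the keyspace described above
        for (String read : new String[] {"ONE", "QUORUM", "ALL"}) {
            System.out.println("write=ALL read=" + read
                    + " overlap=" + readSeesWrite("ALL", read, rf));
        }
    }
}
```

Since the overlap holds for every read level here, the empty selects do not look like a replica consistency problem, which is consistent with the data being visible from cqlsh on both nodes.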
After some investigation I found that restarting the data-service application (that is, reconnecting to the Cassandra cluster) makes the results available immediately after the data is populated.
I think this is related to the DataStax Java driver caching empty tokens, which seem to be refreshed very slowly.
Please check the issue.
Regards, Dmitry.