Cassandra: 2.1.18 (possibly unrelated)
The problem:
Every time we join multiple nodes to the cluster (or change the replication factors of multiple keyspaces) within a short time window, client-side applications using the Cassandra java-driver trigger Young/Full GC.
Analysis:
We figured out that adding a new node or changing a keyspace's replication triggers a TOPOLOGY_CHANGE/SCHEMA_CHANGE event, which the driver-side application listens for, and that the extra memory pressure is caused by TokenMap.build.
Per https://datastax-oss.atlassian.net/browse/JAVA-664, the mappings are cached and shared between keyspaces that have the same replication factors. Our cluster has 5 different per-DC replication settings (including the non-replicated system keyspace), and 3 of them are basically the same. The system_auth keyspace, however, has a replication factor of ~20 in 2 major datacenters (sum of RF: 50).
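To illustrate the caching as we understand it, here is a minimal sketch (not the driver's actual code): keyspaces are grouped by their per-DC replication settings, and each distinct setting would need only one replica map. The class name, keyspace names, DC names and RF values are all made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReplicationGroups {

    /** Groups keyspace names by their per-datacenter replication factors. */
    public static Map<Map<String, Integer>, List<String>> groupByReplication(
            Map<String, Map<String, Integer>> keyspaces) {
        Map<Map<String, Integer>, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : keyspaces.entrySet()) {
            // Maps with equal entries are equal keys, so identical settings share one group.
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** Example helper: parses "dc1:3,dc2:3" into a sorted DC -> RF map. */
    public static Map<String, Integer> parse(String spec) {
        Map<String, Integer> replication = new TreeMap<>();
        for (String part : spec.split(",")) {
            String[] dcAndRf = part.split(":");
            replication.put(dcAndRf[0], Integer.parseInt(dcAndRf[1]));
        }
        return replication;
    }

    public static void main(String[] args) {
        // Hypothetical keyspaces; the real cluster's settings are not shown here.
        Map<String, Map<String, Integer>> keyspaces = new LinkedHashMap<>();
        keyspaces.put("app_a", parse("dc1:3,dc2:3"));
        keyspaces.put("app_b", parse("dc1:3,dc2:3"));
        keyspaces.put("app_c", parse("dc1:3,dc2:3"));
        keyspaces.put("system_auth", parse("dc1:20,dc2:20,dc3:10"));

        // Two distinct settings, so only two replica maps are needed in this model.
        for (Map.Entry<Map<String, Integer>, List<String>> g
                : groupByReplication(keyspaces).entrySet()) {
            System.out.println(g.getKey() + " -> " + g.getValue());
        }
    }
}
```

In this model the three application keyspaces cost one mapping between them, while system_auth always needs its own because nothing else shares its settings.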
We packaged a modified version of cassandra-driver-core that only calculates the mapping for keyspaces specified by the application, and ran some simple tests. It seems that calculating the mapping for system_auth alone takes around 100 MB of memory...
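A rough way to see why system_auth dominates, as a sketch under assumptions rather than a measurement: assume each distinct replication setting gets one replica set per ring token, and each replica set holds roughly sum-of-RF host references. This ignores that a replica set cannot exceed the number of nodes in a datacenter. The class name, keyspace settings, node count and vnode count below are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RestrictedTokenMapSketch {

    /** Sum of per-datacenter replication factors for one keyspace. */
    public static int sumOfRf(Map<String, Integer> replication) {
        int sum = 0;
        for (int rf : replication.values()) {
            sum += rf;
        }
        return sum;
    }

    /**
     * Counts host references across replica maps for the selected keyspaces,
     * assuming one replica set per ring token per distinct replication setting.
     */
    public static long replicaReferences(Map<String, Map<String, Integer>> keyspaces,
                                         Set<String> selected, long ringTokens) {
        Set<Map<String, Integer>> distinct = new HashSet<>();
        for (String name : selected) {
            Map<String, Integer> replication = keyspaces.get(name);
            if (replication != null) {
                distinct.add(replication); // identical settings are counted once
            }
        }
        long total = 0;
        for (Map<String, Integer> replication : distinct) {
            total += ringTokens * sumOfRf(replication);
        }
        return total;
    }

    /** Example helper: parses "dc1:3,dc2:3" into a sorted DC -> RF map. */
    public static Map<String, Integer> parse(String spec) {
        Map<String, Integer> replication = new TreeMap<>();
        for (String part : spec.split(",")) {
            String[] dcAndRf = part.split(":");
            replication.put(dcAndRf[0], Integer.parseInt(dcAndRf[1]));
        }
        return replication;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 100 nodes x 256 vnodes = 25,600 ring tokens.
        long ringTokens = 100L * 256L;

        Map<String, Map<String, Integer>> keyspaces = new LinkedHashMap<>();
        keyspaces.put("app_a", parse("dc1:3,dc2:3"));
        keyspaces.put("app_b", parse("dc1:3,dc2:3"));
        keyspaces.put("system_auth", parse("dc1:20,dc2:20,dc3:10"));

        Set<String> appOnly = new HashSet<>();
        appOnly.add("app_a");
        appOnly.add("app_b");

        // Restricted to application keyspaces: 25,600 * 6 = 153,600 references.
        System.out.println(replicaReferences(keyspaces, appOnly, ringTokens));
        // All keyspaces: 153,600 + 25,600 * 50 = 1,433,600 references.
        System.out.println(replicaReferences(keyspaces, keyspaces.keySet(), ringTokens));
    }
}
```

With these made-up numbers, system_auth accounts for about nine tenths of the references, which at least matches the direction of what we saw in our tests.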
Questions:
1. We read the parts of the cassandra-driver and Cassandra source code that concern system_auth. Is it true that if the application never uses system_auth, the driver itself will never send queries to system_auth, and that it is only queried when a Cassandra node authenticates a user?
2. If we keep the metadataEnabled configuration set to true but compute the token map only for a restricted set of keyspaces (e.g. system and the keyspaces the application uses), will we lose anything, such as token awareness? Furthermore, what exactly would happen if we disabled metadata on the driver side?
3. Is there a more elegant solution to this in newer versions of the (Java) cassandra-driver?
Thanks for reading, and sorry for my poor expression :p