Intermittent slow queries when connection pool local size is >1

Sam

Aug 23, 2023, 8:54:06 AM

to DataStax Java Driver for Apache Cassandra User Mailing List

Hi folks,

I have been troubleshooting slow queries (both read and write) observed from the client side only (server-side tracing showed no such spikes). Slow query logging was enabled for any query taking longer than 80ms and we found that ~85% of the recorded slow queries were in the 200-220ms bucket, which I have spent days trying to explain.

Some background:

GC pause time on our Cassandra nodes is 50ms, but no GC activity was occurring during the spikes.
36-node Cassandra cluster, all queries executed at LOCAL_QUORUM
Connections per local host = 2 (key point, see below)
TCP_NODELAY is enabled

After a significant period of debugging, we found that these spikes disappeared completely when we reduced connections per local host from 2 back to 1. This suggested to me that connections were going idle and that the TCP congestion window was shrinking as a result, or something similar. With this in mind I looked into the ChannelPool logic for how connections are selected: it picks the channel with the most free stream ids, which should ensure that both connections in the pool are used whenever the node is selected.
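
To make that selection rule concrete, here is a minimal, self-contained sketch of "pick the channel with the most free stream ids". It is not the driver's code: the `Channel` class, the limit of 128 stream ids, and the tie-breaking (a strict `>` comparison, so the first channel wins ties) are all assumptions made for illustration, and whether the real ChannelPool breaks ties the same way would need to be checked against the driver source.

```java
import java.util.ArrayList;
import java.util.List;

public class MostFreeStreamIdsSketch {

    /** Hypothetical stand-in for a pooled connection; not a driver class. */
    static final class Channel {
        final String name;
        final int maxStreamIds;
        int inFlight; // requests currently using a stream id
        int timesPicked; // how often this channel was selected

        Channel(String name, int maxStreamIds) {
            this.name = name;
            this.maxStreamIds = maxStreamIds;
        }

        int availableIds() {
            return maxStreamIds - inFlight;
        }
    }

    /**
     * Returns the channel with the most free stream ids, or null if the pool
     * is empty or every channel is saturated. Ties go to the earliest channel
     * because of the strict comparison (an assumption of this sketch).
     */
    static Channel next(List<Channel> pool) {
        Channel best = null;
        int bestAvailable = 0;
        for (Channel c : pool) {
            int available = c.availableIds();
            if (available > bestAvailable) {
                best = c;
                bestAvailable = available;
            }
        }
        return best;
    }

    /** Two channels with 128 stream ids each (128 is an illustrative value). */
    static List<Channel> newPool() {
        List<Channel> pool = new ArrayList<>();
        pool.add(new Channel("c1", 128));
        pool.add(new Channel("c2", 128));
        return pool;
    }

    /** Each request completes before the next one starts. */
    static void runSequential(List<Channel> pool, int requests) {
        for (int i = 0; i < requests; i++) {
            Channel c = next(pool);
            c.inFlight++;
            c.timesPicked++;
            c.inFlight--; // response arrives before the next request is sent
        }
    }

    /** All requests are in flight at the same time. */
    static void runOverlapping(List<Channel> pool, int requests) {
        for (int i = 0; i < requests; i++) {
            Channel c = next(pool);
            c.inFlight++;
            c.timesPicked++;
        }
    }

    public static void main(String[] args) {
        List<Channel> sequential = newPool();
        runSequential(sequential, 10);
        List<Channel> overlapping = newPool();
        runOverlapping(overlapping, 4);
        for (Channel c : sequential) {
            System.out.println("sequential " + c.name + " picked " + c.timesPicked);
        }
        for (Channel c : overlapping) {
            System.out.println("overlapping " + c.name + " picked " + c.timesPicked);
        }
    }
}
```

Under these assumptions, overlapping requests alternate between the two channels, but strictly sequential requests always land on the first channel, leaving the second one idle. If the driver behaves similarly under low per-node concurrency, that would be consistent with the idle-connection theory, though this sketch does not prove it.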

I'm keen to hear whether anyone has experienced anything similar, or has any thoughts on what else could be interacting to cause query latencies to sit around the 200ms mark.