Degraded throughput as client connections increase in 3.x


Muthukumaran Kothandaraman

Oct 27, 2016, 2:10:32 AM
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi, 

Following are the details of my setup and related clarifications at the bottom

Cluster and Topology :

C* Cluster Size : 4 Nodes (single keyspace, single table, RF = 3, WCL = QUORUM)
C* Node details : Each of the 4 nodes is an Ubuntu *VM* with 10 vCPUs, 24GB RAM and 20GB disk
Cluster Topology : 2 C* Ubuntu VMs run on each of 2 bare-metal host servers (config of each bare-metal server : 32 cores + 120GB RAM + 1TB *HDD* (yes, we are trying to get SSDs, but that will take some time))
We distributed 2 VMs per bare-metal host mainly in the hope that separate disk controllers would do better than cramming all 4 VMs into a single host
Clients run from 3 nodes (in addition to the C* nodes) on the same network

Cassandra Version Details :
server version is 3.7.0
datastax driver version 3.1.0

JDK Environment on server and client:
Oracle JDK 1.8

Server side config changes :
cassandra.yaml is left with default values. Before I start optimizing the server, I thought I would first get all the client-side aspects addressed

Datastax Driver Connection details are as follows:
datastax.contact.points = all 4 node IPs have been added. I understand this is not the right thing to do as we scale
datastax.max.reqs = 4096
datastax.min.conn = 8
datastax.max.conn = 16
read request timeout is 5 minutes
TokenAwarePolicy used : Yes
Retry Policy for errors : No
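
For reference, a minimal sketch of how the settings above map onto the 3.1 driver API (the datastax.* names are application config keys; the contact-point IPs and keyspace name below are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ClusterFactory {

    public static Session connect() {
        PoolingOptions pooling = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 8)       // datastax.min.conn
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 16)       // datastax.max.conn
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 4096); // datastax.max.reqs

        SocketOptions socket = new SocketOptions()
                .setReadTimeoutMillis(5 * 60 * 1000); // 5-minute read request timeout

        Cluster cluster = Cluster.builder()
                // placeholder IPs standing in for the 4 node addresses
                .addContactPoints("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4")
                .withPoolingOptions(pooling)
                .withSocketOptions(socket)
                // token-aware routing wrapped around the default DC-aware round-robin policy
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder().build()))
                .build();

        return cluster.connect("my_keyspace"); // placeholder keyspace name
    }
}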

Client :
Using preparedstatement : Yes
async execution approach : Yes
Using batching : No. This is by choice as usecase does not permit
Inserts are executed in parallel - i.e. we fire as many async executions as the driver will take, without any sequencing (we do not wait for the response of a previous insert before issuing the next)
Using only a FutureCallback on the ListenableFuture, and not doing any blocking get() on the future returned by the async execution (a sketch of this pattern follows below)
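
A minimal sketch of this async insert pattern, assuming a simple key/value style table (keyspace, table and column names are placeholders, not our actual schema):

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

public class AsyncInserter {

    private final Session session;
    private final PreparedStatement insert;

    public AsyncInserter(Session session) {
        this.session = session;
        // placeholder keyspace/table/columns, for illustration only
        this.insert = session.prepare(
                "INSERT INTO my_keyspace.my_table (id, payload) VALUES (?, ?)");
    }

    public void insertAsync(String id, String payload) {
        BoundStatement bound = insert.bind(id, payload);
        // Fire and move on: no blocking get() on the returned future
        ResultSetFuture future = session.executeAsync(bound);
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet rs) {
                // record the completed write here (see the throughput sketch below)
            }

            @Override
            public void onFailure(Throwable t) {
                System.err.println("Failed to insert: " + t);
            }
        });
    }
}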

Test Scenarios :
Scenario 1 : Pumping 2M inserts from a single client node (a separate node connected to the cluster)
Scenario 2 : Pumping 2M inserts from each of the 3 client nodes (separate client processes on separate VMs)

How throughput is computed :
Measuring the rate from within the onSuccess callback of the ListenableFuture returned by the async execution call (see the sketch below)
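
A minimal sketch of that measurement, assuming a shared counter bumped from every onSuccess callback (the 100K-write reporting window is arbitrary, and the rate is only approximate under concurrency):

import java.util.concurrent.atomic.AtomicLong;

public class ThroughputMeter {

    private static final long WINDOW = 100_000; // report every 100K completed writes

    private final AtomicLong completed = new AtomicLong();
    private volatile long windowStartNanos = System.nanoTime();

    // Called from onSuccess() of each ListenableFuture
    public void recordSuccess() {
        long n = completed.incrementAndGet();
        if (n % WINDOW == 0) {
            long now = System.nanoTime();
            double seconds = (now - windowStartNanos) / 1e9;
            System.out.printf("~%.0f writes/sec over the last %d writes%n",
                    WINDOW / seconds, WINDOW);
            windowStartNanos = now;
        }
    }
}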

Observations :
1) For scenario 1 above, I was able to see up to 11K writes/second
2) For scenario 2 above, I am surprised to observe only up to 7-8K writes/second
3) In both scenarios, we noticed that the client connection count towards the cluster is not stable. When we monitor netstat on port 9042, we see the connection count fluctuating, and after a heavy load of inserts some connections are lost across the nodes - sometimes even dropping below the configured minimum (8 in this case)
4) We also notice the following exception on the client side for about 4-5% of inserts, which indicates the server's inability to get a QUORUM ack for some inserts -
Failed to insert:
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
        at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:100)
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:122)[327:com.datastax.driver.core:3.1.0]
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:500)[327:com.datastax.driver.core:3.1.0]
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012)[327:com.datastax.driver.core:3.1.0]
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:935)[327:com.datastax.driver.core:3.1.0]
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)[143:io.netty.transport:4.0.37.Final]
.......
.......


Clarifications :
1) Setting aside the errors observed, what could be the probable cause of throughput not scaling (in fact, degrading) when we add more clients - i.e. pumping traffic from multiple nodes as in scenario 2?
2) Would a client-side retry policy help in addressing the above error, or is it indicative of some serious resource issue on the server nodes?
3) Is the way in which we measure throughput from the client's perspective acceptable?
4) Nodetool does not seem to provide any specific throughput metric (or we are missing something in reading the nodetool output properly) - are there any other means of determining the server-side execution throughput?

Are we missing something very basic on the throughput-scaling front? The degraded writes/second is the more worrisome issue, even if the other problems are addressable.

Thanks in advance

Regards
Muthu






Kevin Gallardo

Nov 10, 2016, 9:01:05 AM
to java-dri...@lists.datastax.com
Hi, 

Thanks for the details of your observations; I'll try to answer your questions below:

3) Is the way in which we measure throughput from the client's perspective acceptable?
The most complete way to monitor queries and throughput is to use both the QueryLogger and to follow the instructions for monitoring the driver's connection pools. Those can give you better insight into the driver's behaviour during your tests. It would therefore be interesting to see the information you get when monitoring the connection pools; if you could post it here, that would help.
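
For example, a minimal sketch of registering the QueryLogger and periodically dumping the pool state via Session.State (the slow-query threshold and polling interval below are arbitrary):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.QueryLogger;
import com.datastax.driver.core.Session;

public class PoolMonitor {

    public static void start(Cluster cluster, final Session session) {
        // Log any query slower than 500 ms through the driver's QueryLogger (SLF4J based)
        cluster.register(QueryLogger.builder()
                .withConstantThreshold(500)
                .build());

        // Print open connections and in-flight requests per host every 5 seconds
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                Session.State state = session.getState();
                for (Host host : state.getConnectedHosts()) {
                    System.out.printf("%s: openConnections=%d inFlightQueries=%d%n",
                            host.getAddress(),
                            state.getOpenConnections(host),
                            state.getInFlightQueries(host));
                }
            }
        }, 5, 5, TimeUnit.SECONDS);
    }
}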

4) Nodetool does not seem to provide any specific throughput metric (or we are missing something in reading the nodetool output properly) - are there any other means of determining the server-side execution throughput?
Each C* node exposes various sets of JMX metrics, on top of which you can set up Graphite/Grafana to visualize the information in graphs, or, more simply, gather and view those metrics with JConsole.
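
As an illustration, a small sketch that samples the coordinator-level write metrics of one node over JMX (the host is a placeholder, C* binds JMX to port 7199 and to localhost by default, and the MBean/attribute names below are the ones the C* 3.x metrics library exposes, to the best of my knowledge):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class WriteRateSampler {

    public static void main(String[] args) throws Exception {
        // Placeholder host; run on the node itself if JMX is not exposed remotely
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://10.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Coordinator write latency timer, which also carries throughput rates
            ObjectName writeLatency = new ObjectName(
                    "org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency");
            Object count = mbs.getAttribute(writeLatency, "Count");
            Object oneMinuteRate = mbs.getAttribute(writeLatency, "OneMinuteRate");
            System.out.println("Writes coordinated so far : " + count);
            System.out.println("Write rate (1-min avg, writes/sec) : " + oneMinuteRate);
        } finally {
            connector.close();
        }
    }
}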

1) Setting aside the errors observed, what could be the probable cause of throughput not scaling (in fact, degrading) when we add more clients - i.e. pumping traffic from multiple nodes as in scenario 2?
I have a question first: in scenario 2, when you say "I am surprised to observe only up to 7-8K writes/second", does this mean you observed 8K writes/second for each client, or in total?

My reasoning - without the insight of the connection-pool monitoring results described above - is that you are getting 8K writes/sec for each client, so instead of your initial total of 1x11K writes/sec you are getting 3x8K writes/sec. If that is correct, I would say the bottleneck is not the client but the Cassandra side. A symptom of that would be connection pools whose connections fill up with a lot of in-flight requests: since the C* cluster resolves requests more slowly than the clients send them, you would see the number of in-flight requests grow higher and higher (more pending requests queued on the cluster). That bottleneck on the C* side may be caused by your data model, your replication configuration and the CL configured for requests. For experimental purposes you could try setting the CL to ONE, just to verify whether the throughput improves or not (see the sketch below).
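
If you want to run that experiment, a minimal sketch of the two places where the consistency level can be set with the 3.1 driver (the contact point is a placeholder):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Statement;

public class ConsistencyExperiment {

    // Option 1: make ONE the default CL for every request made through this Cluster
    public static Cluster buildClusterWithClOne(String contactPoint) {
        return Cluster.builder()
                .addContactPoint(contactPoint)
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.ONE))
                .build();
    }

    // Option 2: override the CL only on the individual insert statements
    public static Statement withClOne(Statement statement) {
        return statement.setConsistencyLevel(ConsistencyLevel.ONE);
    }
}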
Also, for comparison, stress testing with a more common tool such as cassandra-stress would give you some insight into the throughput that can be achieved with your server setup, and into whether the issue comes from the client code.

2) Would a client-side retry policy help in addressing the above error, or is it indicative of some serious resource issue on the server nodes?
This rather highlights the possibility that your cluster is overloaded by all the requests the clients throw at it. In that case I don't think retrying is a good solution, as it would only increase the pressure on the cluster.

Please don't hesitate to post the results of your pool monitoring if you are still seeing those problems, so that we can get a better view of where to improve things.

Thanks!




--
Kévin Gallardo.
Software Engineer in Drivers and Tools Team at DataStax.
