How to efficiently use the C# driver to make multiple async calls to a bunch of tables?


Check Peck

May 22, 2020, 2:34:28 AM
to DataStax C# Driver for Apache Cassandra User Mailing List
I am working with ScyllaDB/Cassandra and reading data from it using the DataStax C# driver.

Below is my code, which talks to ScyllaDB; in my GetAsync method I read from three different tables. We did some performance testing, and we are seeing performance issues on the client side when reading data from the database, while the server-side latencies look normal.

Earlier we were using IN clause queries and found that they performed poorly, so we changed the client code to make multiple async calls in parallel. Now the server-side latencies look good but the client looks bad, so I assume I am not doing it correctly.

Here is my code for reference: https://pastebin.com/fJiHX3Tx

I am using the latest DataStax C# driver to talk to Scylla/Cassandra.

Cluster Info:

- We have a 6-node cluster, all in one DC, with RF 3.
- We read and write at local quorum.
- We also have row caching enabled.

A few questions:

- Is there anything wrong with the way I connect to the Scylla/Cassandra cluster in the constructor? Have I used any settings incorrectly, or is there anything I should add to improve performance?
- Is there anything wrong in my GetAsync method that could cause performance issues? That method makes multiple async calls to get all my data from three different tables, so maybe I wrote something there in a way that hurts performance.

Joao Reis

May 25, 2020, 7:27:10 AM
to DataStax C# Driver for Apache Cassandra User Mailing List
Hi,

Looking at the code, the only thing that stands out is the SetMaxConnectionsPerHost call... Try leaving it unconfigured (default). You could also try using ConfigureAwait(false) on every await.
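
A minimal sketch of both suggestions, not taken from the pastebin. The class name, contact point, keyspace, table and column names are hypothetical. The cluster is built without any pooling overrides, so the connection limits stay at the driver defaults, and the await uses ConfigureAwait(false).

```csharp
using System.Threading.Tasks;
using Cassandra;

public sealed class ClientRepository
{
    private readonly ISession _session;
    private readonly PreparedStatement _selectByClientId;

    public ClientRepository(string contactPoint, string keyspace)
    {
        // No WithPoolingOptions(...) call here, so SetMaxConnectionsPerHost
        // is left unconfigured and the driver defaults apply.
        var cluster = Cluster.Builder()
            .AddContactPoint(contactPoint)
            .Build();
        _session = cluster.Connect(keyspace);

        // Prepare once and reuse; "clients" and "client_id" are hypothetical names.
        _selectByClientId = _session.Prepare(
            "SELECT * FROM clients WHERE client_id = ?");
    }

    public async Task<RowSet> GetClientAsync(int clientId)
    {
        // ConfigureAwait(false): do not resume on a captured synchronization context.
        return await _session
            .ExecuteAsync(_selectByClientId.Bind(clientId))
            .ConfigureAwait(false);
    }
}
```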

What do you mean by performance issues? What latencies are you observing on the client and on the server? What is the average size of the clientIds list on the GetAsync method?

Do you have C# driver metrics enabled? There is a timer metric per session and timer metrics per node per session.
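
A hedged sketch of enabling driver metrics, assuming the App.Metrics integration that ships as the separate CassandraCSharpDriver.AppMetrics package. As far as the driver documentation describes it, the timer metrics are not part of the default set, so the sketch enables every node and session metric explicitly. The helper name is hypothetical, and the exact namespace of the CreateDriverMetricsProvider extension should be checked against the driver version in use.

```csharp
using App.Metrics;
using Cassandra;
using Cassandra.Metrics;

public static class MetricsSetup
{
    // metricsRoot is the application's App.Metrics root; the driver registers its
    // gauges, counters, meters and timers there.
    public static Cluster BuildClusterWithMetrics(string contactPoint, IMetricsRoot metricsRoot)
    {
        return Cluster.Builder()
            .AddContactPoint(contactPoint)
            .WithMetrics(
                // Extension method from the App.Metrics integration package
                // (assumption: it is visible through the usings above).
                metricsRoot.CreateDriverMetricsProvider(),
                new DriverMetricsOptions()
                    .SetEnabledNodeMetrics(NodeMetric.AllNodeMetrics)
                    .SetEnabledSessionMetrics(SessionMetric.AllSessionMetrics))
            .Build();
    }
}
```

Enabling every metric is only for diagnosis; timers add some overhead per request, so a narrower set may be preferable once the problem is found.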

What is the throughput on your performance tests? There are a lot of expensive LINQ calls like ToDictionary, ToList, ForEach that could be problematic under heavy load.




Check Peck

May 25, 2020, 1:07:28 PM
to DataStax C# Driver for Apache Cassandra User Mailing List
The average size of clientIds is around 100. I wasn't aware that C# driver metrics can be enabled. Can you provide an example of how to do that, both for the per-session metric and for the per-node, per-session metrics? I will look into the expensive LINQ calls.
Throughput is not very high: we do only 1000 calls per minute, and each call has 100 clientIds.

Joao Reis

May 25, 2020, 1:11:01 PM
to DataStax C# Driver for Apache Cassandra User Mailing List

Check Peck

May 25, 2020, 1:19:12 PM
to DataStax C# Driver for Apache Cassandra User Mailing List
Cool, will take a look. I forgot to ask one thing: do you think we should limit the number of async calls we make in parallel? It might be putting load on Cassandra, since each request has 100 clientIds, so making that many async calls in parallel per request might not be good. What do you think?

Alex Ott

May 25, 2020, 1:21:59 PM
to csharp-dr...@lists.datastax.com
The Cassandra protocol supports sending multiple requests per connection to a host. IIRC, the default in the C# driver is 2048 in-flight requests. Usually the requests should also be spread across different connections if they are made against different partitions.



--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

Joao Reis

May 25, 2020, 1:35:59 PM
to DataStax C# Driver for Apache Cassandra User Mailing List
In general, we recommend that users limit the number of async calls in parallel, so it is definitely something to consider. I would enable metrics first to see if anything stands out (latencies being very different from what you see on the server or in the test results).
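
One common way to cap parallel async calls is a SemaphoreSlim gate around each call. The sketch below is a generic helper, not code from the pastebin: the Throttled class, the queryAsync delegate and the maxInFlight parameter are hypothetical names, and the delegate stands in for whatever per-clientId query the application runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs queryAsync for every id, with at most maxInFlight calls pending at once.
    // Results come back in the same order as the input ids.
    public static async Task<TResult[]> RunAsync<TId, TResult>(
        IEnumerable<TId> ids,
        Func<TId, Task<TResult>> queryAsync,
        int maxInFlight)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            var tasks = ids.Select(async id =>
            {
                // Wait for a free slot before starting the call.
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await queryAsync(id).ConfigureAwait(false);
                }
                finally
                {
                    // Free the slot whether the call succeeded or threw.
                    gate.Release();
                }
            }).ToList(); // materialize so every task is started exactly once

            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

The gate here is created per call, so it limits concurrency within one request only. To cap the total number of in-flight queries across all requests, a single shared SemaphoreSlim would be needed instead.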

You could also try increasing the number of core connections to see if that helps (I have no idea if this is a good recommendation for Scylla, with Cassandra there are some cases where it's better to increase this number).
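
A small configuration sketch for trying a higher core connection count. The helper name and parameters are hypothetical, and the right value is something to find by measurement. Only the core count is changed; how the driver treats a core value above the configured maximum is not covered in this thread, so check the pooling documentation before going past the default maximum.

```csharp
using Cassandra;

public static class PoolingSetup
{
    public static Cluster BuildCluster(string contactPoint, int coreConnections)
    {
        // Only the core connection count for local hosts is changed;
        // the maximum is left at the driver default.
        var pooling = new PoolingOptions()
            .SetCoreConnectionsPerHost(HostDistance.Local, coreConnections);

        return Cluster.Builder()
            .AddContactPoint(contactPoint)
            .WithPoolingOptions(pooling)
            .Build();
    }
}
```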

Increasing the maximum number of connections doesn't really tell you how many connections the driver is using (metrics will show this as well).

Check Peck

May 25, 2020, 3:05:02 PM
to DataStax C# Driver for Apache Cassandra User Mailing List
So I was going through the link you shared earlier about metrics. It looks like it only measures how long the query itself takes, right? For example, the query below from the example in the wiki:

SELECT * FROM system.local

What about the overall performance of the GetAsync method? If we are launching too many async calls for each clientId, or if our throughput on GetAsync is high, that won't be reflected on the Grafana dashboard, right?
Correct me if I am wrong, but I believe this is going to measure individual query performance?

Joao Reis

May 25, 2020, 4:38:27 PM
to DataStax C# Driver for Apache Cassandra User Mailing List
Yes, the driver metrics only measure the time between the moment the driver receives an individual request and the moment it receives the response. If you're more interested in measuring the latency of a particular method call in your application code, then the driver metrics won't help.
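
For method-level latency, one simple option is to time the whole call in application code with a Stopwatch. This is a minimal sketch using only the .NET base library; MethodTimer, MeasureAsync and the report callback are hypothetical names, and where the elapsed time goes (log, histogram, dashboard) is up to the application.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class MethodTimer
{
    // Awaits the operation and reports the elapsed wall-clock time, even if it throws.
    public static async Task<T> MeasureAsync<T>(
        Func<Task<T>> operation,
        Action<TimeSpan> report)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return await operation().ConfigureAwait(false);
        }
        finally
        {
            stopwatch.Stop();
            report(stopwatch.Elapsed);
        }
    }
}
```

Wrapping the entire GetAsync call this way captures time spent queuing tasks, waiting for all queries and post-processing results, which the per-request driver timers do not see. Comparing the two numbers shows how much of the client-side latency is outside the driver.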