Cassandra inserts on single node

prathamesh saraf

Sep 2, 2016, 1:33:51 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Hi, I am trying to profile Cassandra on a single-node cluster to see how many inserts one node can handle, and then add more nodes based on that result.

I have changed certain parameters in cassandra.yaml. They are as follows.

memtable_offheap_space_in_mb: 4096
memtable_allocation_type: offheap_buffers
concurrent_compactors: 8
compaction_throughput_mb_per_sec: 32768
concurrent_reads: 64
concurrent_writes: 128
concurrent_counter_writes: 128
write_request_timeout_in_ms: 200000

Cassandra node: JVM heap size 12GB

I have set these parameters through the Cassandra C++ driver API:
cass_cluster_set_num_threads_io(cluster, 8);
cass_cluster_set_core_connections_per_host(cluster, 8);
error = cass_cluster_set_write_bytes_high_water_mark(cluster, 32768000);
error = cass_cluster_set_pending_requests_high_water_mark(cluster, 16384000);

With these parameters I get a write rate of about 13k inserts/sec with a data size of 1250 bytes.
Am I missing anything in terms of parameter tuning that would give better performance?
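
For a rough sense of scale, the figures above can be converted into a byte rate and compared against disk or network bandwidth. The sketch below is only arithmetic. It assumes the 1250 bytes are the payload of each insert and ignores protocol and storage overhead; the function name is made up for illustration.

```c
#include <assert.h>

/* Payload throughput in MB/s (1 MB = 1,000,000 bytes) for a given insert
 * rate. Assumes every insert carries the same payload; protocol framing,
 * commit log and compaction overhead are not counted. */
double payload_mb_per_sec(double inserts_per_sec, double bytes_per_insert) {
    assert(inserts_per_sec >= 0.0 && bytes_per_insert >= 0.0);
    return inserts_per_sec * bytes_per_insert / 1e6;
}
```

With the numbers from this post, `payload_mb_per_sec(13000, 1250)` comes to 16.25 MB/s, a figure that can be held against what the disks and the network link are able to sustain.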

Cassandra DB node details:
VM
CentOS 6
16GB RAM
8 cores
It runs on a separate box from the machine I am pumping data from.


Michael Penick

Sep 2, 2016, 2:17:23 PM
to cpp-dri...@lists.datastax.com
I'm not the best resource for Cassandra tuning, so you might want to ask the Cassandra mailing list. The driver tends to be bottlenecked by Cassandra, but that depends on the size of your Cassandra cluster and the type of hardware used for both the client and the Cassandra nodes. I've seen a single driver CassSession handle anywhere from 50k to 300k requests/second running against a ~10-node Cassandra cluster, but that depends on the sizes of the request and response messages. These are really rough numbers, so take them with a grain of salt.

I achieved those numbers by batching 5k-10k requests (using 5k-10k outstanding CassFutures and not CassBatch) at a time. I used "NumberOfCores - 1" IO workers and 1-2 connections per host. Counterintuitively, increasing the number of connections doesn't always yield more performance, because having more connections fights the driver's ability to group writes, which increases the number of system calls. Also, each connection can handle 32k simultaneous requests.

cass_cluster_set_num_threads_io(cluster, 7); // You might want to leave some cores for your application
cass_cluster_set_core_connections_per_host(cluster,1); // More connections != more performance
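
To see what those two settings mean in terms of sockets: as far as I know, the driver opens `core_connections_per_host` connections to each host from each IO thread, so the two values multiply. Below is a small sketch of that arithmetic, assuming that per-IO-thread behaviour and taking the "32k" requests per connection above to mean 32768. The function names are invented for illustration and are not part of the driver.

```c
#include <assert.h>

/* Total connections the client opens, assuming the driver creates
 * `connections_per_host` connections to every host from every IO thread. */
long total_connections(int io_threads, int connections_per_host, int hosts) {
    assert(io_threads > 0 && connections_per_host > 0 && hosts > 0);
    return (long)io_threads * connections_per_host * hosts;
}

/* Upper bound on simultaneously outstanding requests, taking "32k" per
 * connection to mean 32768 (an assumption). */
long max_in_flight(long connections) {
    assert(connections >= 0);
    return connections * 32768L;
}
```

With the original settings (8 IO threads, 8 connections per host, one node) this gives 64 connections. With 7 IO threads and 1 connection per host it gives 7, which still leaves room for 229,376 outstanding requests, far more than the 5k-10k in flight mentioned above.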

Mike





--
You received this message because you are subscribed to the Google Groups "DataStax C++ Driver for Apache Cassandra User Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cpp-driver-user+unsubscribe@lists.datastax.com.

prathamesh saraf

Sep 2, 2016, 3:02:08 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Thanks for the insight, Mike. I made the changes you suggested, but the result is pretty much the same. I'll ask about performance tuning on the Cassandra mailing list.

Michael Penick

Sep 2, 2016, 3:34:24 PM
to cpp-dri...@lists.datastax.com
If your data set is bigger than memory, then performance is going to be limited by disk I/O, so I would include the type of drives and their layout (e.g. is your commit log on a different physical drive than your data directories?) in your Cassandra node details. Here are some ballpark numbers that can be achieved with fairly beefy machines with SSDs: https://www.instaclustr.com/blog/2015/08/20/significant-performance-improvements-with-cassandra-2-1/ (look at the "Insert Only" throughput). They're using a 3-node cluster with a replication factor of three (and a consistency level of quorum), so that's very roughly the same as a single-node test, because every node is writing data for each request (but only 2 have to respond back to the client).
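
That last comparison can be put as arithmetic: with N nodes and a replication factor RF, each client write is stored on RF nodes, so on average a node handles RF/N of the client's writes. Here is a minimal sketch, assuming partition keys are spread evenly over the nodes; the function name is made up for illustration.

```c
#include <assert.h>

/* Average write rate seen by each node when `client_writes_per_sec` writes
 * are spread evenly over `nodes` nodes and each write is stored on
 * `replication_factor` replicas. */
double writes_per_node(double client_writes_per_sec, int replication_factor, int nodes) {
    assert(nodes > 0 && replication_factor > 0 && replication_factor <= nodes);
    return client_writes_per_sec * replication_factor / nodes;
}
```

For 3 nodes with a replication factor of 3, each node sees the full client write rate, just like a single node with a replication factor of 1. For 2 nodes with a replication factor of 1, each node sees about half of it.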

Mike

prathamesh saraf

Sep 7, 2016, 3:05:40 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Hi, I added another node to improve performance, but I did not observe any improvement. Am I missing something? Are there any values to set on the driver side to load balance between the two nodes?

Michael Penick

Sep 7, 2016, 4:03:50 PM
to cpp-dri...@lists.datastax.com
The driver automatically balances load between multiple nodes. What's your replication factor and are your partition keys well distributed across both nodes?

prathamesh saraf

Sep 7, 2016, 4:39:37 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
I did not make any modifications in terms of partition keys per se. I am using the default partitioner:
partitioner: org.apache.cassandra.dht.Murmur3Partitioner

This is the same on both nodes. My replication factor is 1 and the keyspace uses SimpleStrategy.

The table's primary key is a timeuuid.

prathamesh saraf

Sep 7, 2016, 4:55:30 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Also, I have set just one contact point for the cluster. Is this fine? Out of curiosity, how does the driver come to know about the other nodes in the cluster?

Michael Penick

Sep 7, 2016, 5:01:23 PM
to cpp-dri...@lists.datastax.com
Yes, the other nodes will be discovered by the driver.

prathamesh saraf

Sep 9, 2016, 1:52:07 AM
to DataStax C++ Driver for Apache Cassandra User Mailing List
I have made some observations.

If I disable durable writes, I see an increase in insert performance. I reckon this is because the commit log is no longer written to disk, so the time spent on that disk operation is saved.

If I increase the memtable threshold after which a flush to an SSTable happens, I see a slight improvement in performance. This is capped by the RAM available on the system after the JVM heap allocation.

If I reduce the number of column values to be inserted, I see an increase in write speed, I believe because there is less data to write.

Also, in terms of the partition key: after changing the primary key from timeuuid to uuid, I observed better performance. I got up to 20K inserts/sec on my two-node cluster (a 2-fold increase). However, after adding a third node it caps at 26~27K/sec, whereas going by the two-node scenario it should have increased linearly. I guess the problem is the partition key. I read in a blog post that if I need the data to be evenly distributed, the partition key has to be as random as possible, which is the case for a uuid column. Are there any other pointers you can share on this?
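
To put a number on "should have increased linearly", here is a small sketch that compares a measured rate with a linear extrapolation from a smaller cluster. It is only arithmetic on the figures quoted above, taking 26.5K/sec as the midpoint of 26~27K (an assumption); the function name is made up for illustration.

```c
#include <assert.h>

/* Ratio of the rate measured on `nodes` nodes to the rate expected if
 * throughput grew linearly from a baseline cluster of `base_nodes` nodes. */
double scaling_efficiency(double base_rate, int base_nodes, double rate, int nodes) {
    assert(base_rate > 0.0 && base_nodes > 0 && nodes > 0);
    double expected = base_rate * nodes / base_nodes;
    return rate / expected;
}
```

Extrapolating 20K/sec on two nodes gives 30K/sec on three, so `scaling_efficiency(20000, 2, 26500, 3)` comes out at roughly 0.88, i.e. the third node delivered about 88% of the linear expectation.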

Please bear with me as I am new to Cassandra. Your responses have been prompt and I really appreciate it :) Thanks.

Ketan Mayani

Sep 17, 2018, 3:31:10 AM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Hi Michael,

What was the request size in bytes for the above testing?

I am trying to measure insert speed on a single node.

If you have a per-node benchmark, please share it.

Thanks & Regards,
Ketan Mayani