Best strategies to get maximum performance out of the Cassandra driver

Bhuvan Rawal

May 18, 2016, 12:54:16 PM
to DataStax Python Driver for Apache Cassandra User Mailing List
Hi All,

I have been using the DataStax Python driver for quite some time now, and it works pretty well. We have certain batch jobs that require a large amount of data manipulation, reads, and writes, and we run them using multiprocessing. I observed that under peak write-only load the processes were using only 15-20% CPU. I deduced that this is because of I/O wait time (I was using unlogged batches of size 30 with session.execute, not session.execute_async).

For writes: I learnt that execute_async would be much faster, especially since the GIL is not an issue here (the CPU is mostly idle). Is there any documentation on a retry mechanism for error callbacks?
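To make the retry question concrete, here is a minimal sketch of resubmitting a failed write from the errback. It assumes only that `session.execute_async()` returns a future exposing `add_callbacks(callback, errback, errback_args=...)`, as the driver's `ResponseFuture` does. The `AsyncWriter` class, its `max_retries` default, and the `failed` list are hypothetical names for illustration, not part of the driver, and blind resubmission is only reasonable for idempotent writes.

```python
import threading


class AsyncWriter:
    """Hypothetical helper: issue writes with execute_async and resubmit
    failed ones a bounded number of times from the error callback."""

    def __init__(self, session, max_retries=3):
        self.session = session
        self.max_retries = max_retries
        self.failed = []  # (params, exception) for writes that exhausted retries
        self._pending = 0
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._done.set()  # nothing in flight yet

    def submit(self, statement, params, attempt=0):
        if attempt == 0:
            # Count each logical write once, not once per retry.
            with self._lock:
                self._pending += 1
                self._done.clear()
        future = self.session.execute_async(statement, params)
        future.add_callbacks(
            callback=self._on_success,
            errback=self._on_error,
            errback_args=(statement, params, attempt),
        )

    def _on_success(self, _rows):
        self._finish_one()

    def _on_error(self, exc, statement, params, attempt):
        if attempt < self.max_retries:
            # execute_async does not block, so it is issued from the callback.
            self.submit(statement, params, attempt + 1)
        else:
            with self._lock:
                self.failed.append((params, exc))
            self._finish_one()

    def _finish_one(self):
        with self._lock:
            self._pending -= 1
            if self._pending == 0:
                self._done.set()

    def wait(self, timeout=None):
        """Block until every submitted write has succeeded or given up."""
        return self._done.wait(timeout)
```

Usage would be something like calling `writer.submit(insert_statement, params)` for each row, then `writer.wait()` once at the end and inspecting `writer.failed`. A real job would probably also cap the number of in-flight requests, which this sketch leaves out.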

For paged reads: I also tried PagedResultHandler for reading from Cassandra. Reading 1.5 million rows took 19 secs with PagedResultHandler and 17 secs with the basic paging. Can this result be validated? The code that I used can be found here - http://pastebin.com/GD91TDUs. Please correct me if there is a mistake in my approach.
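For readers without access to the pastebin, the two approaches being compared look roughly like the sketch below. This is not the pastebin code. It is a minimal illustration that assumes the future returned by `session.execute_async()` exposes `add_callbacks`, `has_more_pages`, and `start_fetching_next_page()` as the driver's `ResponseFuture` does, and that iterating the result of `session.execute()` fetches further pages transparently. The names `PageCounter`, `count_rows_callbacks`, and `count_rows_iterating` are made up for the example.

```python
import threading


class PageCounter:
    """Count rows page by page using future callbacks."""

    def __init__(self, future):
        self.count = 0
        self.error = None
        self.finished = threading.Event()
        self.future = future
        future.add_callbacks(callback=self._on_page, errback=self._on_error)

    def _on_page(self, rows):
        # rows holds only the current page.
        for _ in rows:
            self.count += 1
        if self.future.has_more_pages:
            self.future.start_fetching_next_page()
        else:
            self.finished.set()

    def _on_error(self, exc):
        self.error = exc
        self.finished.set()


def count_rows_callbacks(session, statement):
    """Callback-driven paging: each page triggers the fetch of the next."""
    handler = PageCounter(session.execute_async(statement))
    handler.finished.wait()
    if handler.error:
        raise handler.error
    return handler.count


def count_rows_iterating(session, statement):
    """Basic paging: iterate the result set and let it page as needed."""
    return sum(1 for _ in session.execute(statement))
```

In both variants the next page is requested only after the previous one has been handled, so similar timings for a single sequential scan would not be surprising.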

What would be the best alternatives for reading from the cluster really fast, say for a full table scan? (For example, some kind of token slicing, with distributed clients each targeting specific Cassandra nodes and operating concurrently.)
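The token slicing idea could be sketched as follows. This assumes the cluster uses Murmur3Partitioner, whose tokens are signed 64-bit integers. The table name `my_keyspace.my_table` and partition key `id` are placeholders, and `split_token_ring` and `build_range_query` are hypothetical helpers, not driver functions. The ring is split into equal slices here rather than aligned to the ranges each node actually owns.

```python
MIN_TOKEN = -2**63      # Murmur3Partitioner token bounds (assumed partitioner)
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Split the full token ring into contiguous, non-overlapping
    (start, end) ranges, both ends inclusive."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1
    ranges = []
    start = MIN_TOKEN
    for i in range(1, num_splits + 1):
        end = MIN_TOKEN + (total * i) // num_splits - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def build_range_query(table="my_keyspace.my_table", partition_key="id"):
    """CQL for scanning one slice; %s placeholders suit a simple
    (non-prepared) statement in the Python driver."""
    return (
        "SELECT * FROM {table} "
        "WHERE token({pk}) >= %s AND token({pk}) <= %s"
    ).format(table=table, pk=partition_key)


def scan_range(session, token_range, query=None):
    """Run the slice query for one (start, end) range on a given session."""
    return session.execute(query or build_range_query(), token_range)
```

Each worker process would then take a subset of `split_token_ring(n)` and call `scan_range` for its own slices, so the slices can be read concurrently.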

Best Regards,
Bhuvan