Any way to get the total size of a batch statement


Clint Kelly

Dec 9, 2014, 2:47:01 PM
to java-dri...@lists.datastax.com
Hi all,

How can I easily get the total size (in bytes) of a batch statement (or statement in general)?

I am getting warnings in my Cassandra server logs like the following:

WARN [Native-Transport-Requests:1276476] 2014-12-02 09:18:10,318 BatchStatement.java (line 223) Batch of prepared statements for [kiji_neiman.lg_products_B] is of size 53820, exceeding specified threshold of 5120 by 48700.

I assume the above message means that my batch statements should not be larger than ~5 KB, but I have some that are ~53 KB!  I think these are coming from some MapReduce code that I wrote, and I want to log the sizes of all of the batch statements that I am sending.

These big batch statements are causing GC problems in Cassandra that are affecting other parts of my system in a bad way!  :)

Any advice on the best way to estimate the size of my batch statement?  Right now I am buffering 100 statements before writing to Cassandra.  I'd like to change this to just buffer until I hit a certain max size.
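As a rough illustration of "buffer until I hit a certain max size", here is a minimal sketch that uses only the Java standard library. The class `SizeBoundedBuffer`, its `utf8Size` helper, and the size estimator are all hypothetical names introduced for this example; they are not part of the Java driver. The estimate is only an approximation of what the server will measure, since it depends on whatever estimator you pass in (for instance, the summed byte length of the bound values).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

/** Hypothetical helper: buffers items and flushes before the estimated size exceeds a limit. */
final class SizeBoundedBuffer<T> {
    private final long maxBytes;
    private final ToLongFunction<? super T> sizeEstimator;
    private final Consumer<? super List<T>> flusher;
    private final List<T> pending = new ArrayList<>();
    private long pendingBytes = 0;

    SizeBoundedBuffer(long maxBytes,
                      ToLongFunction<? super T> sizeEstimator,
                      Consumer<? super List<T>> flusher) {
        this.maxBytes = maxBytes;
        this.sizeEstimator = sizeEstimator;
        this.flusher = flusher;
    }

    /** Adds an item, flushing the current buffer first if the item would not fit. */
    void add(T item) {
        long size = sizeEstimator.applyAsLong(item);
        if (!pending.isEmpty() && pendingBytes + size > maxBytes) {
            flush();
        }
        // An item that is larger than maxBytes on its own is still accepted
        // and ends up in a buffer by itself.
        pending.add(item);
        pendingBytes += size;
    }

    /** Hands a copy of the buffered items to the flusher and resets the buffer. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }

    /** Estimated size, in bytes, of the items currently buffered. */
    long pendingBytes() {
        return pendingBytes;
    }

    /** Crude estimator for string payloads: the UTF-8 encoded length. */
    static long utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

The flusher is where the write to Cassandra would go, and logging `pendingBytes()` just before each flush gives the per-batch size figures asked about above. Call `flush()` once at the end to send whatever is left.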

Best regards,
Clint

Ryan Svihla

Dec 9, 2014, 3:07:14 PM
to java-dri...@lists.datastax.com
Clint, 

I suggest not using batches to bulk load data. Use them only for updating related records across different tables, or as unlogged batches to the same table when all of the writes share the same partition key. Executing asynchronously with futures (with some cap on the number of writes in flight if you're doing large updates) will most likely give you a huge speedup and save a lot of load on the Cassandra cluster.
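A minimal sketch of the "cap on number of writes in flight" idea, using only the Java standard library. `BoundedAsyncWriter` is a hypothetical class written for this example, and its `asyncWrite` function stands in for a call such as the driver's asynchronous execute; adapting that call's future type to a `CompletionStage` is assumed and left out here.

```java
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: issues async writes while capping how many are outstanding. */
final class BoundedAsyncWriter<T> {
    private final int maxInFlight;
    private final Semaphore permits;
    private final Function<? super T, ? extends CompletionStage<?>> asyncWrite;
    private final AtomicInteger failures = new AtomicInteger();

    BoundedAsyncWriter(int maxInFlight,
                       Function<? super T, ? extends CompletionStage<?>> asyncWrite) {
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
        this.asyncWrite = asyncWrite;
    }

    /** Blocks while maxInFlight writes are outstanding, then starts the write. */
    void submit(T item) throws InterruptedException {
        permits.acquire();
        try {
            asyncWrite.apply(item).whenComplete((result, error) -> {
                if (error != null) {
                    // This sketch only counts failures; a real loader would log or retry.
                    failures.incrementAndGet();
                }
                permits.release();
            });
        } catch (RuntimeException e) {
            // The write never started, so return its permit.
            permits.release();
            throw e;
        }
    }

    /** Waits until every write submitted so far has completed. */
    void awaitAll() throws InterruptedException {
        permits.acquire(maxInFlight);
        permits.release(maxInFlight);
    }

    /** Number of writes that completed exceptionally. */
    int failureCount() {
        return failures.get();
    }
}
```

The semaphore gives simple back-pressure: the producing thread slows to the pace at which the cluster acknowledges writes, rather than queuing an unbounded number of requests. The right value for `maxInFlight` depends on the cluster and is something to tune by measurement.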

Ryan Svihla

Solution Architect

Clint Kelly

Dec 9, 2014, 3:35:10 PM
to java-dri...@lists.datastax.com
Hi Ryan,

Thanks a lot!  I had actually just come across the same article and I'm going to change it now.

We were translating some code that had previously been written for HBase, where batched writes were a best practice.

Thanks again.

Best regards,
Clint
