Is there any limit on how many statements can be grouped together in a batch?


Rajesh Sindhu

Jan 12, 2014, 10:04:56 PM1/12/14
to java-dri...@lists.datastax.com

In our use case, we are grouping 15 counter queries to execute as a batch.

Is there an ideal number of counter queries per batch that we should aim for, for performance or accuracy reasons?

Thanks.

Keith Freeman

Jan 13, 2014, 8:34:08 AM1/13/14
to java-dri...@lists.datastax.com
We've grouped up to 250 inserts successfully, but note the performance concern here if you're using prepared statements.

Rajesh Sindhu

Jan 13, 2014, 12:47:59 PM1/13/14
to java-dri...@lists.datastax.com
Thanks Keith,

We are using prepared statements, but I wanted to decrease the number of batch executions, which I can do by batching 15 prepared statements together instead of executing three batches of 5 each.



--
Thanks
Rajesh Sindhu
Junior Software Engineer
Zscaler Inc.

Keith Freeman

Jan 13, 2014, 12:59:50 PM1/13/14
to java-dri...@lists.datastax.com
OK, but consider the performance implications. As my original post describes, for our prepared-statement use case, building and executing 17 batches of 10 inserts was enormously faster than building and executing one batch of 166 inserts for the exact same data.
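The many-small-batches approach Keith describes boils down to partitioning the statements into fixed-size chunks, where each chunk would become one BatchStatement executed against the session. A minimal sketch of just the chunking step (the class and method names here are hypothetical; the driver calls are indicated in comments only):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {
    // Split a list of (bound) statements into chunks of at most `batchSize`.
    // Each chunk would be wrapped in one BatchStatement and passed to
    // session.execute(...) in a real driver program.
    static <T> List<List<T>> chunk(List<T> statements, int batchSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < statements.size(); i += batchSize) {
            chunks.add(statements.subList(i, Math.min(i + batchSize, statements.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Keith's example: 166 inserts in batches of 10.
        List<Integer> stmts = new ArrayList<>();
        for (int i = 0; i < 166; i++) stmts.add(i);
        List<List<Integer>> chunks = chunk(stmts, 10);
        System.out.println(chunks.size());          // 17 (16 full batches + 1 of 6)
        System.out.println(chunks.get(16).size());  // 6
    }
}
```

Since `batchSize` is just a parameter, the same loop lets you benchmark the batch-size trade-off Keith and Rajesh are discussing.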

George Li

Feb 13, 2014, 4:51:30 PM2/13/14
to java-dri...@lists.datastax.com
Hi,

I'd like to share some of my test results with you. I am using Java driver 2.0.0-rc2 and Cassandra 2.0.3, with a 3-node cluster. For batching, I use com.datastax.driver.core.BatchStatement and add com.datastax.driver.core.PreparedStatement instances to it. I am inserting and deleting 12,000 rows using different batch sizes; here are the results in milliseconds:

Insertion of 12,000 rows:
Batch Size    Time (ms)
10            55561
100           12718
1000          6675

Deletion of 12,000 rows:
Batch Size    Time (ms)
10            84969
100           25488
1000          2579

As you can see, increasing the batch size gives a huge performance gain. I did not go above a batch size of 1000.
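To put the insertion table in more comparable units, the timings convert to rough rows-per-second throughput. The arithmetic below just restates the numbers above; nothing new is measured:

```java
public class Throughput {
    public static void main(String[] args) {
        int rows = 12000;
        // Insertion timings from the table above, in milliseconds.
        int[] batchSizes = {10, 100, 1000};
        long[] timesMs = {55561, 12718, 6675};
        for (int i = 0; i < timesMs.length; i++) {
            double rowsPerSec = rows * 1000.0 / timesMs[i];
            System.out.printf("batch size %4d: %.0f rows/s%n", batchSizes[i], rowsPerSec);
        }
        // Roughly 216, 944, and 1798 rows/s: going from batches of 10
        // to batches of 1000 is about an 8x throughput improvement here.
    }
}
```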

Thanks.



Rick Bullotta

Feb 13, 2014, 5:39:56 PM2/13/14
to java-dri...@lists.datastax.com
Very interesting results.  

A few questions:

- How wide are your rows (how many columns per row)?
- What is the network topology? Are the 3 nodes co-located or geographically distributed?

It seems that the batch capabilities are extremely important from a performance perspective and probably deserving of significant ongoing attention!





Li, George

Feb 13, 2014, 5:46:36 PM2/13/14
to java-dri...@lists.datastax.com
Rick,

- My rows are not very wide. The table is created as follows:
CREATE TABLE IF NOT EXISTS MyAssociations.Associations (
    id ascii,
    statusfilter ascii,
    authgroupidfilter ascii,
    classtype ascii,
    sortkey varchar,
    associd ascii,
    status ascii,
    assocblob text,
    authgroupid ascii,
    acceptancedate timestamp,
    createdate timestamp,
    updatedate timestamp,
    PRIMARY KEY (id, statusfilter, authgroupidfilter, classtype, sortkey, associd)
);

- All 3 nodes are in the same region in AWS. They are all m1.medium VMs.

Thanks.

Paul Felby

Feb 14, 2014, 2:33:50 AM2/14/14
to java-dri...@lists.datastax.com
Hi,

Just to add to the batch discussion: I have been doing batches of 5000 in a similarly structured table and found that to be a huge performance gain.

My batch size is configurable, and I am inserting up to around 100,000 items; if people like, I can post results from various batch sizes.

It is also all in the same AWS region. If you are inserting over a network with some degree of latency then, of course, batches will help even more.

For my setup, I found m1.large machines to be under constant yellow status (OS load too high); xlarge was fine, but I am now using 6 x c3.2xlarge.

Cheers,
Paul

Harshad Vyawahare

Apr 22, 2015, 11:24:44 AM4/22/15
to java-dri...@lists.datastax.com
I am using driver core version 2.1.3 and I was not able to insert more than 6000 records in a single batch.
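One plausible explanation for a ceiling like Harshad's is that Cassandra limits batches by serialized size rather than row count: the server-side batch_size_fail_threshold_in_kb setting (default 50, i.e. 50 KB, in Cassandra 2.2+) rejects oversized batches outright, so rows with blobs hit the limit at very different counts. A sketch of splitting by estimated payload bytes instead of by row count; the 50 KB threshold is the documented default, but the per-row size estimates here are made-up illustration values:

```java
import java.util.ArrayList;
import java.util.List;

public class SizeBoundedBatcher {
    // Default of batch_size_fail_threshold_in_kb in cassandra.yaml (C* 2.2+).
    static final int FAIL_THRESHOLD_BYTES = 50 * 1024;

    // Group rows into batches whose estimated serialized size stays under
    // maxBytes. In a real program each entry would be an estimate derived
    // from the bound values (e.g. blob lengths), and each resulting group
    // would become one BatchStatement.
    static List<List<Integer>> batchBySize(List<Integer> rowSizes, int maxBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentBytes = 0;
        for (int size : rowSizes) {
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }

    public static void main(String[] args) {
        // 6,000 rows at ~200 bytes each is ~1.2 MB, far over 50 KB,
        // so the rows are split into many smaller batches (256 rows each).
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 6000; i++) rows.add(200);
        System.out.println(batchBySize(rows, FAIL_THRESHOLD_BYTES).size()); // 24
    }
}
```

The threshold is configurable in cassandra.yaml, but keeping batches small on the client side avoids depending on the server's setting.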

Rick Bullotta

Apr 22, 2015, 11:26:47 AM4/22/15
to java-dri...@lists.datastax.com

Overall, that seems like ridiculously slow performance. How many columns are in the rows? It seems almost two orders of magnitude slower than I'd expect.


Harshad Vyawahare

Apr 22, 2015, 11:30:06 AM4/22/15
to java-dri...@lists.datastax.com
There were just 3 columns: 1 string, 1 bigint, and 1 blob.

Rafael Balest

Apr 23, 2015, 8:44:08 PM4/23/15
to java-dri...@lists.datastax.com, harshadv...@gmail.com
Consider Spring Data: it is compatible with many databases (MongoDB, Cassandra, JPA...) and makes development easier.

http://docs.spring.io/spring-data/cassandra/docs/1.0.5.RELEASE/reference/html/cassandra.core.html

Harshad Vyawahare

Apr 27, 2015, 5:45:57 AM4/27/15
to java-dri...@lists.datastax.com, harshadv...@gmail.com
Our table has many columns, and in a query we fetch only certain columns. I believe spring-data-cassandra fetches all the columns from the DB and then returns whatever the user needs, so we used the Cassandra core driver to reduce the size of the fetched data.