Debugging Astyanax -> CQL Write performance with 2.1.9 + C* 2.2.4


Mike Heffner

Feb 2, 2016, 5:04:41 PM
to java-dri...@lists.datastax.com
Hi,

We are working towards migrating a write-heavy application from Astyanax to CQL3 async writes using the Datastax Java driver. In our test, our app batches about 1,500 column writes into a single Thrift batch, and we compare that to queuing the same 1,500 writes using executeAsync() and waiting on the futures to complete.

We are running with driver version 2.1.9 and testing against Cassandra 2.2.4. We have six prepared statements that we bind our writes to (using markers). All 1,500 statements are executed async before we wait on any of the resulting futures. We are using the TokenAwarePolicy with shuffled replicas (also tried without), with the DCAwareRoundRobinPolicy child policy. The destination ring is 3 nodes, RF=3, each with 32 vnodes. All writes are done at LOCAL_QUORUM.
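
For reference, a minimal sketch of that client setup (driver 2.1.x; the contact point and DC name below are illustrative, not our actual values):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    // Token-aware routing over a DC-aware child policy, with shuffled replicas.
    Cluster cluster = Cluster.builder()
            .addContactPoint("10.0.0.1")                       // illustrative
            .withLoadBalancingPolicy(new TokenAwarePolicy(
                    new DCAwareRoundRobinPolicy("DC1"), true)) // true = shuffle replicas
            .build();
    Session session = cluster.connect();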

What we have found during the test is that when we switch to the CQL write path, latency at the app increases by 100-200ms+ and CPU on the Cassandra ring nearly doubles. Latency also seems a lot more jittery compared to posting Thrift batches.

Any idea where to start looking for the performance bottleneck? From what I can understand, the CQL write path in Cassandra 2.1 (I assume that includes 2.2) should be upwards of 150% faster than Thrift.


Cheers,

Mike

--

  Mike Heffner <mi...@librato.com>
  Librato, Inc.

Jack Krupansky

Feb 2, 2016, 5:11:15 PM
to java-dri...@lists.datastax.com
Is each batch to a single partition? So, you are doing 1,500 / 6 = 250 column writes per prepared statement?

Is each write to a separate CQL row?

Are you only writing to a single column of each CQL row in a given batch?

How many columns are in each CQL row, roughly?

How many CQL rows are in each partition, roughly?

-- Jack Krupansky


Alex Popescu

Feb 2, 2016, 5:14:44 PM
to java-dri...@lists.datastax.com
Mike,

Thanks a lot for your detailed high-level description of this investigation. As this is a performance question, I expect the devil is in the details, so I'm pretty sure sharing more about how you configure the client, how exactly the inserts are run, etc. will help in addressing your question.

thanks




--
Bests,

Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax

Mike Heffner

Feb 2, 2016, 5:27:01 PM
to java-dri...@lists.datastax.com
Jack,

Sorry, by 1500 columns I meant 1500 unique rows.

Each prepared statement is for a different table, so we are writing 1500/6 => 250 updates per table.

Each insert is to a single row, and all tables are marked COMPACT STORAGE. Each table has 1-3 columns in its composite column key, and a single value.

Each write should be to a unique partition key, but I would have to check the exact spread given the incoming data. Best case, about 5% of the partition keys could overlap.


Mike

Jack Krupansky

Feb 2, 2016, 5:37:04 PM
to java-dri...@lists.datastax.com
Try to avoid placing inserts for different partition keys in the same batch - otherwise the coordinator node will simply have to redirect them to a node that owns that partition key. But that should be no different from Thrift.

For COMPACT STORAGE, you should have a compound primary key with both a partition key and one to three clustering keys. If you put all three in a "composite partition key" then you defeat the purpose of COMPACT STORAGE. The goal is that each of your single values will be a distinct CQL row that happens to map to a single cell/column in the partition.

-- Jack Krupansky

Mike Heffner

Feb 2, 2016, 5:46:13 PM
to java-dri...@lists.datastax.com
Jack,

Can you explain that first part? For the test against CQL we are preparing each statement individually, writing to a single partition key. We are not using CQL batches, as that seems to be discouraged for performance reasons.

This is an example of one of our tables:

CREATE TABLE "Metrics".test_1 (
    key text,
    column1 bigint,
    column2 text,
    column3 text,
    value blob,
    PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC, column2 ASC, column3 ASC);

And an example of preparing an insert for this table:

        Insert i = QueryBuilder.insertInto(QueryBuilder.quote(keyspace), "test_1")
                .value("key", QueryBuilder.bindMarker(DATA_KEY_BIND))
                .value("column1", QueryBuilder.bindMarker(MT_BIND))
                .value("column2", QueryBuilder.bindMarker(SRC_BIND))
                .value("column3", FieldNames.JSON)
                .value("value", QueryBuilder.bindMarker(JSON_BIND));
        i.using(QueryBuilder.ttl(QueryBuilder.bindMarker(TTL_BIND)));
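
For completeness, a sketch of how a statement like that might then be prepared and bound; the value variables here are illustrative, only the bind-marker constants are from the snippet above:

        PreparedStatement ps = session.prepare(i);

        // Bind by marker name and execute asynchronously.
        BoundStatement bs = ps.bind()
                .setString(DATA_KEY_BIND, dataKey)
                .setLong(MT_BIND, measureTime)
                .setString(SRC_BIND, source)
                .setBytes(JSON_BIND, jsonPayload)   // ByteBuffer for the blob column
                .setInt(TTL_BIND, ttlSeconds);
        ResultSetFuture future = session.executeAsync(bs);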

The other tables are similar in layout.

Mike

Mike Heffner

Feb 2, 2016, 5:58:24 PM
to java-dri...@lists.datastax.com
Alex,

Thanks, yeah, I'm sure we are missing something basic. So far it's been a pretty naive port of our application, based on piecing together various documentation we could find. Happy to provide any missing details that may help track this down.

Cheers,

Mike

Jack Krupansky

Feb 2, 2016, 6:00:52 PM
to java-dri...@lists.datastax.com
So you were using batches in Thrift, but not in CQL?

You can use batches in CQL, but only as long as you keep the partition key the same (as I said.)

How were you batching them in Thrift?

-- Jack Krupansky

Mike Heffner

Feb 2, 2016, 6:12:31 PM
to java-dri...@lists.datastax.com
Jack,

Yes, we were using batches in Thrift because it allowed us to achieve better insert performance. We didn't start with batches in CQL because the documentation made them seem like an anti-pattern for performance.

With Astyanax we are constructing a single MutationBatch and adding each of the ~1500 rows to it (across the six tables) and then executing that single batch. We are not doing any splitting by partition key.

With CQL we are executing the 1500 row inserts as individual statements, then waiting on the 1500 ResultSetFutures.
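
In outline, the CQL path looks like this (a sketch; "statements" holds the ~1500 bound inserts, and the driver ships with Guava):

    import java.util.ArrayList;
    import java.util.List;
    import com.google.common.util.concurrent.Futures;

    List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
    for (BoundStatement bs : statements) {
        futures.add(session.executeAsync(bs));
    }
    // Block until every insert has completed.
    Futures.getUnchecked(Futures.allAsList(futures));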

Mike

Jack Krupansky

Feb 2, 2016, 6:33:37 PM
to java-dri...@lists.datastax.com
Which doc were you reading? Was it this statement: "Batch operations that involve multiple nodes are a definite anti-pattern"? That's what I was trying to say - batch is fine as long as all the statements refer to the same partition key.

See:

I do also see this statement in the doc: "Using batches to optimize performance is generally not successful, as described in Using and misusing batches topic. Instead, using batches to synchronize data to tables is a legitimate operation. For information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."" and "Batches are logged by default."

See:

So, there is the additional overhead of the batch log, but as long as all the statements refer to the same partition key, I suspect that the overhead is less than the overhead of a lot of extra requests.

The question is how many inserts to batch. I don't recall seeing any guidance on that - whether you should limit it to a few dozen, 50, or even 100 to 250. Try it. I would guess that hundreds should be fine, but not thousands. But... who knows, maybe the full 1,500 might work. I'd stick to 100 or so.

-- Jack Krupansky

Mike Heffner

Feb 2, 2016, 7:04:54 PM
to java-dri...@lists.datastax.com
Jack,

Yes, there's that statement, but there are also others like https://twitter.com/patrickmcfadin/status/515536988921810944. All of that makes it a bit difficult to decide when or when not to use batching for performance. If it is recommended for performance, maybe the driver could provide methods for building optimized batches by partition key?
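
If the driver did offer something like that, I imagine it would look roughly like this hypothetical sketch (UNLOGGED to skip the batch log mentioned above; "key" is our partition key column, and "statements"/"session" are assumed):

    import java.util.HashMap;
    import java.util.Map;
    import com.datastax.driver.core.BatchStatement;

    // Group bound inserts by partition key so each batch is single-partition.
    Map<String, BatchStatement> byPartition = new HashMap<String, BatchStatement>();
    for (BoundStatement bs : statements) {
        String pk = bs.getString("key");
        BatchStatement batch = byPartition.get(pk);
        if (batch == null) {
            batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            byPartition.put(pk, batch);
        }
        batch.add(bs);
    }
    for (BatchStatement batch : byPartition.values()) {
        session.executeAsync(batch);
    }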

In any case, I'm fairly confident there is little overlap in the partition keys in the data set we are testing against. The row count (keyed by partition key) in the Thrift mutation batch is the same as the number of CQL inserts.

Any thoughts on the number of async statements to queue at once? Does performance degrade beyond a certain number of statements in flight and not yet acknowledged (i.e., get() not yet called on the future)?

Mike

Mike Heffner

Feb 2, 2016, 7:07:52 PM
to java-dri...@lists.datastax.com

I would be curious whether the source to the benchmarks run here is available: http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster

Trying to replicate that benchmark against our cluster would be useful to see if we get the same results using Thrift batching vs CQL.



Martin Grotzke

Feb 2, 2016, 7:32:49 PM
to java-dri...@lists.datastax.com

I also found the docs and blog posts about batches quite confusing, and tried to clarify this with this post: https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/

Cheers,
Martin

Jack Krupansky

Feb 2, 2016, 7:38:47 PM
to java-dri...@lists.datastax.com
I still think you should give at least small to moderate CQL batches a chance. I mean, the primary concern with BATCH is when the partition keys are mixed. The TokenAwarePolicy will use the partition key from the first statement of the batch.

-- Jack Krupansky

Mike Heffner

Feb 2, 2016, 10:17:10 PM
to java-dri...@lists.datastax.com

Alright, but that would seem to imply there is no way to get performance comparable to Thrift's mixed-partition-key batches with CQL? If most of your partition keys do not overlap, you'll still end up with N small batches, where N == the number of rows, which seems like it would be the same as individual INSERTs.

Mike


Jack Krupansky

Feb 2, 2016, 10:43:01 PM
to java-dri...@lists.datastax.com
In that case, of course, they should be individual non-batch inserts. I got confused there - I had the impression they were all the same partition key.

-- Jack Krupansky

Mike Heffner

Feb 2, 2016, 10:57:29 PM
to java-dri...@lists.datastax.com
Thinking about this some more, I am wondering if my observations are due to the size of the test ring I'm using. Given an N=3, RF=3 ring, every node is a replica, so posting a Thrift batch of mixed partition keys shouldn't matter: any coordinator is a replica. In theory, then, this largely tests the performance of a single batched payload vs. parsing multiple prepared statements.

Whereas with a larger ring at the same RF, the percentage of keys in a mixed-key batch belonging to the coordinator decreases. That would put more work on the coordinator to hand off keys it doesn't own, versus a client that pushes the keys individually to each of the replicas.

I'd be curious if anyone has data on correlation of ring size and performance inflection.

Mike

Shinta Smith

Jun 30, 2016, 12:48:14 PM
to DataStax Java Driver for Apache Cassandra User Mailing List
Mike,
Have you found anything that might help improve your write throughput/latency?

We are also migrating our write-heavy application from Astyanax to Datastax Java driver 3.0.0. Our Cassandra version is old, 2.0.12; we have plans to upgrade it to 3.x, but we'll have to upgrade to 2.1.x first. In our preliminary testing, we found that Datastax writes are also about 100-200ms slower. Our Astyanax code uses MutationBatch. Our Datastax code uses individual async inserts, with no batch.

Our Cassandra cluster is larger, 16 nodes with RF=3.

Any kind of tuning you did that you can share with us?

Thanks,
-shinta

Andrew Tolbert

Jul 1, 2016, 9:48:39 AM
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi Shinta,

When migrating from Astyanax to the Datastax Java Driver, how you implement your solution with the driver can influence performance more heavily than the change in protocol (Thrift -> native protocol/CQL). There are things you can do that will negatively impact performance, and things you can do that will help optimize it.

Would you be able to share any more information about how your Astyanax and Java driver implementations work? From your comments, you mention that you were previously using a mutation batch and are now doing individual async inserts without batching; from that, I have a few follow-ups.
  1. When batching, how many mutations were you doing per batch?  Did each batch only have mutations for 1 partition key?
  2. Since you are doing individual async inserts, are you controlling how many concurrent async inserts you do at a time?  If so, how do you continue to submit requests as queries complete?  Do you submit X queries, wait on all their futures to complete, and then submit another X?  Or when an individual query completes, do you submit another one? (A sketch of the latter pattern follows after this list.)
  3. What kind of throughput (individual rows/sec) are you achieving w/ the astyanax and java driver solutions?  
  4. When you say writes are about 100ms-200ms slower, do you mean individual writes, or waiting for an entire group of writes to complete?
  5. What do your inserts look like?  How many columns are involved, what is the data size, etc?
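
On question 2, one common pattern (just a sketch, not a claim about either app) is to cap in-flight requests with a semaphore and release a permit from each future's callback, so a new insert starts as soon as any one finishes instead of waiting on a whole group:

    import java.util.concurrent.Semaphore;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;

    final Semaphore inFlight = new Semaphore(128);  // tune the permit count
    for (BoundStatement bs : statements) {
        inFlight.acquireUninterruptibly();          // wait for a free slot
        Futures.addCallback(session.executeAsync(bs),
                new FutureCallback<ResultSet>() {
                    @Override public void onSuccess(ResultSet rs) { inFlight.release(); }
                    @Override public void onFailure(Throwable t) { inFlight.release(); }
                });
    }
    inFlight.acquireUninterruptibly(128);           // drain: wait for the tail to finish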
Thanks,
Andy

Shinta Smith

Jul 1, 2016, 5:26:11 PM
to DataStax Java Driver for Apache Cassandra User Mailing List

Hi, Andy,
Thanks for replying. Here are my answers to your questions:

> 1. When batching, how many mutations were you doing per batch?  Did each batch only have mutations for 1 partition key?

Our app submits about 100 inserts per batch. No, each batch is not limited to just one partition key; in fact, our app code is unaware of partition keys.

> 2. Since you are doing individual async inserts, are you controlling how many concurrent async inserts you are doing at a time?  If so, how are you continuing to submit requests when queries complete?  Do you submit X queries, wait on all their futures to complete, and then submit another X queries?  Or when an individual query completes, are you submitting another one?

Yes, we are currently submitting 100 async inserts at a time and waiting on those futures to complete before submitting the next 100.

> 3. What kind of throughput (individual rows/sec) are you achieving w/ the astyanax and java driver solutions? 

I was measuring the time it takes our app to insert each batch of 100 inserts. The graph (image not preserved here) showed the mean of those times: the low numbers before June 30 20:00 are the Astyanax times; after June 30 20:00, I switched the code to use Datastax.

> 4. When you say writes are about 100ms-200ms slower, do you mean Individual writes, or waiting for an entire group of writes to complete?

It is the time for each batch of 100 that is 100ms-200ms slower.

> 5. What do your inserts look like?  How many columns are involved, what is the data size, etc?

Our column families are simple: only one column. Our insert code uses prepared statements, and we made sure we only create one of each in the app. The values are custom-serialized long or double numeric values. Below is pseudo code of our inserts:

// this is created in constructor of a singleton
Insert.Options insertNumeric = insertInto(columnFamilyName)
                                   .value(KEY, bindMarker())
                                   .value(COLUMN1, bindMarker())
                                   .value(VALUE, bindMarker())
                                   .using( ttl(bindMarker()) );
PreparedStatement ps = session.prepare( insertNumeric );
ps.setConsistencyLevel( ConsistencyLevel.ONE );

// this is in a method that gets called to process each
// batch
Map<KeyObj, ResultSetFuture> futures = new HashMap<KeyObj, ResultSetFuture>();
for each item in the batch of 100
    BoundStatement bound = insertNumeric.bind( key, timestamp, value, ttl);
    futures.put(key, session.executeAsync(bound));


// we wait for each future to complete
for each future in the 'futures' map
    ResultSet result = future.getValue().getUninterruptibly();


Any glaring issues that you can see right off the bat? Any tips are appreciated.

Thanks,
-shinta

Avinash G A

Jul 4, 2016, 12:46:50 AM
to DataStax Java Driver for Apache Cassandra User Mailing List
Why are you binding values to insertNumeric (which is a statement) instead of ps (which is a prepared statement)? Can you try:
BoundStatement bound = ps.bind( key, timestamp, value, ttl);

Instead of:

BoundStatement bound = insertNumeric.bind( key, timestamp, value, ttl);



Shinta Smith

Jul 5, 2016, 10:04:21 AM
to java-dri...@lists.datastax.com
Sorry, that was a copy-and-paste typo. Anyway, what I showed you is pseudo code; the real code uses ps.bind().

thanks,
-shinta

Shinta Smith

Jul 22, 2016, 3:37:25 PM
to DataStax Java Driver for Apache Cassandra User Mailing List
Any ideas on how to improve our insert performance? Any tuning I should look into?

Our Cassandra version is 2.0.12, on a 16-node cluster with RF=3; Datastax driver version 3.0.0.

thanks,
-shinta
