CreateTable performance difference from previous driver


Marco Scoppetta

Aug 28, 2019, 1:03:52 PM
to DataStax Java Driver for Apache Cassandra User Mailing List

Hello everyone,

I have Cassandra 3.11.4 running on my machine. Running the same operation (creating a new table in a newly created keyspace), I have noticed a >10x performance difference between driver versions 3.7.2 and 4.2.0.

Here is my code using Java driver version 3.7.2:

    Create createTable = createTable(keyspaceName, tableName)
        .ifNotExists()
        .addPartitionKey("key", DataType.blob())
        .addClusteringColumn("column1", DataType.blob())
        .addColumn("value", DataType.blob());

    session.execute(createTable);


And this is the code used with Java driver version 4.2.0:

    CreateTableWithOptions createTable = createTable(keyspaceName, tableName)
        .ifNotExists()
        .withPartitionKey("key", DataTypes.BLOB)
        .withClusteringColumn("column1", DataTypes.BLOB)
        .withColumn("value", DataTypes.BLOB);

    session.execute(createTable.build());


Using 3.7.2, the average time to create a table is ~80 ms.
Using 4.2.0, the average time to create a table is ~1050 ms.

Any ideas or pointers on what I might be doing wrong here? Or is this a known issue?

Thank you very much!

Marco

Olivier Michallat

Aug 28, 2019, 5:21:28 PM
to java-dri...@lists.datastax.com
Hi,

Yes, I can reproduce it. This doesn't come from the query builder: I benchmarked that independently, and driver 4 is actually slightly faster on that part.

So the gap is on the session.execute call, and the culprit is schema event debouncing. I ran the driver 4 test with a smaller debouncing window and the numbers improve immediately. Let me check why the query gets debounced in driver 4 and not driver 3, and if it's something we should fix, and I'll report back.

--

Olivier Michallat

Driver & tools engineer, DataStax


Olivier Michallat

Aug 28, 2019, 6:30:57 PM
to java-dri...@lists.datastax.com
So actually I would argue it's a bug in driver 3.

We debounce to avoid refreshing the metadata too often: whenever the schema changes, the driver needs to update its local representation of the schema (more details here). This means querying system tables on the control connection, and parsing the results. If you have a lot of keyspaces and tables, that can get pretty heavy. So if there are rapid changes happening in succession, we want to group them into a single refresh.
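To make the grouping concrete, here is a minimal, self-contained sketch of debouncing in simulated time. It is not the driver's code: the class name DebounceSketch and the method refreshTimes are hypothetical, and it assumes that each new event arriving inside the window restarts the timer, so that a single refresh fires once the events go quiet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of debouncing; not taken from the driver.
final class DebounceSketch {

  /**
   * Given sorted event timestamps (ms) and a debounce window (ms), returns the
   * simulated times at which a refresh would fire.
   * Assumption: an event arriving before the pending deadline restarts the timer.
   */
  public static List<Long> refreshTimes(long[] eventTimesMs, long windowMs) {
    List<Long> refreshes = new ArrayList<>();
    if (eventTimesMs.length == 0) {
      return refreshes;
    }
    long deadline = eventTimesMs[0] + windowMs;
    for (int i = 1; i < eventTimesMs.length; i++) {
      long t = eventTimesMs[i];
      if (t >= deadline) {
        // The pending refresh already fired before this event arrived.
        refreshes.add(deadline);
      }
      // Either way, this event opens (or restarts) a window ending at t + windowMs.
      deadline = t + windowMs;
    }
    refreshes.add(deadline); // the last pending refresh
    return refreshes;
  }

  public static void main(String[] args) {
    // One isolated event: the refresh is delayed by the full window.
    System.out.println(refreshTimes(new long[] {0}, 1000)); // [1000]
    // Three rapid events: grouped into a single refresh.
    System.out.println(refreshTimes(new long[] {0, 10, 20}, 1000)); // [1020]
    // A late event gets its own refresh.
    System.out.println(refreshTimes(new long[] {0, 10, 5000}, 1000)); // [1010, 6000]
    // Window of 0: the refresh fires immediately.
    System.out.println(refreshTimes(new long[] {0}, 0)); // [0]
  }
}
```

The first case shows the trade-off: a lone schema change always waits out the whole window before its refresh runs.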

There are two ways we can find out that the schema has changed:
  1. the server sends us an event on the control connection. This is when the update was made from another client.
  2. we get a SCHEMA_CHANGE message in response to a query. This is when the update comes from within your application. In that case, the driver only completes the session.execute call after the schema refresh (that way we guarantee that the effects of the query are immediately visible after session.execute returns).
As it turns out, driver 3 only debounces in the first case. In the second case, an immediate schema refresh is always scheduled (if another refresh is already running, it will run after it).

Driver 4 debounces in all cases. I think this is a more consistent behavior:
  • if you execute a CREATE TABLE, and a server event (from an unrelated change by another client) arrives at the same time, we get a chance to group them into a single refresh.
  • if you execute multiple CREATE statements from different application threads (or through successive session.executeAsync calls), we get a chance to group them into a single refresh.
But, as you've observed, the downside is that you get a bad performance hit on a single query. There are a few ways to address that:
  • tune the debouncing window to find the right balance between sufficiently grouping close events (longer window), and not penalizing isolated updates too much (shorter window). You can even set the window to 0 to disable debouncing, if you think schema updates are rare enough in your environment.
  • if you can group your schema updates in a single place in your application, disable the metadata temporarily with session.setSchemaMetadataEnabled(false), run all your statements, and then reenable it at the end (this will trigger a refresh). There won't be any debouncing while the metadata is disabled, and you're in control of when the refresh will happen.
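As a rough illustration of the first option, the window can be set in the driver 4 configuration file. This is a sketch, not a recommendation: the option path is advanced.metadata.schema.debouncer.window, and the 100 milliseconds value is an arbitrary example that should be tuned for your own workload.

```
# application.conf (example value only; tune for your environment)
datastax-java-driver {
  advanced.metadata.schema.debouncer {
    # Shorter window: isolated schema changes complete sooner,
    # but rapid successive changes are grouped less aggressively.
    # A value of 0 disables debouncing.
    window = 100 milliseconds
  }
}
```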
I'm also going to create a doc ticket to mention this in the upgrade guide.


Marco Scoppetta

Aug 29, 2019, 5:04:25 AM
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi Olivier,

Thank you very much for the detailed response; it is very insightful indeed!

I have tried changing the advanced.metadata.schema.debouncer.window config and I can clearly see the difference.

I think having a similar explanation in the upgrade guide would be very beneficial for users.

Thanks,
Marco