Driver performance


Stanislav Podlesny

Jun 24, 2015, 11:37:14 AM
to cpp-dri...@lists.datastax.com
Hello,

I recompiled my backend, which previously used the deprecated driver, against the new one (v2.0.1) and tested its performance.

The results really surprised me: with the new driver (2.0.1), average performance is about five times slower than with the deprecated driver.

The backend just executes simple SELECT requests (prepared statements are not used).

Michael Penick

Jun 25, 2015, 11:59:09 AM
to cpp-dri...@lists.datastax.com
That's not something I would expect. Any additional information you can provide to help resolve the performance regression would be greatly appreciated. For example:

- The exact schema, queries, and average payload sizes
- A small example of the pathological scenario.
- Are you reusing the same session or creating a new session for each query (or round of queries)?

Mike


Stanislav Podlesny

Jun 25, 2015, 1:52:37 PM
to cpp-dri...@lists.datastax.com
Hi Mike,

The database is pretty simple: it is a list of object links. Each link is directed and defined by three values:
- idfrom (source object ID)
- idto (destination object ID)
- score (link score)

schema:

CREATE KEYSPACE megalink WITH replication = {'class': 'NetworkTopologyStrategy', 'BEMDF1': '2', 'BEMDF2': '2', 'STVAF3': '2'} AND durable_writes = true;

CREATE TABLE megalink.pubmed_pubmed_v1 (
idfrom bigint,
idto bigint,
score int,
PRIMARY KEY (idfrom, idto)
) WITH CLUSTERING ORDER BY (idto ASC)
AND bloom_filter_fp_chance = 0.1
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'tombstone_compaction_interval': '3600', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
AND compression = {}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 3600
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

The backend creates a new session for each query.

- create a new session
- execute one query, there are two possible query types:
1) SELECT * FROM megalink.pubmed_pubmed_v1 WHERE idfrom=12345;
2) SELECT * FROM megalink.pubmed_pubmed_v1 WHERE idfrom IN (12345, 56789);
- close session

The average result size is about 100 rows or fewer.

Thanks,
-Stan

Caragea Silviu

Jun 26, 2015, 4:07:18 AM
to cpp-dri...@lists.datastax.com
Hello Michael,

I'm working on an Erlang driver built on top of your C++ driver. During our load testing (fantastic performance, congratulations!) I found some performance issues in the functions that convert a CassUuid to a string and back. If I remember correctly, they use sprintf and sscanf, which are very slow.

In our repo you can find two replacement versions that, in my tests, run over 15 times faster.
The performance I got overall is amazing. With all the other Erlang drivers I tested, I couldn't get more than 15k requests per second during load tests on my cluster. With the one we built on top of the DataStax driver we got over 50k, and CPU and memory usage were very low.

Silviu
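
For readers wondering what a sprintf-free conversion might look like, here is a minimal sketch of the general idea (a hex lookup table for formatting and a hand-written digit decoder for parsing). It is not the code from Silviu's repo and not the driver's implementation. `UuidBytes` is a hypothetical 16-byte stand-in rather than the driver's `CassUuid` type, and all function names are made up for illustration. A snprintf-based version is included only as a baseline to check that both produce the same text.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in: 16 raw UUID bytes in canonical order.
// This is NOT the driver's CassUuid type.
using UuidBytes = std::array<std::uint8_t, 16>;

// Formats via a lookup table, so no format string is parsed at run time.
std::string uuid_to_string_fast(const UuidBytes& u) {
    static const char kHex[] = "0123456789abcdef";
    std::string out;
    out.reserve(36);
    for (std::size_t i = 0; i < u.size(); ++i) {
        // Dashes precede bytes 4, 6, 8 and 10 (8-4-4-4-12 layout).
        if (i == 4 || i == 6 || i == 8 || i == 10) out.push_back('-');
        out.push_back(kHex[u[i] >> 4]);
        out.push_back(kHex[u[i] & 0x0F]);
    }
    assert(out.size() == 36);
    return out;
}

// Baseline in the style the post describes: one snprintf call per byte.
std::string uuid_to_string_snprintf(const UuidBytes& u) {
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < u.size(); ++i) {
        if (i == 4 || i == 6 || i == 8 || i == 10) out.push_back('-');
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(u[i]));
        out.append(buf);
    }
    return out;
}

// Decodes one hex digit; returns -1 for anything else.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses the 36-character text form without sscanf; returns false on bad input.
bool uuid_from_string_fast(const std::string& s, UuidBytes& out) {
    if (s.size() != 36) return false;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (i == 4 || i == 6 || i == 8 || i == 10) {
            if (s[pos] != '-') return false;
            ++pos;
        }
        const int hi = hex_value(s[pos]);
        const int lo = hex_value(s[pos + 1]);
        if (hi < 0 || lo < 0) return false;
        out[i] = static_cast<std::uint8_t>((hi << 4) | lo);
        pos += 2;
    }
    return true;
}
```

How much faster this is than the sprintf/sscanf route depends on the compiler and C library, so it is worth benchmarking on the target platform before relying on any particular speedup.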


Michael Penick

Jun 26, 2015, 11:21:58 AM
to cpp-dri...@lists.datastax.com
Thanks for sharing. Would you be up for making a pull request? If not, can I merge your implementation (and credit you as the author)?

Mike

Michael Penick

Jun 26, 2015, 11:38:28 AM
to cpp-dri...@lists.datastax.com
Thanks for the new information, very helpful. It's almost certainly the repeated creation/destruction of the session object. This is much more expensive in the new driver, as it's doing a lot more bookkeeping (for the control connection and schema metadata) than the old driver. In the future it will be possible to disable some of this overhead, making driver startup much faster for scenarios that require it.

If your application can keep the session around, that will likely solve the performance issue.

Mike
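
To make the difference between the two usage patterns concrete, here is a small self-contained sketch. It does not use the driver at all: `FakeSession` and `ConnectCounter` are hypothetical stand-ins that only count how often the expensive setup step runs. In a real application the setup would correspond roughly to `cass_session_new()` plus `cass_session_connect()`, and the per-query work to `cass_session_execute()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical bookkeeping: how many times a session was set up.
struct ConnectCounter {
    int connects = 0;
};

// Hypothetical stand-in for a driver session; NOT the DataStax API.
class FakeSession {
public:
    // Stands in for the costly part: connecting, control connection,
    // schema metadata bookkeeping.
    explicit FakeSession(ConnectCounter& counter) : counter_(counter) {
        ++counter_.connects;
    }

    // Stands in for executing a query; returns a dummy "row count".
    std::size_t execute(const std::string& query) const {
        assert(!query.empty());
        return query.size();
    }

private:
    ConnectCounter& counter_;
};

// Pattern described earlier in the thread: a new session for every query.
std::size_t run_with_session_per_query(const std::vector<std::string>& queries,
                                       ConnectCounter& counter) {
    std::size_t total = 0;
    for (const std::string& q : queries) {
        FakeSession session(counter);  // setup cost paid on every query
        total += session.execute(q);
    }
    return total;
}

// Suggested pattern: create the session once and reuse it.
std::size_t run_with_shared_session(const std::vector<std::string>& queries,
                                    ConnectCounter& counter) {
    FakeSession session(counter);  // setup cost paid once
    std::size_t total = 0;
    for (const std::string& q : queries) {
        total += session.execute(q);
    }
    return total;
}
```

With N queries, the first function pays the setup cost N times and the second pays it once, while both return the same results. In a long-running backend the shared session would typically be created at startup and closed at shutdown.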


Caragea Silviu

Jun 26, 2015, 4:22:32 PM
to cpp-dri...@lists.datastax.com
Hello Michael,

I'm really pretty busy with a lot of projects! Please feel free to merge my changes!
If I find anything else, I will let you know!

Silviu