Hi, I'm starting a new project with cassandra and thought I'd use the recommended cassandra-driver, but the performance seemed a bit low. I then repeated testing with pycassa and found much higher performance. I understand that the cassandra-driver specifically mentions in the docs that some work is still underway (e.g. "C extension for encoding/decoding messages" is on the todo list), so my question is mostly just to see if the results I'm seeing are expected or if maybe I'm just doing something wrong.
For a single connection, single threaded test that just queries an object by its primary key over and over, I get:
== pycassa 1.11.0 get test (20000 gets) ==
5179 reqs per second
== cassandra-driver 1.1.2 get test (20000 gets) ==
767 reqs per second
To be clear, I saw similar performance differences at various levels of concurrency (though the gap narrows to maybe 2x-3x in favor of pycassa). But since the gap also exists in the single-worker case, I'm posting that version for simplicity; a simplified sketch of the concurrent variant is included after the script below.
My test is as follows:
import timeit

count = 20000
key = '12341234'

# --- pycassa (Thrift) test ---
import pycassa
from pycassa.pool import ConnectionPool

print '== pycassa %s get test (%d gets) ==' % (pycassa.__version__, count)
conn = ConnectionPool('t0')
cf = pycassa.ColumnFamily(conn, 'sessions1')
cf.insert(key, {'age': 55})
t = timeit.timeit('cf.get(key)', 'from __main__ import cf, key', number=count)
print '%d reqs per second' % (count / t)

# --- cassandra-driver (native protocol) test ---
import cassandra
from cassandra.cluster import Cluster

print '== cassandra-driver %s get test (%d gets) ==' % (cassandra.__version__, count)
cluster = Cluster()
session = cluster.connect()
session.set_keyspace('t0')
put = session.prepare('update sessions1 set age=? where user_id=?')
session.execute(put, [55, key])
get = session.prepare('select * from sessions1 where user_id=? limit 1')
t = timeit.timeit('list(session.execute(get, [key]))[0]', 'from __main__ import session, get, key', number=count)
print '%d reqs per second' % (count / t)
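For the concurrency tests mentioned above, the cassandra-driver side looked roughly like this - a simplified sketch rather than the exact script (it reuses session, get and key from above, and the CONCURRENCY/TOTAL numbers are just illustrative values I varied between runs):

# simplified sketch: keep CONCURRENCY requests in flight until TOTAL complete
import time
import threading

CONCURRENCY = 100   # in-flight requests (I varied this)
TOTAL = 20000

lock = threading.Lock()
done = threading.Event()
counts = {'started': 0, 'completed': 0}

def launch_one():
    # start one more request if we haven't launched TOTAL yet
    with lock:
        if counts['started'] >= TOTAL:
            return
        counts['started'] += 1
    future = session.execute_async(get, [key])
    future.add_callbacks(on_success, on_error)

def on_success(rows):
    with lock:
        counts['completed'] += 1
        finished = counts['completed'] >= TOTAL
    if finished:
        done.set()
    else:
        launch_one()

def on_error(exc):
    print exc
    done.set()

start = time.time()
for _ in range(CONCURRENCY):
    launch_one()
done.wait()
print '%d reqs per second' % (TOTAL / (time.time() - start))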
I'm running it on a fairly idle Ubuntu 13.04 server, and the magnitude of the performance difference is consistent across many runs. Cassandra is running as a single node, locally, with a pretty vanilla configuration (DataStax Community version 2.0.7).
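In case it matters, the keyspace and table were created along these lines (recreated here from memory via the driver, so the real schema may differ slightly; the COMPACT STORAGE bit is what lets both pycassa/Thrift and CQL hit the same table):

from cassandra.cluster import Cluster

session = Cluster().connect()
session.execute("CREATE KEYSPACE IF NOT EXISTS t0 WITH replication = "
                "{'class': 'SimpleStrategy', 'replication_factor': 1}")
# COMPACT STORAGE so the same table is visible to both pycassa (Thrift) and CQL
session.execute("CREATE TABLE IF NOT EXISTS t0.sessions1 "
                "(user_id text PRIMARY KEY, age int) WITH COMPACT STORAGE")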
I know the test is somewhat contrived, but here's some background: we're looking to replace our existing k/v store with Cassandra, and out of the gate we'll need to deploy a cluster with capacity for roughly 300k requests per second. Before we get to production, our testing will move to something much closer to our real traffic, but at the outset I'm just trying to evaluate feasibility and get a kind of best-case performance baseline.
Using pycassa and multiple clients I can easily get tens of thousands of requests per second from this local node, so even though more realistic usage will result in lower performance, it looks like we're probably in good shape and can likely reach our performance needs with a cluster of a few dozen nodes.
On the other hand, with cassandra-driver and multiple clients I can rarely get more than a few thousand requests per second. Even if real traffic patterns perform no worse than this test (which is pretty unlikely), we might need a hundred or more nodes to meet the same target.
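For what it's worth, the multi-client runs are just N worker processes, each with its own connection, each timing the same single-connection loop - something along these lines (simplified; shown for the cassandra-driver side, and the client/request counts are just example values):

# rough sketch: N processes, each timing its own loop, parent sums the rates
import multiprocessing
import time

def worker(n_requests, results):
    # build the connection inside the child process (connections don't survive fork)
    from cassandra.cluster import Cluster
    session = Cluster().connect('t0')
    get = session.prepare('select * from sessions1 where user_id=? limit 1')
    start = time.time()
    for _ in xrange(n_requests):
        session.execute(get, ['12341234'])
    results.put(n_requests / (time.time() - start))

if __name__ == '__main__':
    n_clients, per_client = 8, 20000
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(per_client, results))
             for _ in range(n_clients)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print '%d reqs per second total' % sum(results.get() for _ in procs)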
So, what I'm looking for is some guidance: if I'm using the newer driver improperly, or there's a better way to go about this, pointers would be greatly appreciated. Or, if this level of performance is typical for the newer driver for now, that's totally fine - I can stick with pycassa for now and re-evaluate the newer driver as it progresses.
FWIW, nearly all our traffic is simple reads/writes by ID (a few background processes occasionally scan the data for reporting purposes, but the live traffic is all very simple k/v access), and our real-world object sizes are under 10KB, with 2KB being the norm.
Thanks for any guidance anyone can provide!
-Dave