We are migrating from Cassandra 1.2.19 to 2.1.1, which requires a new Cassandra driver (https://github.com/datastax/python-driver). The migration is under way, and we are currently having some issues running Celery with the new driver.
We created a wrapper class, `CassandraConnection`, over the new driver that delegates calls to it. We use this wrapper in our WSGI app and also in Celery tasks:
    @property
    def cassandra_conn(self):
        if not self._cass_conn:
            self._cass_conn = self._create_cassandra_connection()
        return self._cass_conn

    def _create_cassandra_connection(self):
        _cass_conn = None
        if self.config['CASSANDRA']['KEYSPACE']:
            kwargs = self.config['CASSANDRA']['CONNECTION_ARGS']
            _cass_conn = CassandraConnection(
                self.config['CASSANDRA']['HOSTS'],
                self.config['CASSANDRA']['KEYSPACE'],
                **kwargs
            )
        return _cass_conn
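To make the caching behaviour of the property above easy to check in isolation, here is a minimal, self-contained sketch. `FakeCassandraConnection` and `App` are hypothetical stand-ins (the real wrapper and its owner class are not shown in this post), and the config values are made up for illustration:

```python
class FakeCassandraConnection(object):
    """Hypothetical stand-in for our CassandraConnection wrapper."""

    instances_created = 0  # counts how many connections were ever built

    def __init__(self, hosts, keyspace, **kwargs):
        FakeCassandraConnection.instances_created += 1
        self.hosts = hosts
        self.keyspace = keyspace
        self.kwargs = kwargs


class App(object):
    """Hypothetical owner object holding the config and the cached connection."""

    def __init__(self, config):
        self.config = config
        self._cass_conn = None

    @property
    def cassandra_conn(self):
        # Build the connection on first access, then reuse it.
        if not self._cass_conn:
            self._cass_conn = self._create_cassandra_connection()
        return self._cass_conn

    def _create_cassandra_connection(self):
        cass = self.config['CASSANDRA']
        if not cass['KEYSPACE']:
            return None
        return FakeCassandraConnection(
            cass['HOSTS'], cass['KEYSPACE'], **cass['CONNECTION_ARGS']
        )


# Made-up configuration values, for illustration only.
config = {
    'CASSANDRA': {
        'HOSTS': ['127.0.0.1'],
        'KEYSPACE': 'demo',
        'CONNECTION_ARGS': {'port': 9042},
    }
}

app = App(config)
first = app.cassandra_conn
second = app.cassandra_conn  # same object as `first`, no second connection
```

Within one `App` instance the connection is built only once. Each new instance of the owner object would still build its own connection.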
Basically, the Cassandra session is established once and reused. The app works fine with the new driver, but Celery seems to have a memory issue. Because we use the driver's `cassandra.io.geventreactor` to connect to Cassandra, our code has to be monkey patched with gevent when it runs under Celery. To do that, we add `-P gevent` when starting Celery:
    celery worker -Q foo_queue g -A foo.jobs -E --maxtasksperchild 25 -P gevent
It worked well at first, but after we ran some tests against Celery, it started to eat up a lot of memory and finally crashed. This never happened when we were using the old Pycassa driver, so we suspect the problem is related to the new driver.
I know this is not strictly a Cassandra question, but is there anything we need to watch out for when using the new Cassandra driver, especially with gevent monkey patching? Thanks!