Pyramid and Cassandra: how to glue them together?

Rafael Riedel

Oct 7, 2014, 5:53:05 PM

to pylons-...@googlegroups.com

Hi guys!

I'm new to Pyramid, and I'm trying to develop an application using Cassandra, and to be honest, I'm lost!

My biggest issue is how to keep the session alive for the entire application. Has anyone had success using Pyramid with Cassandra?

Thank you all!

Jonathan Vanasco

Oct 8, 2014, 3:35:12 PM

to pylons-...@googlegroups.com

I don't have an exact answer, but I can give you a starting point -

It's my understanding that with the datastax driver/API, there is a large initial connection hit, and most people want to reuse the same Session across multiple requests to minimize that. It looks like a popular strategy is to use a @postfork hook to set up the connection in each worker.

Depending on how you're going to deploy (uwsgi, gunicorn, etc.), there are slightly different strategies for initializing a session/connection pool for each worker. That sort of stuff isn't really Pyramid-specific, so there is a lot of information on it.

Not an exact answer, but I was looking into this a few months ago and that's as far as I got.
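
For servers without a @postfork hook, one server-agnostic option is to create the session lazily and re-create it whenever the process ID changes. This is only a minimal sketch: PerProcessSession and its factory argument are names invented for illustration (not part of the driver), and the factory is assumed to wrap something like Cluster([...]).connect().

```python
import os


class PerProcessSession(object):
    """Create one session per OS process, lazily, on first use."""

    def __init__(self, factory):
        # Hypothetical callable that opens a connection and returns a session
        self._factory = factory
        self._session = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        if self._session is None or self._pid != pid:
            # First use in this process, or we are in a forked child:
            # open a fresh session instead of reusing the parent's.
            self._session = self._factory()
            self._pid = pid
        return self._session
```

Each worker pays the connection cost once, on its first request, and later requests in that worker reuse the same session.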

Taylor Gronka

Oct 9, 2014, 3:51:31 PM

to pylons-...@googlegroups.com

Howdy Rafael. It's not so hard. Here's something like my folder structure, using the datastax driver:

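/www/mywebsite/cassandraconnection/simpleclient.py
# datastax imports
import logging
from cassandra.cluster import Cluster
from cassandra.query import dict_factory

log = logging.getLogger(__name__)

class SimpleClient(object):
    session = None

    def connect(self, nodes, certificate):
        # some certification stuff here; the next line is only a placeholder assumption
        # (ssl_options is a dict of ssl keyword arguments - adjust it to your own setup)
        ssl_options = {'ca_certs': certificate}
        cluster = Cluster(nodes, protocol_version=2, cql_version='3.1.7', port=9042, ssl_options=ssl_options)
        self.session = cluster.connect()
        self.session.row_factory = dict_factory
        # I like using dict_factory (each row comes back as a dict keyed by column name) - check out the other options if you want though.
        # But if you don't use dict_factory then you might have to zip your own results from cassandra, and results will be formatted in different ways - that is, a single row will return as a list of dict items iirc, while multiple rows will return as a list of dicts, but that list is the first item of a list (a list within a list).. just kind of awkward results

    def close(self):
        if self.session is None:
            return  # never connected, e.g. in a process where connect() was not called
        cluster = self.session.cluster
        self.session.shutdown()
        cluster.shutdown()
        log.info('Connection closed.')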

/www/mywebsite/pyramidstuff/models.py
# the class lives in the simpleclient.py module inside the cassandraconnection package
from cassandraconnection.simpleclient import SimpleClient
Session = SimpleClient()
# because this is imported into the __init__.py, this creates the SimpleClient() object when your code runs - but note that it does not connect yet - it just sits there

/www/mywebsite/pyramidstuff/__init__.py
# import the SimpleClient code to expose it
from cassandraconnection.simpleclient import SimpleClient
# import the simple client object
from .models import Session

# because I use uwsgi, the app runs as multiple forked worker processes. If you connect before the fork, every worker shares a single connection to the database and they clash. Connecting in a @postfork hook in effect creates a separate connection for each worker
from uwsgidecorators import postfork
@postfork
def connect_cassandra_client():
    Session.connect(['127.0.0.1'], certificate='/path/here')
    print("connection to cassandra made")

# the new version of cassandra highly recommends a clean shutdown, i think
import atexit
@atexit.register
def shutdown_cassandra_client():
    # only close if this process actually connected
    if Session.session is not None:
        Session.close()
        print("cassandra conn closed")

Having said all that, the new version of cqlengine uses the datastax driver as a backend, which might be a lot easier to work with - they made that change in June/July iirc. However, Cassandra updates very, very often, and I'm doing some unique stuff with it, so I chose not to use cqlengine, although chances are it's what you want to use.

Oh, actually, the crummy part is that waitress doesn't work well with the datastax driver for some reason - you'll probably get weird errors and disconnects. I changed to nginx + uwsgi, which isn't very hard to do at all with Pyramid. In the simplest case, set up nginx, add a [uwsgi] section to your development.ini with the necessary settings, and use:

$VENV/bin/pip install uwsgi
$VENV/bin/uwsgi --ini-paste /www/mywebsite/pyramidstuff/development.ini

I'm writing parts of this from memory, and trying to summarize the code I am looking at, so excuse my mistakes please, but it should give you an idea of a method that works.

Use from .models import Session to access the connection. From there I either write code in classes which write directly to Cassandra, or for a select few functions I import simpleclient.py to do simpler things. I have to go so I can't finish this, gl tho
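
One way to hand that shared connection to views is Pyramid's config.add_request_method with reify=True. This is only a sketch under assumptions: the property name cassandra and the helper get_cassandra are made up for illustration, and the small SimpleClient class here is a stand-in so the snippet runs on its own (in the real app it would be from .models import Session).

```python
class SimpleClient(object):
    """Stand-in for the real SimpleClient; only the attribute used here."""
    session = None


# In the real app: from .models import Session
Session = SimpleClient()


def get_cassandra(request):
    # With reify=True Pyramid calls this once per request and caches the result
    if Session.session is None:
        raise RuntimeError("Cassandra session not connected yet (did @postfork run?)")
    return Session.session


def includeme(config):
    # 'cassandra' is a hypothetical property name; views would use request.cassandra
    config.add_request_method(get_cassandra, 'cassandra', reify=True)
```

A view could then call request.cassandra.execute(...) without importing Session itself.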

Rafael Riedel

Oct 23, 2014, 9:27:45 PM

to pylons-...@googlegroups.com

Hi Jonathan!

Thank you for your answer! You gave me the insight to implement a better session manager for Cassandra. I'm proceeding with my code, and as soon as I have something functional, I'll post it here.

Rafael Riedel

Oct 23, 2014, 9:34:23 PM

to pylons-...@googlegroups.com

Howdy Taylor!

Sorry for the delay in answering your message, I was busy with another project... :(

Your answer is simply amazing! I think this is what I'm looking for, and Jonathan is right about the @postfork. Right now I'm trying cqlengine instead of the pure datastax driver. I'm starting to code right now, and I hope to have this correctly implemented in a couple of days. Stay tuned!

Thank you very much! You helped me a lot!

Taylor Gronka

Oct 24, 2014, 1:04:05 AM

to pylons-...@googlegroups.com

Yeah, Jonathan was headed in the right direction. I'm a bit wary of my own coding practices, but I wanted to get what worked for me out there somewhere, and this seemed like a convenient place for it.

Another point that comes to mind is that as I'm developing, I often drop all the keyspaces and recreate them. However, I've noticed that sometimes, maybe 30% of the time, uwsgi will continue to return data that was in the dropped dataset. To fix this I restart the uwsgi server... I'd guess it has to do with Cassandra clearing its heap space related to a connection, but I haven't gotten to the point of digging into Java much.

Aside from that and what I mentioned about picking a row_factory, working with Cassandra has gone pretty smoothly - hope it works for you.
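
On the row_factory point: a row factory in the datastax driver is a callable that takes (colnames, rows) and returns the shaped result. The two functions below are plain-Python stand-ins written for illustration (as_named_tuples and as_dicts are invented names that roughly mirror the driver's named_tuple_factory and dict_factory), so the shapes can be compared without a cluster. The sample column names and rows are made up.

```python
from collections import namedtuple


def as_named_tuples(colnames, rows):
    # Attribute access per column: row.name
    Row = namedtuple('Row', colnames)
    return [Row(*row) for row in rows]


def as_dicts(colnames, rows):
    # Key access per column: row['name']
    return [dict(zip(colnames, row)) for row in rows]


# Made-up sample data standing in for a query result
colnames = ['user_id', 'name']
rows = [(1, 'rafael'), (2, 'taylor')]

dict_rows = as_dicts(colnames, rows)
tuple_rows = as_named_tuples(colnames, rows)
```

Either way the result is one flat list with one item per row; only the per-row access style changes.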

Just to mention a few more things that confused me starting out: I would recommend that anyone considering Cassandra read eBay's data modeling best practices guides - but keep in mind that MOST of the articles about Cassandra don't apply to version 2+. A really big one was that "wide rows" don't really exist anymore - but you can achieve the same effect with a layer of composite columns. I think one draw to Cassandra is row slicing, but now it's composite-column slicing. Note that you can't slice across the primary key, unless you use one of the 'ordered' partitioners, which is not recommended.
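
To make the slicing point concrete, here is a toy in-memory model, not Cassandra's implementation: rows are grouped by partition key and kept sorted by a clustering column, so a range slice is cheap inside one partition but has no meaning across partitions. The table, column names, and the ToyTable class are all hypothetical.

```python
import bisect
from collections import defaultdict

# CQL this roughly models (hypothetical table):
#   CREATE TABLE events (user_id text, ts int, body text,
#                        PRIMARY KEY (user_id, ts));
#   SELECT * FROM events WHERE user_id = 'bob' AND ts >= 10 AND ts < 30;


class ToyTable(object):
    def __init__(self):
        # partition key -> list of (clustering key, value), kept sorted
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        rows = self._partitions[partition_key]
        keys = [k for k, _ in rows]
        idx = bisect.bisect_left(keys, clustering_key)
        if idx < len(rows) and rows[idx][0] == clustering_key:
            rows[idx] = (clustering_key, value)  # overwrite, like a CQL upsert
        else:
            rows.insert(idx, (clustering_key, value))

    def slice(self, partition_key, start, end):
        """Rows with start <= clustering key < end, in clustering order."""
        rows = self._partitions.get(partition_key, [])
        keys = [k for k, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return rows[lo:hi]
```

The slice always needs an exact partition key first, which matches the restriction described above: the range applies to the clustering column, not to the partition key.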
