Describing Cassandra Tables Python


Aaron Benz

May 18, 2015, 2:06:23 PM
to python-dr...@lists.datastax.com
I'm looking through the object mapping sections and I am wondering if there is a way to find out which columns have which properties. That is, after connecting to a cluster, can I create a table object x for test.x and then, from Python, simply ask "what are the static/primary/partition/clustering columns in x?" I could figure this out by running DESCRIBE TABLE in cqlsh, but I was wondering if something like this already exists.

Adam Holmberg

May 18, 2015, 5:17:43 PM
to python-dr...@lists.datastax.com
Aside from the 'private' attributes of the model itself, the core driver has a schema metadata model that is generated from the system tables. cqlsh actually generates the describe output from this model.

The model is held in Cluster.metadata and is described in the cassandra.metadata module.
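A minimal sketch of how that model can answer the original question, assuming a driver version whose TableMetadata exposes partition_key, clustering_key and columns, and whose ColumnMetadata exposes is_static. The helper names column_roles and describe_live_table are hypothetical, and the driver is imported lazily so the helper also works on any object with those attributes:

```python
def column_roles(table):
    """Map each column name to 'partition_key', 'clustering_key', 'static' or 'regular'.

    `table` is assumed to look like the driver's TableMetadata: partition_key and
    clustering_key are lists of column objects, columns maps name -> column object.
    """
    partition = {c.name for c in table.partition_key}
    clustering = {c.name for c in table.clustering_key}
    roles = {}
    for name, col in table.columns.items():
        if name in partition:
            roles[name] = "partition_key"
        elif name in clustering:
            roles[name] = "clustering_key"
        elif getattr(col, "is_static", False):
            roles[name] = "static"
        else:
            # Anything not in the key and not static is a regular column.
            roles[name] = "regular"
    return roles


def describe_live_table(contact_points, keyspace, table_name):
    """Hypothetical convenience wrapper: connect, read Cluster.metadata, shut down."""
    from cassandra.cluster import Cluster  # imported here so column_roles needs no driver

    cluster = Cluster(contact_points)
    try:
        cluster.connect()  # connecting populates cluster.metadata with the schema
        table = cluster.metadata.keyspaces[keyspace].tables[table_name]
        return column_roles(table)
    finally:
        cluster.shutdown()
```

For example, describe_live_table(["127.0.0.1"], "test", "x") would map every column of test.x to one of the four roles, which is the same information DESCRIBE TABLE shows in cqlsh.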

Regards,
Adam Holmberg


Aaron Benz

May 21, 2015, 12:11:40 PM
to python-dr...@lists.datastax.com
Ok, so I've dug deeper into the code and I think I now have a pretty good understanding of how to dynamically get to all keyspaces, their tables, and their columns. However, when I got down to the TableMetadata and ColumnMetadata classes, I ran into a bit of a mess. The ColumnMetadata class seems to lack a lot of information about the column itself. For example, it does not contain the "type" in the sense of "partition_key", "clustering_key", etc. It does have an is_static attribute, but that is not nearly as helpful as a type field. And although the column itself has no "type", I could find the partition keys from the list held at the TableMetadata level. Still, it does not quite make sense to me that the table is not built from, and driven by, the columns it contains.

So I have been trying to go in and rework this so that the classes build upon one another, but I have run into some snags understanding what you are actually doing in the metadata.py code. If you could provide some assistance, that would be lovely. Ultimately, I think it would make sense for each column to contain all of the data in the "schema_columns" table, so that TableMetadata could be built on top of its columns and KeyspaceMetadata on top of its tables. Thoughts?
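A hypothetical sketch of that proposal, not the driver's design: each column carries its own kind and key position (mirroring a schema_columns row), and the table derives its key lists from the columns instead of being handed them. KindedColumn and KindedTable are illustrative names only:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KindedColumn:
    """Hypothetical column record that knows its own kind."""
    name: str
    kind: str                       # 'partition_key', 'clustering_key', 'regular' or 'static'
    cql_type: str
    position: Optional[int] = None  # order within the key; None for non-key columns


@dataclass
class KindedTable:
    """Hypothetical table built only from its columns; key lists are derived."""
    name: str
    columns: List[KindedColumn] = field(default_factory=list)

    def _of_kind(self, kind):
        members = [c for c in self.columns if c.kind == kind]
        # sorted() is stable, so columns without a position keep declaration order.
        return sorted(members, key=lambda c: c.position if c.position is not None else 0)

    @property
    def partition_key(self):
        return self._of_kind("partition_key")

    @property
    def clustering_key(self):
        return self._of_kind("clustering_key")

    @property
    def static_columns(self):
        return self._of_kind("static")

    @property
    def regular_columns(self):
        return self._of_kind("regular")
```

The trade-off is that key order now depends on every column carrying a correct position, whereas the existing TableMetadata stores the key lists directly, in the order needed to generate CQL.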

Adam Holmberg

May 21, 2015, 12:56:35 PM
to python-dr...@lists.datastax.com
I'm wary of changing this structure now, especially with the schema modernization efforts coming on the server side.

The present model evolved in a way that makes it possible to generate CQL from it. For example, column membership in the primary key is really a property of the table. What are you trying to accomplish with partition_key and clustering_key that cannot be derived from those attributes on TableMetadata? These attributes hold lists of references to the columns that are part of the key.

Adam

Aaron Benz

May 21, 2015, 1:56:20 PM
to python-dr...@lists.datastax.com
One of the things I want to be able to do is dynamically print the structure of a particular table in a way that displays its hierarchy, something like this:

partition_keys
        clustering1
            clustering2
                regular1
                regular2
                regular3
        static1
        static2
        static3

I would then like to use that to build a MultiIndex in pandas, so that people without Cassandra experience can understand and work with data from Cassandra.
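A rough sketch of that printout built on the existing TableMetadata, assuming the partition_key, clustering_key, columns and is_static attributes discussed in this thread; format_hierarchy is a hypothetical helper. Regular columns are nested one level below the last clustering column, and static columns sit at the first clustering level because they belong to the partition:

```python
def format_hierarchy(table, indent="    "):
    """Render a table's columns as an indented tree (one column name per line).

    `table` is assumed to look like the driver's TableMetadata.
    """
    partition = list(table.partition_key)
    clustering = list(table.clustering_key)
    key_names = {c.name for c in partition + clustering}
    non_key = [c for c in table.columns.values() if c.name not in key_names]
    statics = [c for c in non_key if getattr(c, "is_static", False)]
    regulars = [c for c in non_key if not getattr(c, "is_static", False)]

    lines = [c.name for c in partition]            # depth 0: partition key
    for depth, c in enumerate(clustering, start=1):
        lines.append(indent * depth + c.name)       # each clustering column nests deeper
    leaf_depth = len(clustering) + 1
    lines += [indent * leaf_depth + c.name for c in regulars]  # row-level values
    lines += [indent + c.name for c in statics]                # one per partition
    return "\n".join(lines)
```

The same ordering (partition columns, then clustering columns, then regular columns) is a natural candidate for the levels of a pandas MultiIndex.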

The issue is that ColumnMetadata does not actually carry the column's kind. Instead, I have to do several lookups: get the partition and clustering keys from TableMetadata, then check is_static on each ColumnMetadata, and only then can I deduce which columns are regular. That process isn't ideal. It makes a lot more sense for each column to contain everything you need to know about it, and for the table to be built from the information in its columns. Right now, the TableMetadata class has to be passed the partition_keys and clustering_keys; really, you should just pass the columns and let TableMetadata work out which ones are partition keys, which are clustering keys, and so on.

Alternatively, I could just query the system.schema_columns table myself and work everything out. However, that would only serve my needs rather than other use cases that might be out there. So I thought this was worth addressing, potentially by reworking parts of those classes so that (at least in my mind) they build on each other properly.
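For comparison, a sketch of the do-it-yourself route, assuming the Cassandra 2.x layout of system.schema_columns, where each row carries keyspace_name, columnfamily_name, column_name, a type such as partition_key, clustering_key, regular or static, and a component_index giving the position within the key. (Cassandra 3.0 moved this to system_schema.columns, with kind and position columns instead.) fetch_column_kinds and group_schema_rows are hypothetical names:

```python
# Assumes Cassandra 2.x, where column definitions live in system.schema_columns.
SCHEMA_COLUMNS_CQL = (
    "SELECT * FROM system.schema_columns "
    "WHERE keyspace_name = %s AND columnfamily_name = %s"
)


def group_schema_rows(rows):
    """Group schema_columns rows into {kind: [column names]}.

    Columns are ordered by component_index; a null index (for example a
    single-column partition key) is treated as 0.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row.type, []).append(row)
    result = {}
    for kind, members in groups.items():
        members.sort(key=lambda r: r.component_index if r.component_index is not None else 0)
        result[kind] = [r.column_name for r in members]
    return result


def fetch_column_kinds(session, keyspace, table):
    """Run the query through an existing driver Session and group the result."""
    rows = session.execute(SCHEMA_COLUMNS_CQL, (keyspace, table))
    return group_schema_rows(rows)
```

This works, but it ties the code to one server version's system tables, which is part of why a kind field on the driver's own column model would be more portable.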

Aaron Benz

Jul 13, 2015, 2:21:28 PM
to python-dr...@lists.datastax.com
Hey Guys, 

So I followed up by redoing the metadata pieces (differently) to enable a different level of access to Cassandra tables, particularly for viewing them. Check out my README at https://github.com/aaronbenz/caspanda and let me know what you think. Basically, it's the difference between viewing Cassandra tables like so:

print(cl.keyspaces["tests"].tables["sold_cars"])

#   make text partition_key
#   state text partition_key
#       day timestamp clustering_key
#           event_time timestamp clustering_key
#               dealership text 
#               year int 
#               salesman text 
#       distributor_lead text static
#       account_lead text static

The traditional way to view this is as CQL, via export_as_string():

print(cl.metadata.keyspaces["tests"].tables["sold_cars"].export_as_string())

#CREATE TABLE tests.sold_cars (
#    make text,
#    state text,
#    day timestamp,
#    event_time timestamp,
#    account_lead text static,
#    dealership text,
#    distributor_lead text static,
#    salesman text,
#    year int,
#    PRIMARY KEY ((make, state), day, event_time)

Separately, is there a way to intelligently return a set of data in a hierarchical format, such as a dictionary, that is similar to how Cassandra actually stores the data? For example, right now if I query the table above, every tuple contains the same static column values. If the result were returned hierarchically instead, a dict could hold everything in a hierarchy where each static value is recorded once rather than in every tuple.
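One way to get that shape is to post-process the flat rows client-side. A minimal sketch, assuming rows arrive as dicts (for example by setting session.row_factory = dict_factory, from cassandra.query) and that the partition, clustering and static column names come from the table metadata; nest_rows is a hypothetical helper, not a driver feature:

```python
def nest_rows(rows, partition_cols, clustering_cols, static_cols):
    """Group flat result rows (dicts) into a partition -> clustering -> values tree.

    Returns {partition_key_tuple: {"static": {...}, "rows": nested_dict}}.
    Static values are stored once per partition instead of once per row.
    """
    key_and_static = set(partition_cols) | set(clustering_cols) | set(static_cols)
    tree = {}
    for row in rows:
        pkey = tuple(row[c] for c in partition_cols)
        partition = tree.setdefault(pkey, {"static": {}, "rows": {}})
        # Static columns have the same value on every row of a partition; keep one copy.
        for c in static_cols:
            partition["static"][c] = row[c]
        # Walk down one dict level per clustering column (use () if there are none).
        keys = [row[c] for c in clustering_cols] or [()]
        node = partition["rows"]
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        # The leaf holds only the regular columns.
        node[keys[-1]] = {k: v for k, v in row.items() if k not in key_and_static}
    return tree
```

This only reshapes what the query already returned, so the static values still travel over the wire once per row; it just avoids repeating them in the resulting structure.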