How to create a composite key with Pycassa


hanspet...@gmail.com

Apr 2, 2013, 9:30:14 AM
to pycassa...@googlegroups.com
Hi,

The pycassa 1.8.0 documentation shows how to create a column family with a composite column.
I can't figure out how to do the same for a table with a composite key.

Is that possible?
If so, how?

Regards Hans-Peter

Tyler Hobbs

Apr 3, 2013, 12:39:42 PM
to pycassa...@googlegroups.com
To create a column family with a composite row key, you would do something like

system_manager.create_column_family(..., key_validation_class="CompositeType(UTF8Type, Int32Type)")
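
A slightly fuller sketch of the call above, for illustration only. The helper names `composite_validator` and `create_composite_key_cf`, and the keyspace and column family names in the usage comment, are hypothetical; the only pycassa call assumed is `SystemManager.create_column_family(keyspace, name, **kwargs)` with the `key_validation_class` keyword shown above.

```python
def composite_validator(*component_types):
    """Build a validator string such as 'CompositeType(UTF8Type, Int32Type)'."""
    return "CompositeType(%s)" % ", ".join(component_types)


def create_composite_key_cf(system_manager, keyspace, name):
    """Create a column family whose row key is a (UTF8Type, Int32Type) composite.

    `system_manager` is expected to be a pycassa.system_manager.SystemManager
    (or anything exposing the same create_column_family method).
    """
    system_manager.create_column_family(
        keyspace,
        name,
        key_validation_class=composite_validator("UTF8Type", "Int32Type"),
    )


# Hypothetical usage (needs a running Cassandra node and pycassa installed):
#   from pycassa.system_manager import SystemManager
#   create_composite_key_cf(SystemManager("localhost:9160"), "MyKeyspace", "MyCF")
```

With pycassa's default key packing, row keys for such a column family should then be passed as tuples, for example `("alice", 42)`.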



--
Tyler Hobbs
DataStax

Paul Westin

May 1, 2013, 12:18:29 PM
to pycassa...@googlegroups.com
How do you query for a slice of this family once it's created?

I have a ColumnFamily using a key validation of (DateType, LexicalUUIDType), and am trying to query for all rows with a particular date:

import datetime

startdate = datetime_to_date(datetime.datetime.utcnow())  # datetime_to_date removes the hours/minutes/seconds
start = (startdate,)  # partial composite key: date component only
for row in cf.get_range(start=start, columns=["md5"]):
    print(row)


This apparently returns everything:

((datetime.datetime(2013, 4, 16, 0, 0), UUID('c35db94c-e4ce-353f-9f52-09986a0a8188')), OrderedDict([(u'md5', u'xxxxxxxxxxx')]))
((datetime.datetime(2013, 4, 14, 0, 0), UUID('d06c5dd9-6109-3495-9424-9a3693b62460')), OrderedDict([(u'md5', u'xxxxxxxxxxx')]))
((datetime.datetime(2013, 4, 17, 0, 0), UUID('30acc2e2-8656-3e08-b557-f0ac1d847659')), OrderedDict([(u'md5', u'xxxxxxxxxxx')]))
((datetime.datetime(2013, 4, 13, 0, 0), UUID('edeeadc0-f999-3600-87ca-27c8941ab59b')), OrderedDict([(u'md5', u'xxxxxxxxxxx')]))
...


What am I missing this time?

Tyler Hobbs

May 1, 2013, 1:50:25 PM
to pycassa...@googlegroups.com
If you're not using an order-preserving partitioner (which is generally a bad idea anyway), you cannot get a lexicographical range of rows by their keys. This is somewhat old, but it addresses the topic pretty well: http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
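
To make the point about partitioners concrete, here is a toy model (not Cassandra's actual implementation) of why a range scan under a hashing partitioner does not come back in key order. The function `toy_token` is a hypothetical stand-in: it assumes rows are ordered by an MD5-derived token of the row key, which is roughly how RandomPartitioner places rows, while the real serialization and token details differ.

```python
import datetime
import hashlib


def toy_token(row_key):
    # Simplified stand-in for a hash partitioner's token:
    # the MD5 digest of the key bytes, read as one big integer.
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


# Fourteen consecutive days used as row keys, already in date order.
days = [datetime.date(2013, 4, 10) + datetime.timedelta(days=i) for i in range(14)]
row_keys = [d.isoformat() for d in days]

# A range scan walks rows in token order, not in key order.
token_order = sorted(row_keys, key=toy_token)

for key in token_order:
    print(key)
```

The printed dates come out shuffled relative to the calendar, which matches the unordered `get_range` output shown earlier in the thread.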

Paul Westin

May 1, 2013, 2:13:45 PM
to pycassa...@googlegroups.com
So you're saying it's not possible to query by only one part of a "Composite" row key in the way you can with column keys? My understanding based on this was that composite primary keys are actually stored together, clustered by the "Partition key" (first composite column)

Cassandra uses the first column name in the primary key definition as the partition key, which is the same as the row key to the underlying storage engine.

Is this only applicable when using CQL to interface with the data?

This puts me at a bit of a loss with pycassa, as secondary indexes appeared too slow for production use, now composite column keys are only query-able through Thrift if you know all elements of the key... I guess it's back to the drawing board :)

Tyler Hobbs

May 1, 2013, 2:40:07 PM
to pycassa...@googlegroups.com
On Wed, May 1, 2013 at 1:13 PM, Paul Westin <nopt...@gmail.com> wrote:
> So you're saying it's not possible to query by only one part of a "Composite" row key in the way you can with column keys? My understanding based on this was that composite primary keys are actually stored together, clustered by the "Partition key" (first composite column)
>
> Cassandra uses the first column name in the primary key definition as the partition key, which is the same as the row key to the underlying storage engine.
>
> Is this only applicable when using CQL to interface with the data?

Columns in CQL3 do not correspond directly to columns in the normal Thrift API (and hence, pycassa).  The Thrift and CQL3 sections of this article may help explain what's going on: http://www.datastax.com/dev/blog/cql3-for-cassandra-experts

> This puts me at a bit of a loss with pycassa, as secondary indexes appeared too slow for production use, now composite column keys are only query-able through Thrift if you know all elements of the key... I guess it's back to the drawing board :)

Basically, you'll want to move the DateType into the comparator, either by itself or as the first component in a CompositeType.
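
A rough illustration of why moving the date into the comparator helps: column names within a row are kept sorted, so every column sharing a leading date component is contiguous and can be fetched as a slice. The sketch below is a pure-Python toy model, not pycassa code; the class `WideRow` and its methods are hypothetical. In pycassa itself the equivalent slice would presumably be expressed with the `column_start` and `column_finish` arguments of `ColumnFamily.get`.

```python
import bisect
import datetime
import uuid


class WideRow(object):
    """Toy model of one wide row whose column names are (date, uuid) composites."""

    def __init__(self):
        self._names = []   # composite column names, kept sorted
        self._values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice_by_date(self, day):
        # A one-element tuple sorts before every longer tuple with the same
        # first component, so these two bounds bracket exactly one day.
        lo = bisect.bisect_left(self._names, (day,))
        hi = bisect.bisect_left(self._names, (day + datetime.timedelta(days=1),))
        return [(name, self._values[name]) for name in self._names[lo:hi]]


row = WideRow()
row.insert((datetime.date(2013, 4, 16), uuid.UUID(int=1)), "md5-a")
row.insert((datetime.date(2013, 4, 14), uuid.UUID(int=2)), "md5-b")
row.insert((datetime.date(2013, 4, 16), uuid.UUID(int=3)), "md5-c")

for name, value in row.slice_by_date(datetime.date(2013, 4, 16)):
    print(name, value)
```

The trade-off of this layout is that all entries sharing a row key live in a single row, so the choice of row key (one row overall, one per month, and so on) decides how wide each row grows.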


--
Tyler Hobbs
DataStax