Composite type keys - query by first component(s)


André Cruz

Sep 19, 2012, 12:25:20 PM
to pycassa...@googlegroups.com
Hello.

I have been reading http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 and am trying to convert my Super Columns to regular Columns with composite type keys, but have come across a problem.

Assuming I have a CF like this:

CF Test:
 Key -> 'CompositeType(IntegerType, UTF8Type)'

Data:

(1, "TEST1") -> "ASDF": 1
(1, "TEST2") -> "ASDF": 1
(2, "TEST1") -> "ASDF": 1
(3, "TEST1") -> "ASDF": 1
(4, "TEST1") -> "ASDF": 1
(4, "TEST2") -> "ASDF": 1 


I would like to be able to fetch all rows that have "1" as the first component of the key. 

I've tried:

get(1)
TypeError: object of type 'int' has no len()

get((1,))
UnboundLocalError: local variable 'eoc' referenced before assignment

get_range() with the same parameters also raises the same errors.


Can anyone give me a hint?

Thanks,
André

André Cruz

Sep 20, 2012, 6:26:19 AM
to pycassa...@googlegroups.com
On Sep 19, 2012, at 5:25 PM, André Cruz <andre...@co.sapo.pt> wrote:

>
> CF Test:
> Key -> 'CompositeType(IntegerType, UTF8Type)'
>
> Data:
>
> (1, "TEST1") -> "ASDF": 1
> (1, "TEST2") -> "ASDF": 1
> (2, "TEST1") -> "ASDF": 1
> (3, "TEST1") -> "ASDF": 1
> (4, "TEST1") -> "ASDF": 1
> (4, "TEST2") -> "ASDF": 1

So it seems the idea is to instead have:

1 -> ("TEST1","ASDF"): 1
1 -> ("TEST2","ASDF"): 1
2 -> ("TEST1","ASDF"): 1
3 -> ("TEST1","ASDF"): 1
4 -> ("TEST1","ASDF"): 1
4 -> ("TEST2","ASDF"): 1

That is, composite type columns instead of composite type keys. This way it's easy to search for 1, since it's the row key, and if we want (1, "TEST1") it translates to .get(1, column_start=("TEST1",), column_finish=("TEST1",)).
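
To make the slice concrete, here is a minimal sketch in plain Python. It does not talk to Cassandra: it models the row with key 1 as a sorted list of composite column names, and `slice_by_prefix` is a hypothetical helper, not a pycassa function. It only illustrates why a one-component start and finish selects every column whose first component matches.

```python
from bisect import bisect_left, bisect_right

# Model of the row with key 1: composite column names kept in sorted order.
row = [
    (("TEST1", "ASDF"), 1),
    (("TEST2", "ASDF"), 1),
]

def slice_by_prefix(columns, prefix):
    """Return the columns whose composite name starts with `prefix`.

    Hypothetical helper for illustration only.
    """
    # Truncating sorted names to the prefix length keeps them sorted,
    # so a binary search finds the matching contiguous range.
    names = [name[:len(prefix)] for name, _ in columns]
    lo = bisect_left(names, prefix)
    hi = bisect_right(names, prefix)
    return columns[lo:hi]

matching = slice_by_prefix(row, ("TEST1",))
```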

However, I still find that Super CFs offer more features, and if we are to convert SCFs to CFs with composite types we will lose the ability to perform certain queries, or we will have to create multiple CFs to compensate. Let me give a more concrete example. I have this SCF:

CF FileRevision:
  KEY FileID (UUID):
    SC Revision (UUID)
      C "attribute1": value
      C "attribute2": value
    SC Revision (UUID)
      C "attribute1": value
      C "attribute2": value
      C "attribute3": value

In order not to use Super Columns the schema would look like:

CF FileRevision:
  KEY FileID (UUID):
    C Revision1:"attribute1": value
    C Revision1:"attribute2": value
    C Revision2:"attribute1": value
    C Revision2:"attribute2": value
    C Revision2:"attribute3": value

With this schema I would have some difficulty obtaining the last X revisions of a file, since column_count would count attributes, not revisions. The same query with the SCF would be easy: I could count the SCs, and I would even get back a nice dictionary with revisions as keys and sub-dictionaries for the attributes. In order to answer this query correctly without using an SCF, I could add another CF and thus end up with:
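
The counting problem can be sketched in plain Python, without Cassandra. The strings standing in for revision UUIDs, the sample values, and the `last_revisions` helper are all assumptions made for illustration; the grouping shown is a client-side workaround that needs the whole row fetched first, not something the Thrift API does for you.

```python
from itertools import groupby

# Columns of one FileRevision row, sorted by (revision, attribute).
# Strings stand in for the revision UUIDs.
columns = [
    (("rev1", "attribute1"), "a"),
    (("rev1", "attribute2"), "b"),
    (("rev2", "attribute1"), "c"),
    (("rev2", "attribute2"), "d"),
    (("rev2", "attribute3"), "e"),
]

def last_revisions(cols, n):
    """Group sorted columns by revision and keep the last `n` revisions."""
    grouped = {
        rev: {name[1]: value for name, value in group}
        for rev, group in groupby(cols, key=lambda col: col[0][0])
    }
    keep = list(grouped)[-n:] if n > 0 else []
    return {rev: grouped[rev] for rev in keep}

# A column limit counts attributes: the last 4 columns hold all of rev2
# but only part of rev1.
by_count = columns[-4:]

# Grouping by the first component returns whole revisions instead.
latest = last_revisions(columns, 1)
```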

CF FileRevision:
  KEY FileID (UUID):
    C Revision1: ''
    C Revision2: ''

CF RevisionData:
  KEY FileID:Revision1:
    C "attribute1": value
    C "attribute2": value
  KEY FileID:Revision2:
    C "attribute1": value
    C "attribute2": value
    C "attribute3": value

This way I can retrieve the last X revisions of a file from the FileRevision CF, but for the revision details I would need to fetch them from another CF. I don't really see the advantage of not using SCF.
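
As a rough sketch of that two-step read, with plain dicts standing in for the two column families (the sample IDs, values, and the `last_revisions_with_data` helper are hypothetical):

```python
# FileRevision: one row per file, one empty-valued column per revision.
# The list holds the column names in sorted order.
file_revision = {
    "file1": ["rev1", "rev2"],
}

# RevisionData: one row per (file, revision) pair.
revision_data = {
    ("file1", "rev1"): {"attribute1": "a", "attribute2": "b"},
    ("file1", "rev2"): {"attribute1": "c", "attribute2": "d", "attribute3": "e"},
}

def last_revisions_with_data(file_id, n):
    """Read the last `n` revision names, then look up each one's attributes."""
    # First read: here a column limit does count revisions.
    revisions = file_revision[file_id][-n:] if n > 0 else []
    # Second read: one lookup per revision in the other column family.
    return {rev: revision_data[(file_id, rev)] for rev in revisions}

result = last_revisions_with_data("file1", 1)
```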

Best regards,
André

Tyler Hobbs

Sep 20, 2012, 12:59:13 PM
to pycassa...@googlegroups.com
This is one case where super columns can do something that's not supported with composite columns and the normal Thrift API.  I *think* cql3 supports doing this with composite columns -- placing a limit on the number of distinct *first* components that you fetch -- but I'm not 100% sure about that.  As you probably already know, pycassa doesn't support cql yet, but it's something that's being worked on.

If you only needed to support fetching the latest revision of a document instead of the latest X revisions, you could use a constant placeholder UUID that represents "latest" that you overwrite for each revision.  For example, you could use this uuid:

In [2]: pycassa.util.convert_time_to_uuid(0)
Out[2]: UUID('13814000-1dd2-11b2-8080-808080808080')
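
A minimal sketch of the placeholder idea, using a plain dict in place of the row and only the standard library. The `add_revision` helper and the sample attributes are hypothetical, and clearing the old placeholder entries before rewriting them is an assumption about how you would avoid leftover attributes from an earlier revision:

```python
import uuid

# The constant placeholder shown above, meaning "latest".
LATEST = uuid.UUID("13814000-1dd2-11b2-8080-808080808080")

def add_revision(row, revision_id, attributes):
    """Store a revision and overwrite the LATEST placeholder with it."""
    for name, value in attributes.items():
        row[(revision_id, name)] = value
    # Clear what the placeholder held so stale attributes don't linger.
    for key in [k for k in row if k[0] == LATEST]:
        del row[key]
    for name, value in attributes.items():
        row[(LATEST, name)] = value

row = {}
add_revision(row, uuid.uuid1(), {"attribute1": "v1", "attribute2": "v2"})
add_revision(row, uuid.uuid1(), {"attribute1": "v3"})

# Reading the latest revision only needs the known placeholder.
latest = {name: value for (rev, name), value in row.items() if rev == LATEST}
```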
--
Tyler Hobbs
DataStax
