Custom packing & unpacking


stantonk

Nov 1, 2012, 7:29:45 PM
to pycassa...@googlegroups.com
I am trying to do some custom packing/unpacking and am not having much luck.

Basically, I have a CF like this:

create column family CF
  with comparator = 'CompositeType(UTF8Type, LongType, UTF8Type)'
  and key_validation_class=LongType
  and default_validation_class=UTF8Type;

And pycassa happily marshals the data like so:

>>> CF.get(123, column_count=1)
OrderedDict([((u'astring', 123, u'anotherstring'), {'ajson': 'blob'})])

I am manually converting the column names from a tuple such as (u'astring', 123, u'anotherstring') into a delimited str (e.g. "astring,123,anotherstring") for use in a URL. When storing new columns, I have to change it back to a tuple with tuple(s.split(',')).
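A minimal sketch of the round trip described above, in plain Python with no pycassa involved. The helper names `csv_to_tuple` and `tuple_to_csv` and the `COMPONENT_TYPES` table are hypothetical, chosen only for illustration. Note that a bare `split(',')` returns every component as a string, so the middle component has to be converted back to an integer to match the LongType in the comparator:

```python
# Hypothetical helpers (not part of pycassa): convert between a delimited
# string and a typed tuple matching CompositeType(UTF8Type, LongType, UTF8Type).
COMPONENT_TYPES = (str, int, str)  # assumed Python type for each component


def csv_to_tuple(csv_val, types=COMPONENT_TYPES, sep=','):
    """Split a delimited string and convert each part to its component type."""
    parts = csv_val.split(sep)
    if len(parts) != len(types):
        raise ValueError('expected %d components, got %d' % (len(types), len(parts)))
    return tuple(convert(part) for convert, part in zip(types, parts))


def tuple_to_csv(tuple_val, sep=','):
    """Join the components back into a delimited string."""
    return sep.join(str(component) for component in tuple_val)


name = csv_to_tuple('astring,123,anotherstring')
print(name)
print(tuple_to_csv(name))
```

This only round-trips cleanly if no string component contains the delimiter itself; a URL-safe delimiter or escaping scheme would be needed otherwise.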

What I want is to have these conversion steps on the column names be transparent. I fiddled with a couple of different techniques:

class MyType(CassandraType):

    @staticmethod
    def pack(val):
        return val.split(',')

    @staticmethod
    def unpack(val):
        return ','.join(['%s'] * len(val)) % val

This obviously doesn't work because by the time these methods are called, the value passed in is somewhat unpacked:

'^TastringP*i!^Tanotherstring'

It looks like there's some padding character, ^T, in front of each string, and I can unpack the number in the middle using struct.unpack('>I', 'P*i!'), but how can I tell where the LongType in the middle of the CompositeType starts and ends?

If I try inheriting from CompositeType instead of CassandraType but try the same static methods, I get the same results. It seems like the pack and unpack methods aren't running at the very last step, at least in the case of CompositeTypes?

btw, I am on pycassa 1.6.0, but I tried in 1.7.2 also and got the same results.

Thanks, hopefully there is enough info here...I tried tracing the marshaling code for a bit but I don't know enough about Thrift or the underlying storage protocol for Cassandra so it was difficult to follow :-\

stantonk

Nov 1, 2012, 7:31:12 PM
to pycassa...@googlegroups.com
P.S. In reading over my post, I saw that I specified 123 both as the row key and as the second component of the CompositeType for the column name. This was not meant to indicate that those are one and the same value.

Tyler Hobbs

Nov 5, 2012, 2:20:09 PM
to pycassa...@googlegroups.com
Unfortunately, the way CassandraType functions are called is a little weird, especially for composites.  (That's something I'd like to fix in the future.)

You're correct that the binary format has some other data involved. It's not just a separator: each component has a length field and an end-of-component byte. You'll need to use the normal CompositeType packing and unpacking in addition to what you're doing. I haven't tested this at all, but this is one way I might do it:

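from pycassa import marshal
from pycassa.types import CassandraType


class MyType(CassandraType):

    def __init__(self, *components):
        # components: the CassandraType instances that make up the composite
        self.components = components
        composite_pack = marshal.get_composite_packer(composite_type=self)
        composite_unpack = marshal.get_composite_unpacker(composite_type=self)

        def pack(csv_val, *args, **kwargs):
            # delimited string -> tuple -> composite binary format
            tuple_val = tuple(csv_val.split(','))
            binary_val = composite_pack(tuple_val)
            return binary_val

        def unpack(binary_val):
            # composite binary format -> tuple -> delimited string
            tuple_val = composite_unpack(binary_val)
            # components may not all be strings (e.g. a LongType), so convert first
            csv_val = ','.join(str(c) for c in tuple_val)
            return csv_val

        self.pack = pack
        self.unpack = unpack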
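
To illustrate the length field and end-of-component byte mentioned above, here is a rough sketch of the composite layout using only `struct`. It assumes each component is stored as a 2-byte big-endian length, the component bytes, and one end-of-component byte, and that a LongType is an 8-byte signed big-endian integer. The function names are hypothetical and this is not pycassa's own code; it only shows why the length prefix is what tells you where the LongType sits:

```python
import struct


def pack_component(raw, eoc=b'\x00'):
    """Assumed layout: 2-byte big-endian length, bytes, end-of-component byte."""
    return struct.pack('>H', len(raw)) + raw + eoc


def pack_name(text1, number, text2):
    """Pack a (UTF8Type, LongType, UTF8Type) column name."""
    return b''.join([
        pack_component(text1.encode('utf-8')),
        pack_component(struct.pack('>q', number)),  # 8-byte signed big-endian
        pack_component(text2.encode('utf-8')),
    ])


def split_components(packed):
    """Walk the buffer, using each length prefix to find the next component."""
    components = []
    pos = 0
    while pos < len(packed):
        (length,) = struct.unpack_from('>H', packed, pos)
        pos += 2
        components.append(packed[pos:pos + length])
        pos += length + 1  # skip the component bytes and the end-of-component byte
    return components


def unpack_name(packed):
    raw1, raw_number, raw2 = split_components(packed)
    (number,) = struct.unpack('>q', raw_number)
    return raw1.decode('utf-8'), number, raw2.decode('utf-8')


packed = pack_name(u'astring', 123, u'anotherstring')
print(unpack_name(packed))
```

In practice the packer and unpacker returned by marshal.get_composite_packer and marshal.get_composite_unpacker do this work, so a sketch like this is mainly useful for reading raw values while debugging.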


I hope that helps!  Let me know if it totally doesn't work :)
--
Tyler Hobbs
DataStax

stantonk

Dec 17, 2012, 3:10:37 PM
to pycassa...@googlegroups.com
Thanks for this, and sorry I didn't get back to you sooner!

Kevin Stanton

Mar 26, 2014, 1:04:13 PM
to pycassa...@googlegroups.com
I had an issue with this but solved it: you need to make sure slice_start is passed along to the composite packer!

def pack(csv_val, *args, **kwargs):
    # Be sure to pass along slice_start, or else slice-based
    # queries won't work correctly.
    if args:
        slice_start = args[0]
    else:
        slice_start = kwargs.get('slice_start')

    tuple_val = tuple(csv_val.split(','))
    binary_val = composite_pack(tuple_val, slice_start)
    return binary_val
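
A small self-contained sketch of the forwarding logic above, for checking it without a Cassandra connection. `make_csv_packer` and `fake_composite_pack` are hypothetical names; the fake packer is only a stand-in that echoes its arguments, not the real packer returned by marshal.get_composite_packer:

```python
def make_csv_packer(composite_pack, sep=','):
    """Wrap a composite packer so it accepts a delimited string."""
    def pack(csv_val, *args, **kwargs):
        # slice_start may arrive positionally or as a keyword argument
        slice_start = args[0] if args else kwargs.get('slice_start')
        return composite_pack(tuple(csv_val.split(sep)), slice_start)
    return pack


def fake_composite_pack(items, slice_start=None):
    """Stand-in packer (assumption): returns its arguments instead of bytes."""
    return (items, slice_start)


pack = make_csv_packer(fake_composite_pack)
print(pack('a,b'))                     # no slice_start given
print(pack('a,b', True))               # positional slice_start
print(pack('a,b', slice_start=False))  # keyword slice_start
```

The point of the check is that None, True, and False all reach the underlying packer unchanged, since the three values mean different things to it.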

