UTF8Type packing/unpacking

Eric Evans

Aug 17, 2010, 8:42:50 PM

to pycass...@googlegroups.com

Tyler Hobbs recently checked in (among other things) some code that
causes column names and values to be packed/unpacked according to
their compare_with and validate_with respectively. This is something
I'd been thinking of working on for some time, but was too lazy, so
thanks Tyler. :)

One of the things I encountered with this though is that it errors
packing a Python unicode when compare_with/validate_with is UTF8Type,
which seems like a reasonable thing to want to do. This could be
fixed by casting to a string, but I wonder if it doesn't make sense to
special case UTF8Type. Maybe encode('utf8') on the way in and
decode('utf8') on the way out? Basically, require unicode types on
write, and return them on read.
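
For anyone following along, the encode-on-write / decode-on-read idea could look roughly like the sketch below. This is only an illustration, not pycassa's actual code: pack_utf8 and unpack_utf8 are hypothetical names, and the sketch is written for Python 3, where str is the text type that Python 2 calls unicode and bytes is the packed form.

```python
def pack_utf8(value):
    """Hypothetical helper: accept only text and encode it to UTF-8 bytes."""
    if not isinstance(value, str):
        # Mirrors the "require unicode types on write" suggestion.
        raise TypeError("UTF8Type expects text, got %s" % type(value).__name__)
    return value.encode('utf8')


def unpack_utf8(raw):
    """Hypothetical helper: decode UTF-8 bytes read from storage back to text."""
    return raw.decode('utf8')


packed = pack_utf8('caf\u00e9')
print(repr(packed))               # the encoded bytes that would be sent
print(repr(unpack_utf8(packed)))  # the original text comes back on read
```

Rejecting non-text input up front is one possible reading of "require unicode types on write"; silently accepting byte strings would be the other option.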

My utf8-fu is weak though, so maybe this is a bad idea.

Thoughts?

--
Eric Evans
john.er...@gmail.com

Tyler Hobbs

Aug 17, 2010, 8:50:29 PM

to pycass...@googlegroups.com

Can you provide an example of what's failing? My UTF8 skills are lacking as well.
I think the unit tests cover what you describe. For example, line 97 of
tests/test_autopacking.py uses:

u'\u0020'.encode('utf8')

but perhaps that's not a good character to test? My tests might also
not be hitting the comparisons that you're seeing.

- Tyler Hobbs

eevans

Aug 17, 2010, 8:54:23 PM

to pycassa-devel

On Aug 17, 7:50 pm, Tyler Hobbs <ty...@riptano.com> wrote:
> Can you provide an example of what's failing? My UTF8 skills are lacking as
> well.
> I think the unit tests cover what you describe. For example, line 97 of
> tests/test_autopacking.py uses:
>
> u'\u0020'.encode('utf8')

That succeeds because encode() returns a string; try it without the
call to encode().
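
A small sketch of the distinction, for reference. It is written for Python 3, where encode() returns bytes (Python 2 returns a byte str, which is the "string" meant above). It also hints at why U+0020 may be a weak test character: it encodes to a single ASCII byte, whereas a non-ASCII character such as U+00E9 (picked here only as an example) takes two bytes.

```python
space = '\u0020'    # the character used in the existing test
e_acute = '\u00e9'  # a non-ASCII character, chosen only for illustration

encoded_space = space.encode('utf8')
encoded_e = e_acute.encode('utf8')

# encode() turns text into a byte string, which is what gets packed.
print(type(space).__name__, '->', type(encoded_space).__name__)

# U+0020 is plain ASCII, so its UTF-8 form is one byte.
print(repr(encoded_space), len(encoded_space))

# U+00E9 needs two bytes in UTF-8, so it exercises the multi-byte path.
print(repr(encoded_e), len(encoded_e))
```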

The flip side is: if a value started out as a unicode type and the
compare_with/validate_with is UTF8Type, do you want to get unicode
back as well?

Tyler Hobbs

Aug 17, 2010, 9:26:28 PM

to pycass...@googlegroups.com

Eric, you're totally right on this one. I just checked out "Unicode in Python,
Completely Demystified" and it makes more sense.

I think a call to encode('utf-8') just prior to packing makes sense.

Is there any harm in decoding back to unicode during unpack?

Eric Evans

Aug 17, 2010, 9:35:17 PM

to pycass...@googlegroups.com

On Tue, Aug 17, 2010 at 8:26 PM, Tyler Hobbs <ty...@riptano.com> wrote:
> Eric, you're totally right on this one. I just checked out "Unicode in
> Python, Completely Demystified" and it makes more sense.
>
> I think a call to encode('utf-8') just prior to packing makes sense.
>
> Is there any harm in decoding back to unicode during unpack?

I think that's the sanest way. If compare_with/validate_with are
UTF8Type, then what you insert is treated as unicode, and unicode is
what is returned on read.
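
On the "any harm" question, one thing to be aware of is how Python behaves when the bytes being decoded are not valid UTF-8. The sketch below assumes such bytes could reach the client at all, which this thread does not establish; decode_column is a hypothetical name, and the code is Python 3.

```python
def decode_column(raw, errors='strict'):
    """Hypothetical helper: decode bytes read back for a UTF8Type column."""
    return raw.decode('utf8', errors)


good = 'na\u00efve'.encode('utf8')
bad = b'\xff\xfe'  # not a valid UTF-8 sequence

print(repr(decode_column(good)))

try:
    decode_column(bad)
except UnicodeDecodeError as exc:
    # Strict decoding refuses bytes that are not well-formed UTF-8.
    print('decode failed:', exc.reason)

# A lenient mode substitutes U+FFFD instead of raising.
print(repr(decode_column(bad, errors='replace')))
```

If everything written goes through encode('utf8') first, the strict path should never fail in practice.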

--
Eric Evans
john.er...@gmail.com

Tyler Hobbs

Aug 17, 2010, 10:17:36 PM

to pycass...@googlegroups.com

Sounds good. I'll have the changes in tonight or tomorrow morning.

Daniel Lundin

Aug 18, 2010, 4:58:16 AM

to pycass...@googlegroups.com

On Wed, Aug 18, 2010 at 2:42 AM, Eric Evans <john.er...@gmail.com> wrote:
> Tyler Hobbs recently checked in (among other things) some code that
> causes column names and values to be packed/unpacked according to
> their compare_with and validate_with respectively.

Cool, this is pretty nice!

I suppose we need to slice by equally sensible types as well then...

cf.get('mykey', column_start=datetime.now())

This certainly has an appeal over "manually" synthesizing uuid1s for slicing.
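
To make the "manual" alternative concrete, here is a rough sketch of synthesizing a version 1 UUID from a datetime. datetime_to_uuid1 is a hypothetical helper, not something pycassa provides, and it assumes a timezone-aware datetime. The clock sequence and node are zeroed, which is an arbitrary choice for illustration; how that interacts with the ordering of columns that share a timestamp is not covered here.

```python
import uuid
from datetime import datetime, timezone

# 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_uuid1(dt, clock_seq=0, node=0):
    """Hypothetical helper: build a version 1 UUID carrying dt's timestamp."""
    delta = dt - UNIX_EPOCH
    # Integer arithmetic avoids float rounding of the microseconds.
    micros = (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds
    timestamp = micros * 10 + UUID_EPOCH_OFFSET

    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


start = datetime_to_uuid1(datetime(2010, 8, 18, tzinfo=timezone.utc))
end = datetime_to_uuid1(datetime(2010, 8, 19, tzinfo=timezone.utc))
print(start, end)
```

With autopacking, something along these lines would happen behind the scenes, so the caller could pass the datetime directly as column_start.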

/d

Tyler Hobbs

Aug 18, 2010, 2:45:09 PM

to pycass...@googlegroups.com

I just committed the UTF8 changes.

Daniel, are you suggesting converting datetime slice arguments to uuid1s if
that's the column type?

(New thread?)

- Tyler
