Can't read rows from column family: ValueError: bytes is not a 16-char string

Rob Marshall

Feb 4, 2014, 2:07:56 PM

to pycassa...@googlegroups.com

Hi,

I'm getting the following exception when I try to read from a ColumnFamily using what, at least to me (I'm completely new to this), appears to be a valid key. In what follows, the names have been changed to protect the guilty/innocent... :-)

I have a column family defined as:

create column family ColumnFamily

with column_type = 'Standard'

and comparator = 'CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.TimeUUIDType)'

and default_validation_class = 'BytesType'

and key_validation_class = 'UTF8Type'

and read_repair_chance = 0.1

and dclocal_read_repair_chance = 0.0

and populate_io_cache_on_flush = false

and gc_grace = 864000

and min_compaction_threshold = 4

and max_compaction_threshold = 32

and replicate_on_write = true

and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'

and caching = 'KEYS_ONLY'

and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

In IDLE (Python 2.7) I do:

>>> from pycassa.pool import ConnectionPool

>>> from pycassa.columnfamily import ColumnFamily

>>> cf = ColumnFamily(ConnectionPool('KeySpace'),'ColumnFamily')

>>> dict(cf.get_range(column_count=0,filter_empty=False)).keys()

[u'urn:keyspace:ColumnFamily:UUID:']

>>> cf.get('urn:keyspace:ColumnFamily:UUID:')

Traceback (most recent call last):

File "<pyshell#151>", line 1, in <module>

cf.get('urn:keyspace:ColumnFamily:UUID:')

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 664, in get

return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 368, in _cosc_to_dict

ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 444, in _unpack_name

return self._name_unpacker(b)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 140, in unpack_composite

components.append(unpacker(bytestr[2:2 + length]))

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 374, in <lambda>

return lambda v: uuid.UUID(bytes=v)

File "/usr/lib/python2.7/uuid.py", line 144, in __init__

raise ValueError('bytes is not a 16-char string')

ValueError: bytes is not a 16-char string

I'm really not sure what I'm doing wrong here. Any suggestions?

Thanks,

Rob

Rob Marshall

Feb 4, 2014, 3:19:12 PM

to pycassa...@googlegroups.com

When using cassandra-cli I can see the data as:

% cassandra-cli -h 10.249.238.131

Connected to: "LocalDB" on 10.249.238.131/9160

Welcome to Cassandra CLI version 1.2.10-SNAPSHOT

Type 'help;' or '?' for help.

Type 'quit;' or 'exit;' to quit.

[default@unknown] use Keyspace;

[default@Keyspace] list ColumnFamily;

Using default limit of 100

Using default cell limit of 100

-------------------

RowKey: urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:

=> (name=autoZoning:::, value=01, timestamp=1391298393966000)

=> (name=creationTime:::, value=00000143efd8b76e, timestamp=1391298393966000)

=> (name=inactive:::14fe78e0-8b9b-11e3-b171-005056b700bb, value=00, timestamp=1391298393966000)

=> (name=label:::14fe78e0-8b9b-11e3-b171-005056b700bb, value=726a6d2d766e782d76613031, timestamp=1391298393966000)

1 Row Returned.

Elapsed time: 16 msec(s).

Since it was unclear what was causing the exception, I decided to add a print prior to the 'return self._name_unpacker(b)' line in columnfamily.py and I see:

>>> cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])

Attempting to unpack: <00>\rautoZoning<00><00><00><00><00><00><00><00><00><00>

Traceback (most recent call last):

File "<pyshell#172>", line 1, in <module>

cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 665, in get

return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 368, in _cosc_to_dict

ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 445, in _unpack_name

return self._name_unpacker(b)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 140, in unpack_composite

components.append(unpacker(bytestr[2:2 + length]))

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 374, in <lambda>

return lambda v: uuid.UUID(bytes=v)

File "/usr/lib/python2.7/uuid.py", line 144, in __init__

raise ValueError('bytes is not a 16-char string')

ValueError: bytes is not a 16-char string

I have no idea where the extra characters are coming from around the column name. But that got me curious so I added another print in _cosc_to_dict in columnfamily.py and I see:

>>> cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])

list_col_or_super is: []

list_col_or_super is: [ColumnOrSuperColumn(column=Column(timestamp=1391298393966000, name='\x00\rautoZoning\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', value='\x01', ttl=None), counter_super_column=None, super_column=None, counter_column=None), ColumnOrSuperColumn(column=Column(timestamp=1391298393966000, name='\x00\x0ccreationTime\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', value='\x00\x00\x01C\xef\xd8\xb7n', ttl=None), counter_super_column=None, super_column=None, counter_column=None), ColumnOrSuperColumn(column=Column(timestamp=1391298393966000, name='\x00\x08inactive\x00\x00\x00\x00\x00\x00\x00\x00\x10\x14\xfex\xe0\x8b\x9b\x11\xe3\xb1q\x00PV\xb7\x00\xbb\x00', value='\x00', ttl=None), counter_super_column=None, super_column=None, counter_column=None), ColumnOrSuperColumn(column=Column(timestamp=1391298393966000, name='\x00\x05label\x00\x00\x00\x00\x00\x00\x00\x00\x10\x14\xfex\xe0\x8b\x9b\x11\xe3\xb1q\x00PV\xb7\x00\xbb\x00', value='thisIsATest', ttl=None), counter_super_column=None, super_column=None, counter_column=None)]

autoZoning unpack:

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 666, in get

return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)

File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 369, in _cosc_to_dict

ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)

File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 446, in _unpack_name

return self._name_unpacker(b)

File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/marshal.py", line 140, in unpack_composite

components.append(unpacker(bytestr[2:2 + length]))

File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/marshal.py", line 374, in <lambda>

return lambda v: uuid.UUID(bytes=v)

File "/usr/lib64/python2.6/uuid.py", line 144, in __init__

raise ValueError('bytes is not a 16-char string')

ValueError: bytes is not a 16-char string

Am I correct in assuming that the extra characters around the column names are what is responsible for the 'ValueError: bytes is not a 16-char string' exception?

Also if I try to use the column name and select it I get:

>>> cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=['autoZoning:::'])

Traceback (most recent call last):

File "<pyshell#184>", line 1, in <module>

cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=['autoZoning:::'])

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 651, in get

cp = self._column_path(super_column, column)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 383, in _column_path

self._pack_name(column, False))

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 426, in _pack_name

return self._name_packer(value, slice_start)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 115, in pack_composite

packed = packer(item)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 298, in pack_uuid

randomize=True)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/util.py", line 75, in convert_time_to_uuid

'neither a UUID, a datetime, or a number')

ValueError: Argument for a v1 UUID column name or value was neither a UUID, a datetime, or a number

Tyler Hobbs

Feb 4, 2014, 4:55:24 PM

to pycassa...@googlegroups.com

It seems the actual problem column name is '\x00\x0ccreationTime\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'. That's basically using an "empty string" for the UUID component instead of just omitting it. I would expect that to fail validation, but it seems like Cassandra doesn't mind it.
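To make the byte layout concrete, here is a minimal sketch (not pycassa's own code) that splits a packed CompositeType name into its raw components. It assumes the standard layout of a 2-byte big-endian length, the value bytes, and one end-of-component byte per component. The helper name `split_composite` is hypothetical, and the byte strings are copied from the debug output earlier in the thread.

```python
import struct
import uuid


def split_composite(raw):
    """Split a packed CompositeType name into its raw component byte strings.

    Hypothetical helper for illustration. Assumes each component is stored as
    a 2-byte big-endian length, the value, then one end-of-component byte.
    """
    components = []
    pos = 0
    while pos < len(raw):
        (length,) = struct.unpack_from(">H", raw, pos)
        pos += 2
        components.append(raw[pos:pos + length])
        pos += length + 1  # skip the value and its end-of-component byte
    return components


# Column names as printed in the list_col_or_super debug output.
creation_time = b"\x00\x0ccreationTime" + b"\x00" * 10
inactive = (b"\x00\x08inactive" + b"\x00" * 8 + b"\x10"
            b"\x14\xfe\x78\xe0\x8b\x9b\x11\xe3\xb1\x71\x00\x50\x56\xb7\x00\xbb"
            b"\x00")

# Four components are present, and the last (TimeUUID) one is zero-length.
print(split_composite(creation_time))  # [b'creationTime', b'', b'', b'']

# Here the last component is a full 16-byte UUID, so it unpacks cleanly.
print(uuid.UUID(bytes=split_composite(inactive)[-1]))
```

Passing the zero-length last component of `creation_time` to `uuid.UUID(bytes=...)` raises the same `ValueError: bytes is not a 16-char string` seen in the tracebacks.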

A proper fix will take time (I think I need to mark which types can have empty strings and which should be interpreted as None instead). As a workaround, in marshal.py, line 373 (where it unpacks UUIDTypes), you can change it from:

return lambda v: uuid.UUID(bytes=v)

to:

return lambda v: uuid.UUID(bytes=v) if v else None
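Outside of pycassa, the effect of that one-line change can be checked in isolation. The sketch below wraps the same expression in a named function (`unpack_uuid_or_none` is a hypothetical name, not part of pycassa) and feeds it an empty component and a real 16-byte one.

```python
import uuid


def unpack_uuid_or_none(v):
    # Same expression as the patched lambda: an empty component becomes None
    # instead of raising ValueError inside uuid.UUID().
    return uuid.UUID(bytes=v) if v else None


empty_component = b""
real_component = uuid.UUID("14fe78e0-8b9b-11e3-b171-005056b700bb").bytes

print(unpack_uuid_or_none(empty_component))  # None
print(unpack_uuid_or_none(real_component))   # 14fe78e0-8b9b-11e3-b171-005056b700bb
```

With this in place, a column such as creationTime would presumably come back with None as its last tuple element rather than aborting the whole read.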

By the way, this won't work:

>>> cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=['autoZoning:::'])

because the colon-separated-component format is specific to cassandra-cli. In python, use tuples instead:

>>> cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=[('autoZoning\x00\x00\x00', '', '', '')])

--
Tyler Hobbs
DataStax

Rob Marshall

Feb 4, 2014, 8:09:08 PM

to pycassa...@googlegroups.com

Hi Tyler,

That was helpful, but I'm still seeing:

>>> cf.get(u'urn:storageos:VirtualArray:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=[tuple('label:::14fe78e0-8b9b-11e3-b171-005056b700bb'.split(':'))])

Added print in util.convert_time_to_uuid(): time_arg=14fe78e0-8b9b-11e3-b171-005056b700bb

Traceback (most recent call last):

File "<pyshell#216>", line 1, in <module>

cf.get(u'urn:storageos:VirtualArray:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=[tuple('label:::14fe78e0-8b9b-11e3-b171-005056b700bb'.split(':'))])

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 650, in get

cp = self._column_path(super_column, column)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 383, in _column_path

self._pack_name(column, False))

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 426, in _pack_name

return self._name_packer(value, slice_start)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 115, in pack_composite

packed = packer(item)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 299, in pack_uuid

randomize=True)

File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/util.py", line 76, in convert_time_to_uuid

'neither a UUID, a datetime, or a number')

ValueError: Argument for a v1 UUID column name or value was neither a UUID, a datetime, or a number

It appears to be trying to decode the time UUID that is part of the column name, which I can decode fine:

>>> import time_uuid

>>> time_uuid.TimeUUID('14fe78e0-8b9b-11e3-b171-005056b700bb').get_datetime()

datetime.datetime(2014, 2, 1, 23, 46, 33, 966000)

So I'm not exactly sure how we got there; I'll have to take a look at it later...

Thanks for the help,

Rob

Tyler Hobbs

Feb 4, 2014, 8:40:17 PM

to pycassa...@googlegroups.com

On Tue, Feb 4, 2014 at 7:09 PM, Rob Marshall <rob.mar...@gmail.com> wrote:

>>> cf.get(u'urn:storageos:VirtualArray:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=[tuple('label:::14fe78e0-8b9b-11e3-b171-005056b700bb'.split(':'))])

For the UUID component, you'll need to create a UUID object, not just pass in a string. It's as simple as doing something like:

from uuid import UUID

myuuid = UUID('14fe78e0-8b9b-11e3-b171-005056b700bb')
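Putting that together with the tuple form, a rough sketch of converting a cassandra-cli style name into a column tuple might look like the following. The helper `cli_name_to_tuple` is hypothetical, it assumes no component itself contains a colon, and it leaves an empty last component as an empty string (how pycassa treats that case is not settled in this thread).

```python
from uuid import UUID


def cli_name_to_tuple(cli_name):
    """Turn 'label:::<uuid>' into ('label', '', '', UUID('<uuid>')).

    Hypothetical helper: assumes components never contain ':' and that only
    the last component is a (Time)UUID.
    """
    parts = cli_name.split(":")
    if parts[-1]:
        # The UUID component must be a UUID object, not a string.
        parts[-1] = UUID(parts[-1])
    return tuple(parts)


column = cli_name_to_tuple("label:::14fe78e0-8b9b-11e3-b171-005056b700bb")
print(column)

# The tuple would then be passed as a column name, e.g.:
# cf.get(row_key, columns=[column])
```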

--
Tyler Hobbs
DataStax

Rob Marshall

Feb 5, 2014, 12:38:47 AM

to pycassa...@googlegroups.com

Or, alternatively, modify util.convert_time_to_uuid() so that it accepts UUID strings, by adding an elif just before the final else that raises the exception (this requires an import re at the top of the module):

    # Accept a string that looks like a UUID (raw-string pattern, anchored at the end)
    elif isinstance(time_arg, basestring) and \
            re.match(r'[\da-f]{8}-([\da-f]{4}-){3}[\da-f]{12}$', time_arg.lower()):
        return uuid.UUID(time_arg)
    else:
        raise ValueError('Argument for a v1 UUID column name or value was ' +
                         'neither a UUID, a datetime, or a number')

After all, if it's a string that should be a UUID, and looks like a UUID, i.e. if it quacks like a duck, walks like a duck... :-)
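As a standalone illustration of the same idea, separate from pycassa's source, here is a small sketch of a function that accepts either a UUID object or a UUID-shaped string. The name `coerce_to_uuid` is hypothetical, and the sketch targets Python 3 (it uses `str` and `fullmatch`), unlike the Python 2 code above.

```python
import re
import uuid

# Canonical 8-4-4-4-12 hex form, matched case-insensitively.
_UUID_RE = re.compile(r"[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}",
                      re.IGNORECASE)


def coerce_to_uuid(arg):
    """Return arg as a uuid.UUID if it is one or is a string that looks like one."""
    if isinstance(arg, uuid.UUID):
        return arg
    # fullmatch() rejects strings with trailing junk, unlike an unanchored match().
    if isinstance(arg, str) and _UUID_RE.fullmatch(arg):
        return uuid.UUID(arg)
    raise ValueError("Argument was neither a UUID nor a UUID-formatted string")


print(coerce_to_uuid("14fe78e0-8b9b-11e3-b171-005056b700bb").version)  # 1
```

One trade-off of accepting strings this way is that an ordinary text value that happens to look like a UUID gets converted silently, which may be why the library insists on an explicit UUID object.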

Rob
