UnicodeDecodeError with UserType


Yi Wang

Sep 13, 2015, 11:18:24 PM
to DataStax Python Driver for Apache Cassandra User Mailing List
Hi Developers,
I got this error when running the code below:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 1: ordinal not in range(128)
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.management import sync_table
from cassandra.cqlengine.connection import setup
import uuid
from cassandra.cqlengine.usertype import UserType

setup(['127.0.0.1'], 'mykeyspace', protocol_version=3)

class User(UserType):
    age = columns.Integer()
    name = columns.Text()


class Person(Model):
    __keyspace__ = 'mykeyspace'

    id = columns.UUID(primary_key=True)
    first_name = columns.Text()
    last_name = columns.Text(default='')
    user = columns.UserDefinedType(User)

sync_table(Person)

Person.create(id=uuid.uuid4(), first_name=u"名字".encode('utf8'), user=User(age=10, name=u'字'))


I opened a PR for this, but the Travis build fails on Python 3.3.


Thanks,
Yi

Adam Holmberg

Sep 14, 2015, 11:26:45 AM
to python-dr...@lists.datastax.com
Yi,

Thanks for the PR. This does not quite address the issue. It may silence the error in Python 2 by causing the string interpolation to avoid unicode, but I'm not sure it will produce the string you expect in the database:

In [4]: val = u'字'
In [5]: "%s : %s" % (u'name'.encode('utf8'), enc.cql_encode_all_types(val))
Out[5]: "name : '\xe5\xad\x97'"

You can see this results in the value ascii-encoded as a hex string.

In Python 3, this workaround unfortunately produces a 'b' in the encoded string:

In [13]: "%s : %s" % (u'name'.encode('utf8'), enc.cql_encode_all_types(val))
Out[13]: "b'name' : '字'"
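A minimal Python 3 sketch of the behaviour described above. Here `quoted` is a hand-built stand-in for the quoted text that `enc.cql_encode_all_types(val)` returns (an assumption; the driver's encoder is not reproduced). Interpolating a `bytes` object into a `str` template calls `str()` on it, which is where the stray `b'...'` prefix comes from; keeping the field name as text avoids it.

```python
name = u'name'
val = u'字'

# Stand-in for the encoder's output: a CQL string literal with quotes doubled.
quoted = "'" + val.replace("'", "''") + "'"

# The PR's workaround: the field name is encoded to bytes before interpolation,
# so Python 3 formats it as its repr, b'name'.
broken = "%s : %s" % (name.encode('utf8'), quoted)
print(broken)  # b'name' : '字'

# Leaving every piece as str produces the intended text.
fixed = "%s : %s" % (name, quoted)
print(fixed)  # name : '字'
```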

We have a couple of tickets open around CQL string encoding:

They will be addressed in the next release that is not focused on Cassandra parity.

Thanks,
Adam Holmberg

Yi Wang

Sep 15, 2015, 5:01:19 AM
to python-dr...@lists.datastax.com
Thanks for the response, Adam.

I updated the PR and the tests have passed.

It's a fix for https://datastax-oss.atlassian.net/browse/PYTHON-353: it makes sure the user-defined type's field names are encoded.

> In [4]: val = u'字'
> In [5]: "%s : %s" % (u'name'.encode('utf8'), enc.cql_encode_all_types(val))
> Out[5]: "name : '\xe5\xad\x97'"


Yes, val is encoded correctly, but the field name is still unicode, so the user-defined type's encode function raises an error like this:
  File "/usr/local/lib/python2.7/dist-packages/cassandra/query.py", line 800, in bind_params
    return query % dict((k, encoder.cql_encode_all_types(v)) for k, v in six.iteritems(params))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 67: ordinal not in range(128)
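In Python 2 this error appears when UTF-8 bytes are interpolated into a unicode template: Python implicitly decodes the bytes with the ASCII codec, which fails on the first non-ASCII byte. The Python 3 sketch below makes that decode explicit, then shows one way to keep the types consistent: decode any bytes as UTF-8 up front so field names and values are all text before formatting the UDT literal. The helpers `cql_quote` and `udt_literal` are hypothetical illustrations, not the driver's API, and not necessarily what the PR does.

```python
def cql_quote(text):
    """Render text as a CQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def udt_literal(fields):
    """Build a CQL UDT literal like {age : 10, name : '字'} from (name, value) pairs.

    Hypothetical helper: any bytes are decoded as UTF-8 first, so field names
    and values are all str before interpolation. Non-text values are rendered
    with str(), which is only adequate for simple numbers in this sketch.
    """
    parts = []
    for name, value in fields:
        if isinstance(name, bytes):
            name = name.decode("utf-8")
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        encoded = cql_quote(value) if isinstance(value, str) else str(value)
        parts.append("%s : %s" % (name, encoded))
    return "{" + ", ".join(parts) + "}"


# The failure mode from the traceback: UTF-8 bytes read with the ASCII codec.
raw = u"字".encode("utf-8")  # b'\xe5\xad\x97'
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xe5 in position 0: ...

print(udt_literal([("age", 10), (u"name", u"字")]))  # {age : 10, name : '字'}
```

Decoding at the boundary and formatting only text is the usual Python approach to this class of bug; it behaves the same way on Python 2 and 3 as long as every input is normalized before interpolation.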

Regards,
Yi