problem with python and unicode string


bart van deenen

Jan 28, 2009, 2:58:33 AM
to Protocol Buffers
Hi all

I'm using the generated Python code to build a generic gpb object
editor, and I've run into Unicode issues. Here's a small example, using
the proto file example.proto:

# example.proto
message test {
  required string s = 1;
}

#!/usr/bin/python
# -*- coding: utf-8 -*-
from PyQt4.QtGui import *
from example_pb2 import *
from google.protobuf import text_format

ustring = u"شاهدة بث"
astring = "hello"
T = test()
T.s=astring # str(ustring)

# T.s=ustring # FAILS

#print text_format.MessageToString(T)

# The stuff below is to check that I have no other encoding issues.
# The QLabel correctly shows the same characters as the original string.
app = QApplication([])
l = QLabel()
l.setText(ustring)
l.show()
app.exec_()

Using the unicode string (T.s = ustring) fails with:
TypeError: u'\u0634\u0627\u0647\u062f\u0629 \u0628\u062b' has type <type 'unicode'>, but expected one of: (<type 'str'>,)

How do I get my unicode data into my message? I can't coerce my
unicode into <type 'str'> I think.

Bart

P.S. the gpb object editor I'm building is open source and can be got
from
http://github.com/bvdeenen/gpbedit/tree/master

P.P.S. I have no idea what the Unicode text means; it's from an
Al Arabiya headline.

bart van deenen

Jan 28, 2009, 3:02:37 AM
to Protocol Buffers
Oops found it already :-)

T.s=ustring.encode('utf-8')

and for getting it out

ustring2 = T.s.decode('utf-8')

Bart
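
For readers following along, here is a minimal sketch of the round trip described in the post above. It does not touch protobuf at all: `wire_value` is a hypothetical stand-in for the value that would be assigned to `T.s`, and the string is written with escapes so the snippet does not depend on the source file's encoding.

```python
# -*- coding: utf-8 -*-
# Sketch of the encode/decode workaround; intended to run on Python 2 and 3.
ustring = u"\u0634\u0627\u0647\u062f\u0629 \u0628\u062b"

# Text -> UTF-8 bytes. This is the kind of value the workaround assigns to T.s.
wire_value = ustring.encode('utf-8')  # hypothetical stand-in for T.s

# UTF-8 bytes -> text, as when reading the field back out.
ustring2 = wire_value.decode('utf-8')

print(ustring2 == ustring)
print("%d characters, %d bytes" % (len(ustring), len(wire_value)))
```

The byte string is longer than the text because each Arabic letter takes two bytes in UTF-8, which is why the two types should not be mixed up when setting or reading the field.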

Kenton Varda

Jan 28, 2009, 1:50:37 PM
to bart van deenen, Protocol Buffers
What version of protocol buffers are you using?  I believe we added unicode support to Python protobufs in 2.0.1 or 2.0.2, in which case you shouldn't need to manually encode/decode.

bart van deenen

Jan 28, 2009, 3:16:42 PM
to Protocol Buffers


On Jan 28, 7:50 pm, Kenton Varda <ken...@google.com> wrote:
> What version of protocol buffers are you using?  I believe we added unicode
> support to Python protobufs in 2.0.1 or 2.0.2, in which case you shouldn't
> need to manually encode/decode.

It's 2.0.3

I'm not at work right now, but I will check the Python version (it's
2.5 or 2.6). I'm happy to help track down this issue; I compiled protoc
myself earlier when I needed ActionScript support. I think I'm currently
using the default from Ubuntu, but let me check tomorrow.

Thanks

Bart

bart van deenen

Jan 29, 2009, 3:00:32 AM
to Protocol Buffers
Hi Kenton

Here are the details:

kubu:~$ protoc --version
libprotoc 2.0.3
kubu:~$ python -V
Python 2.5.2

This is a self-compiled protoc (because I had added the actionscript
output generator). CHANGES.txt for 2.0.2 actually states that strings
now use the "unicode" type rather than the "str" type. I'd really like
this to be resolved, and I'm quite willing to help any way I can.

Here's the testcase again.

Bart.

kubu:~/testcase$ ./test.py
Traceback (most recent call last):
  File "./test.py", line 18, in <module>
    T.s=ustring # FAILS
  File "build/bdist.linux-i686/egg/google/protobuf/reflection.py", line 381, in setter
  File "build/bdist.linux-i686/egg/google/protobuf/internal/type_checkers.py", line 59, in CheckValue
TypeError: u'\u0634\u0627\u0647\u062f\u0629 \u0628\u062b' has type <type 'unicode'>, but expected one of: (<type 'str'>,)


kubu:~/testcase$ cat test.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
from example_pb2 import *

# example.proto
# message test {
# required string s=1;
# }

ustring = u"شاهدة بث"
T = test()
T.s=ustring # FAILS

Kenton Varda

Feb 1, 2009, 9:32:51 PM
to bart van deenen, Petar Petrov, Protocol Buffers
Petar, can you look into this?


Petar Petrov

Feb 9, 2009, 4:40:05 PM
to bart van deenen, Protocol Buffers
2009/1/29 bart van deenen <bart.va...@gmail.com>:
> Hi Kenton
>
> Here's the details:
>
> kubu:~$ protoc --version
> libprotoc 2.0.3
> kubu:~$ python -V
> Python 2.5.2
>
> This is a self-compiled protoc (because I had added the actionscript
> output generator). I actually find in CHANGES.txt for 2.0.2 that the
> "Strings now use the "unicode" type rather than the "str" type. I'd
> really like it if this was resolved, and I'm quite willing to help any
> way I can.
>
> Here's the testcase again.
>
> Bart.
>
> kubu:~/testcase$ ./test.py
> Traceback (most recent call last):
>   File "./test.py", line 18, in <module>
>     T.s=ustring #  FAILS
>   File "build/bdist.linux-i686/egg/google/protobuf/reflection.py", line 381, in setter
>   File "build/bdist.linux-i686/egg/google/protobuf/internal/type_checkers.py", line 59, in CheckValue
> TypeError: u'\u0634\u0627\u0647\u062f\u0629 \u0628\u062b' has type <type 'unicode'>, but expected one of: (<type 'str'>,)

You are using a very old version of the Python API (most likely from the 2.0.1 release). Releases after 2.0.1 support unicode.
There is no longer any such check at line 59 of type_checkers.py. The old version checked whether values assigned to 'string' fields were of the Python 'str' type.
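
To illustrate the kind of check being described, here is a rough sketch. It is not the library's source: `check_value` is a hypothetical helper that only mimics the shape of the error message in the traceback, and the two tuples of accepted types are assumptions chosen to contrast a strict check with a relaxed one.

```python
def check_value(value, acceptable_types):
    """Hypothetical helper: raise TypeError unless value has an accepted type."""
    if not isinstance(value, acceptable_types):
        raise TypeError("%r has type %s, but expected one of: %s"
                        % (value, type(value), acceptable_types))
    return value

text = u"\u0634\u0627\u0647\u062f\u0629 \u0628\u062b"

# Strict check: only byte strings are accepted, so the text value is rejected.
try:
    check_value(text, (bytes,))
    strict_ok = True
except TypeError:
    strict_ok = False

# Relaxed check: both byte strings and text are accepted.
relaxed_ok = check_value(text, (bytes, type(u""))) == text

print(strict_ok, relaxed_ok)
```

On Python 2, `bytes` is an alias for `str` and `type(u"")` is `unicode`, so the strict case corresponds to the failure reported in this thread.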

Your protoc seems to be part of a recent version, but it looks like you haven't updated the Python API.
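
One way to see which Python runtime is actually being imported, as opposed to which protoc is on the PATH, is sketched below. `describe_protobuf_runtime` is a hypothetical helper name, and the `__version__` attribute is an assumption that may not hold for releases as old as the ones discussed here, so the code falls back to "unknown" when it is missing.

```python
import importlib


def describe_protobuf_runtime():
    """Return (version, path) for the importable google.protobuf, or None."""
    try:
        module = importlib.import_module("google.protobuf")
    except ImportError:
        return None
    # __version__ may be absent on very old releases (assumption), so fall back.
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "unknown")
    return version, path


info = describe_protobuf_runtime()
if info is None:
    print("google.protobuf is not importable from this interpreter")
else:
    print("runtime version: %s" % info[0])
    print("loaded from: %s" % info[1])
```

If the reported path points at an old egg, such as the build/bdist.linux-i686/egg location in the traceback, while `protoc --version` reports something newer, the two are out of sync and the Python package needs to be reinstalled from the same release.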
