Re: Python3 Protobufs

Charles Law

Sep 25, 2012, 4:47:09 PM
to prot...@googlegroups.com
I thought about this a little more and realized that both unicode and str strings are passed into fields that have cpp_type CPP_STRING and field_type TYPE_STRING. I know the 7-bit character limit is only imposed on str strings: all the extreme-value tests use unicode strings. In Python 3 all strings are unicode, so should this limit only exist in Python 2.x?


On Friday, September 21, 2012 6:16:41 PM UTC-7, Charles Law wrote:
I've made an attempt to create a Python 3 compatible version of protobufs. I have some code that passes nearly all of the unit tests, which I've posted here:


I probably won't have a chance to look at this again for a couple of weeks, if not longer, so I want to get it out there. In my attempt I decided to follow the advice in another post and treat Python 3 as a new language. To get Python 3 working, you'll have to compile the C code. There are also a few issues I ran into along the way:
  • I decided to use str where Python 2 uses unicode. I originally tried to use bytes/bytearray, but they do not support characters wider than 8 bits, and some of the setup.py tests use "exotic" 16-bit characters. (Warning: I might have something conceptually wrong here.)
  • There are places where byte data is stored as strings and then converted to unicode. I ended up converting these strings (I called them bytestrs) to normal strings. I'm not sure this is done correctly everywhere, though.
  • Data is packed/unpacked using struct.pack/unpack which is done using bytes instead of strings in Python3.  I have simple string_to_bytes() and bytes_to_string() functions to do this.
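In Python 3, struct.pack() returns bytes and struct.unpack() expects a bytes-like buffer, so any code that kept packed data in str has to convert at that boundary. The sketch below shows one way helpers named string_to_bytes() and bytes_to_string() could be written. It is not the code from the repo: it assumes the latin-1 codec, which maps code points U+0000 to U+00FF one-to-one onto byte values and therefore round-trips any byte sequence losslessly.

```python
import struct


def string_to_bytes(s):
    """Convert a str holding raw byte values (code points 0-255) to bytes.

    Assumption: latin-1 is used because it maps U+0000..U+00FF one-to-one
    onto byte values. Code points above 255 raise UnicodeEncodeError.
    """
    return s.encode("latin-1")


def bytes_to_string(b):
    """Inverse of string_to_bytes(): each byte becomes one code point."""
    return b.decode("latin-1")


# struct works on bytes in Python 3: pack a little-endian unsigned 32-bit int...
packed = struct.pack("<I", 0x80402010)
# ...keep it as a "byte string" (str), then convert back and unpack it.
as_str = bytes_to_string(packed)
(value,) = struct.unpack("<I", string_to_bytes(as_str))
```

A str containing a wider character, such as "\u0100", raises UnicodeEncodeError under this scheme, which is the same 8-bit ceiling mentioned in the first bullet.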

What's left is:
  • There are a couple of exceptions that I don't raise. They are supposed to be raised where the Python 2 code converts from unicode strings to regular strings. I am definitely missing something conceptually here: I haven't figured out how Python 2.x supports strings with "exotic" characters but not strings like u'a\x80a'. If someone can solve this and figure out when to raise the exceptions, Python 3 support will be fully working.
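In the Python 2 implementation, roughly speaking, the string-field check accepts any unicode value, including "exotic" characters, but rejects a str (raw bytes) that is not 7-bit ASCII, because it cannot know which encoding those bytes are in. A hypothetical Python 3 translation of that rule, where str plays the role of Python 2 unicode and bytes the role of Python 2 str, might look like this (check_string_field is an illustrative name, not the library's API):

```python
def check_string_field(value):
    """Hypothetical Python 3 analogue of the Python 2 string-field check.

    str (Python 2 unicode) is always accepted; bytes (Python 2 str) is
    accepted only if it is 7-bit ASCII, since the encoding of arbitrary
    bytes is unknown. Anything else raises TypeError. Returns a str.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        try:
            return value.decode("ascii")
        except UnicodeDecodeError:
            raise ValueError(
                "%r has type bytes, but isn't 7-bit ASCII; "
                "decode it to str before assigning it" % (value,)
            ) from None
    raise TypeError(
        "%r has type %s, expected str or bytes" % (value, type(value).__name__)
    )
```

Under this rule the text value "a\x80a" is accepted, while the byte string b"a\x80a" is rejected. In Python 2 terms, it is the str (byte) form that trips the 7-bit check, not the unicode form.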

I might have small bits of time here or there, but I don't think I can devote the time needed to finish this for several weeks, so if someone wants to finish it up, feel free to fork this code. If anyone wants to see what I changed, the best way is to diff the latest commit against commit 49ccf5d8b3b688c335dc35bcb9f219eca78c7210.
Thanks!
Charles

Charles Law

Oct 1, 2012, 7:13:11 PM
to prot...@googlegroups.com
I assumed that the type/value errors are no longer valid in Python 3, so I removed the 3 checks in reflection_test.testStringUTF8Encoding().  All unit tests now pass!

Charles Law

Dec 6, 2012, 10:39:26 AM
to prot...@googlegroups.com
I've had a chance to revisit this and tested it with the Python Riak library. As a result, I made some changes, and I now have a much better Python 3 implementation.

My original Python 3 protobufs code expected byte data to be passed in as str. The Riak library reads data from a socket and passes it to protobufs to decode. I believe most libraries do the same (read from a socket or perhaps a file), which yields 'bytes' in Python 3. I changed my code so that it works with bytes instead of strings. I translated portions of the code until I was able to read from Riak, then went back and got the unit tests to pass as well, and both are working now!
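Working on bytes matters most where the wire format is parsed byte by byte. Protobuf encodes integers as base-128 varints, and in Python 3 indexing a bytes object returns an int, whereas in Python 2 indexing a str returns a one-character str that needs ord(). A minimal varint decoder over data such as a socket read, written for Python 3 bytes (an illustration of the format, not the library's internal decoder):

```python
def decode_varint(buf, pos=0):
    """Decode one base-128 varint from a bytes-like buffer.

    Returns (value, new_pos). Each byte contributes its low 7 bits,
    least-significant group first; a set high bit means more bytes follow.
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]  # an int in Python 3 (a Python 2 str would need ord())
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")
```

For example, the bytes 08 96 01 encode field 1 with the value 150; decode_varint(b"\x08\x96\x01", 1) skips the tag byte and returns (150, 3).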

My next goal is to get Python 3 to use the _pb2 suffix just like the Python 2 code does.  I am currently using _py3_pb2 as a suffix because of the C++ tests, but this meant I had to go into the Riak library and change a bunch of imports, for example import riak_pb2 --> import riak_py3_pb2 as riak_pb2.
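Until generated modules use the same suffix on both versions, one way to avoid rewriting every import in a client library is to try candidate module names in order. A hypothetical helper (import_first is an illustrative name; riak_pb2 and riak_py3_pb2 are the module names from the post):

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that imports successfully.

    Hypothetical helper: import_first("riak_pb2", "riak_py3_pb2") would
    prefer the standard _pb2 name and fall back to the _py3_pb2 build.
    """
    errors = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append("%s: %s" % (name, exc))
    # None of the candidates imported; report every failure together.
    raise ImportError("none of the modules could be imported: " + "; ".join(errors))
```

Usage would be a single line such as riak_pb2 = import_first("riak_pb2", "riak_py3_pb2") in place of the per-file import rewrites.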

Charles Law

Mar 21, 2013, 2:28:55 PM
to prot...@googlegroups.com
I have been doing the same thing over the last week. At PyCon, Barry Warsaw, Lennart Regebro, and several others held a clinic on porting from Python 2 to 3, where I got some really great tips. They answered all the issues I thought would be hard, and I figured I should make the updates while the fixes were fresh in my mind. So over the last week I have been merging my Python 2 and 3 code, and I finally finished it yesterday. I cleaned up the C/generator code today and tested it by reading from and writing to Riak.

So some notes about my approach:
    - I went with a single code base as well. When I got to the point of updating the setup to run 2to3, I realized this would be hard (specifically for the tests). The 2 and 3 code was already 99% similar, so I figured a single source is better.
    - We run 2.6 and 3.2 at OpenX, and that is what I developed against. Everything works on 2.6+ and 3.2+.
    - The Python 2 API might be slightly different now. I still need to do more testing here to make sure everything works as I expect. String fields should only accept unicode (u"") now, and bytes fields should only accept bytes/str (b""). Unprefixed literals ("") are str by default, but with from __future__ import unicode_literals they become unicode. I'm not sure how strictly protobufs enforces these type checks, but Python 2 code that passed str objects to string fields might need to be fixed to pass unicode.
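The stricter rule described in the last bullet (string fields take text, bytes fields take raw bytes) can be written once for both interpreters with a small compatibility shim. A sketch that assumes a hand-rolled shim rather than a dependency such as six; text_type, binary_type, check_text and check_bytes are illustrative names, not protobuf's API:

```python
from __future__ import unicode_literals

import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str      # what 'string' fields should accept
    binary_type = bytes  # what 'bytes' fields should accept
else:  # Python 2
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str


def check_text(value):
    """Accept only text for a 'string' field (unicode on Py2, str on Py3)."""
    if not isinstance(value, text_type):
        raise TypeError("expected text, got %s" % type(value).__name__)
    return value


def check_bytes(value):
    """Accept only raw bytes for a 'bytes' field (str on Py2, bytes on Py3)."""
    if not isinstance(value, binary_type):
        raise TypeError("expected bytes, got %s" % type(value).__name__)
    return value


# With unicode_literals, "" is text on both versions and b"" is bytes on both.
name = check_text("Testing")
payload = check_bytes(b"\x00\x01")
```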

The repo is located here.

Also, for those that want to test this I have some simple build instructions.  I don't have much C experience, so I spent a good 20-30 minutes figuring out the build the first time.  I had to install gcc-c++, autoconf and automake.  Then in the base directory I ran:
./autogen.sh
./configure
make check (optional)
make install

Thanks,
Charles


On Wednesday, March 20, 2013 7:53:18 AM UTC-7, Malthe Borch wrote:
I have completed this work: https://github.com/malthe/google-protobuf.

At least in the sense that all tests run without failure on both 2.7 and 3.3. I have used a single-source approach (which is only really feasible starting with those two versions, for syntax compatibility reasons).

Python 2.4, 2.5 and I believe even 2.6 simply aren't going to work. It's too much effort.
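For context on the syntax remark: Python 3.3 re-accepted the u"" string prefix (PEP 414), and 2.7 already understands b"" literals, "except ... as ..." and from __future__ import print_function, so a module like the one below parses unchanged on 2.7 and 3.3+, while 3.0 to 3.2 reject it with a SyntaxError at the u"" literal. A minimal illustration, not taken from either repo:

```python
from __future__ import print_function  # print is a function on 2.7 as well

# u"" prefix: valid on 2.7 and 3.3+ (PEP 414), a SyntaxError on 3.0-3.2.
GREETING = u"caf\u00e9"


def to_utf8(text):
    """Encode text as UTF-8 bytes.

    'except ... as' is valid on 2.6+ and 3.x. On Python 3, bytes has no
    .encode() method, so passing bytes raises TypeError here.
    """
    try:
        return text.encode("utf-8")
    except AttributeError as exc:
        raise TypeError("expected text, got %r (%s)" % (text, exc))


print(to_utf8(GREETING) == b"caf\xc3\xa9")  # b"" prefix; prints True
```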

\malthe

Dale Peterson

Apr 22, 2013, 2:54:15 PM
to prot...@googlegroups.com
Charles,
  Thanks for taking the time to do this! Would it be possible to base your GitHub repo on the latest svn checkout of GPB? I have used the instructions here [0] to do this for other projects where I wanted to use git but the official code was managed with svn (as in this case). The nice thing about this is that your work could hopefully be merged into GPB down the road, since it would be based on the most recent commit and merge conflicts would be eliminated.

It looks like your initial svn checkout of the GPB repo was a year ago, so all the Python 3 tests you've got are based on version 2.4.1.

I've already set up git svn clone of the official google protobuf svn repo here:

If you were to clone this and see if you could apply all your patches to the latest svn checkout, then hopefully the maintainers of GPB would be more willing to get the ball rolling on Python3.

Let me know if I can help at all. I am using GPB in a project where everything else is Python 3, but I also have to make sure Python 2 is available, and it would be nice if GPB supported both.

Luke



Charles Law

Aug 22, 2013, 5:36:07 PM
to prot...@googlegroups.com
I'd be interested in getting this done.  I do want it merged in at some point & this will only make it easier.  Since I did some of the translation at work I'm talking to legal about getting the contributor license agreement signed.  Free time is very rare though.

It might (not for sure, but just might) make sense to take the latest svn branch and merge my code into that. It took a couple of iterations to get it right, so there are some commits that don't add anything in the end. The hardest part, I think, was finding all the places where strings should be bytes. I might have time to do this in the next month or two.
Thanks,
Charles

Dale Peterson

Aug 22, 2013, 8:18:33 PM
to prot...@googlegroups.com

It would be great to have Python 3 support. A couple of other projects I am involved with have recently switched to a single code base (to avoid the 2to3 dance), and they have been very positive about the experience. If you are interested, here is an excellent summary of how the SymPy project handles things (they dropped support for Python <= 2.5, though; I'm not sure how feasible that is for protobuf).

http://ondrejcertik.blogspot.com/2013/08/how-to-support-both-python-2-and-3.html

I believe the numpy, scipy, ipython, and matplotlib folks are also doing this, or are very close to having it done. One of the SymPy maintainers worked for Continuum over the summer and posted about this here as well:

http://asmeurersympy.wordpress.com/2013/08/22/python-3-single-codebase-vs-2to3/
http://asmeurersympy.wordpress.com/2013/08/09/using-python-3-as-my-default-python/

Anyway, if you want me to review any patches, ping me and I'll try to do so.

Luke

Kirill Bogdanov

Mar 14, 2014, 8:50:13 PM
to prot...@googlegroups.com
Hi,

I've made an effort to port Charles Law's work to a newer version of protobuf, making a number of other changes along the way. For instance, Python 3 ignores the __metaclass__ class attribute, so
    __metaclass__ = reflection.GeneratedProtocolMessageType
has to become
    MyProtoClass = reflection.GeneratedProtocolMessageType(str('MyProtoClass'), (message.Message,), {'DESCRIPTOR': mydescriptor})
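A runnable illustration of the __metaclass__ point, using a toy metaclass named Registering as a stand-in for reflection.GeneratedProtocolMessageType so that it needs no generated code. Python 3 treats __metaclass__ as an ordinary class attribute, so the metaclass never runs; calling the metaclass directly with (name, bases, namespace), as in the replacement line above, behaves the same on both versions.

```python
class Registering(type):
    """Toy stand-in for reflection.GeneratedProtocolMessageType."""

    registry = []

    def __init__(cls, name, bases, namespace):
        super(Registering, cls).__init__(name, bases, namespace)
        Registering.registry.append(name)


class Base(object):
    pass


# Python 2 style: on Python 3 this is just a class attribute, so the
# metaclass never runs and the class is not registered.
class Ignored(Base):
    __metaclass__ = Registering


# Portable style: call the metaclass with (name, bases, namespace).
# str() keeps the name a native str on Python 2 even under unicode_literals.
Portable = Registering(str("Portable"), (Base,), {"DESCRIPTOR": None})
```

The native Python 3 spelling, class Portable(Base, metaclass=Registering), is a SyntaxError on Python 2, which is why single-source code tends to use the explicit call (or a helper such as six.with_metaclass).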

Some of the other changes were needed to build and run the C++ part with MSVC 2012 Express SP3 on Windows 7 x86_64. For instance, redirecting standard error requires a call to _get_osfhandle(2) rather than changing the handle returned by GetStdHandle(STD_ERROR_HANDLE).
I used custom MSVC 2012 Express project files for both gtest and protobuf and tests pass with both Python 3.3 x86_64 and Cygwin's python 2.7.

I've attached an svn diff and utils.py. These cover the changes to protobuf but not the changes to gtest, which I checked out and modified separately.
protobuf-519.patch
utils.py

Kirill Bogdanov

Mar 15, 2014, 4:41:20 PM
to prot...@googlegroups.com
I've ported Charles Law's work to a recent protobuf (version 519). It passes tests on Win7 x86_64 with Python 3.3 and 2.7. I could post a diff against an svn checkout if you are interested.

Ilya Kulakov

Apr 11, 2014, 1:29:02 PM
to prot...@googlegroups.com
Hi Kirill,

Please post a diff. I'll apply it and make Pull Request to Charles' repo.

On Sunday, March 16, 2014 3:41:20 AM UTC+7, Kirill Bogdanov wrote:

Kirill Bogdanov

Apr 11, 2014, 9:28:29 PM
to prot...@googlegroups.com
The diff is in my first message from Mar 15. Perhaps I should simply delete the second one, which I posted because the first one did not appear in the thread for a few days.

Ilya Kulakov

Apr 12, 2014, 1:05:56 AM
to prot...@googlegroups.com
What's the purpose of all these WIN32 ifdefs? I didn't have problems compiling protobuf for windows.

On Saturday, April 12, 2014 8:28:29 AM UTC+7, Kirill Bogdanov wrote:

Ilya Kulakov

Apr 12, 2014, 5:44:44 AM
to prot...@googlegroups.com
Here is my attempt to port protobuf 2.5.0 to py3k, based on Kirill's patches: https://github.com/GreatFruitOmsk/python3-protobuf

Kirill Bogdanov

Apr 12, 2014, 5:20:17 PM
to prot...@googlegroups.com
On Saturday, April 12, 2014 6:05:56 AM UTC+1, Ilya Kulakov wrote:
What's the purpose of all these WIN32 ifdefs? I didn't have problems compiling protobuf for windows.

I was using my own project files, in which I tried to avoid as many "compatibility"-related defines as possible. For instance, by default the Windows CRT uses _close rather than close. Using _CRT_NONSTDC_NO_DEPRECATE (thanks to http://stackoverflow.com/questions/1563940/msvc-open-close-etc) would have been better than what I did. Perhaps an even better way would be to convert most standard calls to their secure Win32 counterparts (such as fopen_s) when building for Win32.

Ilya Kulakov

Apr 16, 2014, 6:40:28 AM
to prot...@googlegroups.com
The repo has moved to https://github.com/GreatFruitOmsk/protobuf-py3

On Saturday, April 12, 2014 4:44:44 PM UTC+7, Ilya Kulakov wrote:

Ilya Kulakov

Apr 17, 2014, 11:44:05 AM
to prot...@googlegroups.com
The project now passes tests on Python 2.4-2.7 and 3.1-3.4. It probably works on Python 3.0 as well, but that version isn't available for my Linux distro.

https://github.com/GreatFruitOmsk/protobuf-py3/releases/tag/2.5.1-pre

On Wednesday, April 16, 2014 5:40:28 PM UTC+7, Ilya Kulakov wrote:

Jano Kupec

Aug 18, 2014, 11:08:26 AM
to prot...@googlegroups.com
Ilya, any ideas what's needed to make PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp work with Python3?
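For anyone experimenting with this: protobuf reads the PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION environment variable once, when its api_implementation module is first imported, so the variable has to be set before any protobuf or generated _pb2 import (setting it afterwards has no effect), and the cpp value also requires the C++ extension to have been built and installed. A small sketch; request_backend and active_backend are hypothetical helper names, while api_implementation.Type() is the function protobuf uses to report the active backend:

```python
import os

ENV_VAR = "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"


def request_backend(name="cpp"):
    """Ask for a protobuf backend; must run before google.protobuf is imported."""
    os.environ[ENV_VAR] = name


def active_backend():
    """Return the backend protobuf reports, or None if protobuf is not installed."""
    try:
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return api_implementation.Type()
```

Typical use is to call request_backend("cpp") at the very top of the program's entry point and then check active_backend() after importing the generated modules.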
http://yz.mit.edu/wp/fast-native-c-protocol-buffers-from-python/

Ilya Kulakov

Aug 19, 2014, 1:57:56 PM
to prot...@googlegroups.com
Hi Jano,

I've never tried it myself, and I always thought it was an experimental and not very well tested feature. Could you create an issue on GitHub (https://github.com/GreatFruitOmsk/protobuf-py3/issues)?

I'm planning to down-integrate all the changes from the current upstream repository (they have pushed a lot of changes recently).

On Monday, August 18, 2014 10:08:26 AM UTC-5, Jano Kupec wrote: