I've made an attempt to create a Python3 compatible version of protobufs. I have some code that passes pretty much all the unit tests which I've posted here:
I probably won't have a chance to look at this again for a couple weeks if not longer, so I want to get it out there. In my attempt I decided to follow the advice in another post, and treat python3 as a new language. To get python3 working, you'll have to compile the C code. There are also a few issues I ran into along the way:
- I decided to use strings where unicode is used in Python 2. I was originally going to try to use bytes/bytearrays, but they do not support characters wider than 8 bits, and some of the setup.py tests use "exotic" 16-bit characters. (Warning: I might have something conceptually wrong here.)
- There are places where byte data is stored as strings, then converted to unicode. I ended up converting these byte strings (I called them bytestr's) to normal strings. I'm not sure this is done correctly everywhere, though.
- Data is packed/unpacked using struct.pack/unpack, which works on bytes instead of strings in Python 3. I have simple string_to_bytes() and bytes_to_string() functions to do this.
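For readers who have not seen such helpers, here is a minimal sketch of what string_to_bytes() and bytes_to_string() might look like in Python 3. The post does not show their bodies, so the latin-1 mapping below is an assumption; it is used here only because it maps each byte value 0-255 to the code point with the same number.

```python
import struct


def string_to_bytes(text):
    # Assumed behavior: one character (code point 0-255) becomes one byte.
    # Code points above 255 raise UnicodeEncodeError.
    return text.encode("latin-1")


def bytes_to_string(data):
    # Inverse mapping: every byte becomes the code point with the same value.
    return data.decode("latin-1")


# struct.pack returns bytes in Python 3, not str.
packed = struct.pack("<I", 300)       # little-endian unsigned 32-bit integer
as_text = bytes_to_string(packed)     # one character per byte
(value,) = struct.unpack("<I", string_to_bytes(as_text))
```

The round trip is lossless for any byte sequence, but it only hides the bytes/str split rather than resolving it, which is one reason to keep wire data as bytes throughout.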
What's left is:
- There are a couple of exceptions that I don't throw. They are supposed to be raised where the Python 2 code converts from unicode strings to regular strings. I am definitely missing something conceptually here - I haven't figured out how Python 2.x supports strings with "exotic" characters, but not strings like u'a\x80a'. If someone can solve this problem & figure out when to throw the exceptions, Python 3 will be *fully* working.
I might have small bits of time here or there but I don't think I can devote the time I need to get this finished for several weeks, so if someone wants to finish this up, feel free to fork this code. If anyone wants to see what I did, the best way to do this is to diff between the latest commit and commit 49ccf5d8b3b688c335dc35bcb9f219eca78c7210.
Thanks!
Charles
I thought about this a little, and realized that both unicode and str type strings are passed into fields that have cpp_type CPP_STRING and field_type TYPE_STRING. I know the 7-bit character limit is only imposed on str type strings - all the extreme value tests use unicode strings. In Python3, all strings are unicode, so should this limit only exist in Python 2.x?
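The distinction can be illustrated in plain Python 3, without protobufs. This is only a sketch of the language behavior, not of the library's checks: the escape \x80 inside a text string is the code point U+0080, which encodes to UTF-8 without trouble, while the same escape inside a bytes literal is a raw byte that is neither 7-bit ASCII nor valid UTF-8 on its own.

```python
# A Python 3 str is a sequence of code points; U+0080 is a valid one.
text = "a\x80a"
encoded = text.encode("utf-8")    # U+0080 becomes the two bytes C2 80

# In a bytes literal the same escape is the raw byte 0x80.
raw = b"a\x80a"

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False              # 0x80 is outside the 7-bit range

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False               # a lone continuation byte is invalid UTF-8
```

Under that reading, a 7-bit limit only makes sense for byte strings that are about to be interpreted as text, which would be consistent with the limit applying to Python 2 str values and not to unicode values.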
On Friday, September 21, 2012 6:16:41 PM UTC-7, Charles Law wrote:
I assumed that the type/value errors are no longer valid in Python 3, so I removed the 3 checks in reflection_test.testStringUTF8Encoding(). All unit tests now pass!
On Tuesday, September 25, 2012 1:47:09 PM UTC-7, Charles Law wrote:
I've had a chance to revisit this and I tested this using the python Riak library. As a result, I made some changes, and I have a much better Python 3 implementation.
My original python 3 protobufs code took in byte data passed in as strs. The way the Riak library works, it's reading data from a socket and sending that to protobufs to decode. I believe most libraries will do the same - read from a socket or maybe a file, which will load 'bytes' in Python 3. I changed my code so that it works with bytes instead of strings. I translated portions of code until I was able to read from Riak, then went back and got the unittests to pass as well, and both are working now!
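To show why bytes are the natural input type, here is a small sketch that reads a base-128 varint (the integer encoding used by the protobuf wire format) from a binary stream. It is an illustration only, not the decoder from this port; read_varint is a hypothetical name, and io.BytesIO stands in for a socket or a file opened in binary mode.

```python
import io


def read_varint(stream):
    """Read one base-128 varint from a binary stream and return it as an int."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)        # bytes object of length 0 or 1
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]               # indexing bytes yields an int in Python 3
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:           # a clear high bit marks the last byte
            return result
        shift += 7


# Two varints back to back: 300 (AC 02) followed by 1 (01).
stream = io.BytesIO(b"\xac\x02\x01")
first = read_varint(stream)
second = read_varint(stream)
```

Because indexing bytes already yields integers in Python 3, no ord() calls or string conversions are needed, which is much of what makes a bytes-based decoder simpler than one that takes str.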
My next goal is to get Python 3 to use the _pb2 suffix just like the Python 2 code does. I am currently using _py3_pb2 as a suffix because of the C++ tests, but this meant I had to go into the Riak library and change a bunch of imports, for example import riak_pb2 --> import riak_py3_pb2 as riak_pb2.
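Until the suffixes match, a client library could avoid editing every import by trying the module names in order. The helper below is a hypothetical sketch (import_first_available is not part of protobufs or the Riak library), and it is demonstrated with standard-library names so that it runs anywhere.

```python
import importlib


def import_first_available(names):
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %r" % (names,))


# Stand-in names for the demo; in the Riak case the list would be
# ["riak_py3_pb2", "riak_pb2"].
module = import_first_available(["no_such_module_for_demo", "json"])
```

One caveat of this approach: an ImportError raised inside a module that does exist is also swallowed, so a broken generated module would silently fall through to the next name.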
On Monday, October 1, 2012 4:13:11 PM UTC-7, Charles Law wrote:
– in as much as that all tests run without fail on both 2.7 and 3.3. I have used a single-source approach (which is only really feasible starting with those two for syntax compatibility reasons).
Python 2.4, 2.5 and I believe even 2.6 simply aren't going to work. It's too much effort.
On Saturday, September 22, 2012 3:16:41 AM UTC+2, Charles Law wrote:
I have been doing the same thing over the last week. At PyCon Barry Warsaw, Lennart Regebro, and several others held a porting from 2 to 3 clinic where I got some really great tips. They answered all the issues I thought would be hard, and I figured I should do the updates while the fixes were fresh in my mind. So over the last week I have been merging my Python 2 & 3 code, and I finally finished it yesterday. I cleaned up the C/generator code today, and tested it by reading & writing to Riak.
So some notes about my approach:
- I went with single code base as well. When I got to the point of updating the setup to run 2to3 I realized this would be hard (specifically for tests). The 2 & 3 code was already 99% similar, so I figured single source is better.
- We run 2.6 and 3.2 at OpenX, and that is what I developed against. Everything works for 2.6+ and 3.2+.
- The Python 2 API might be slightly different now. I still need to do more testing here to make sure everything works as I expect. String fields should only accept unicode (u"") now, and byte fields should only accept bytes/str (b""). Plain literals ("") are str by default in Python 2, but with "from __future__ import unicode_literals" they become unicode. I'm not sure how strictly protobufs enforces these type checks, but Python 2 code that passed in str objects for string fields might need to be fixed to pass in unicode.
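The rule described in the last point can be sketched in single-source form as follows. The names text_type, binary_type, check_string_field and check_bytes_field are hypothetical and are not taken from the protobufs code; the sketch only shows the kind of check being described.

```python
import sys

# Single-source aliases: the same names work on Python 2.6+ and 3.x.
if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str


def check_string_field(value):
    # Hypothetical check: string fields accept text only.
    if not isinstance(value, text_type):
        raise TypeError("string field needs %s, got %s"
                        % (text_type.__name__, type(value).__name__))
    return value


def check_bytes_field(value):
    # Hypothetical check: bytes fields accept binary data only.
    if not isinstance(value, binary_type):
        raise TypeError("bytes field needs %s, got %s"
                        % (binary_type.__name__, type(value).__name__))
    return value
```

On Python 2 a plain "" literal would pass check_bytes_field and fail check_string_field unless the module uses "from __future__ import unicode_literals", which is the API difference mentioned above.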
Also, for those that want to test this I have some simple build instructions. I don't have much C experience, so I spent a good 20-30 minutes figuring out the build the first time. I had to install gcc-c++, autoconf and automake. Then in the base directory I ran:
./autogen.sh
./configure
make check (optional)
make install
> \malthe
Charles, Thanks for taking the time to do this! Is it possible to make your github repo be based off of the latest svn checkout of GPB? I have used the instructions here [0] to do this for other projects where I wanted to use git but the official code was managed with svn (as in this case). The nice thing about this is then hopefully your work could be merged into GPB down the road since it would be based off the most recent commit and the merge conflicts would be eliminated.
It looks like your initial svn checkout of the GPB repo was a year ago, so all the Python 3 tests you've got are based on version 2.4.1.
If you were to clone this and see if you could apply all your patches to the latest svn checkout, then hopefully the maintainers of GPB would be more willing to get the ball rolling on Python3.
Let me know if I can help at all. I am using GPB in a project where everything else is Python 3, but I also have to make sure Python 2 is available, and it would be nice if GPB supported both.