Message from discussion
Python3 Protobufs
Date: Tue, 25 Sep 2012 13:47:09 -0700 (PDT)
From: Charles Law <charles....@openx.com>
To: protobuf@googlegroups.com
Subject: Re: Python3 Protobufs
I thought about this a little more and realized that both unicode and str
values are passed into fields that have cpp_type CPP_STRING and field_type
TYPE_STRING. I know the 7-bit character limit is only imposed on str
values; all the extreme value tests use unicode strings. In Python 3, all
strings are unicode, so should this limit exist only in Python 2.x?
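
To make the question concrete, here is a minimal Python 3 sketch of what I mean. The function name coerce_string_field() is made up for illustration and is not protobuf's actual code; it only assumes that the 7-bit limit would carry over to bytes values while str values are accepted as-is.

```python
def coerce_string_field(value):
    """Hypothetical check for a TYPE_STRING field; not protobuf's real code."""
    if isinstance(value, str):
        # A Python 3 str is already unicode, so no 7-bit restriction applies.
        return value
    if isinstance(value, bytes):
        # Assumption: bytes play the role of the Python 2 str type here,
        # so they are only accepted when every byte is 7-bit ASCII.
        try:
            return value.decode("ascii")
        except UnicodeDecodeError as exc:
            raise ValueError(
                "byte strings must be 7-bit ASCII: %r" % (value,)
            ) from exc
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

Under that assumption, "a\u1234a" and b"abc" are both accepted, while b"a\x80a" raises ValueError because 0x80 is outside the 7-bit range.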
On Friday, September 21, 2012 6:16:41 PM UTC-7, Charles Law wrote:
>
> I've made an attempt to create a Python 3 compatible version of protobufs.
> I have some code that passes pretty much all the unit tests, which I've
> posted here:
>
> https://github.com/openx/python3-protobuf
>
> I probably won't have a chance to look at this again for a couple of weeks,
> if not longer, so I want to get it out there. In my attempt I decided to
> follow the advice in another post and treat Python 3 as a new language. To
> get Python 3 working, you'll have to compile the C code. There are also a
> few issues I ran into along the way:
>
> - I decided to use strings where unicode is used in Python 2. I was
> originally going to try to use bytes/bytearrays, but they cannot hold
> characters wider than 8 bits, and some of the setup.py tests use "exotic"
> 16-bit characters. (Warning: I might have something conceptually wrong here)
> - There are places where byte data is stored as strings, then
> converted to unicode. I ended up converting these byte strings (I called
> them bytestrs) to normal strings. I'm not sure this is done correctly
> everywhere, though.
> - Data is packed and unpacked using struct.pack/unpack, which work on
> bytes instead of strings in Python 3. I have simple string_to_bytes() and
> bytes_to_string() functions to handle this.
>
>
> What's left is:
>
> - There are a couple of exceptions that I don't throw. They are supposed
> to be raised where the Python 2 code converts from unicode strings to
> regular strings. I am definitely missing something conceptually here: I
> haven't figured out how Python 2.x supports strings with "exotic"
> characters, but not strings like u'a\x80a'. If someone can solve this
> problem and figure out when to throw the exceptions, Python 3 will be
> *fully* working.
>
>
> I might have small bits of time here or there but I don't think I can
> devote the time I need to get this finished for several weeks, so if
> someone wants to finish this up, feel free to fork this code. If anyone
> wants to see what I did, the best way to do this is to diff between the
> latest commit and commit 49ccf5d8b3b688c335dc35bcb9f219eca78c7210.
> Thanks!
> Charles
>
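
For anyone reading along, here is a rough sketch of what helpers like the string_to_bytes() and bytes_to_string() mentioned in the quoted post could look like. The bodies are my assumption and not necessarily what is in the repository: they use the latin-1 codec, which maps code points 0-255 one-to-one onto byte values, so struct output survives a round trip through str.

```python
import struct


def string_to_bytes(s):
    # Assumed implementation: latin-1 maps code points 0-255 straight to bytes.
    return s.encode("latin-1")


def bytes_to_string(b):
    # Inverse of string_to_bytes() for any byte value 0-255.
    return b.decode("latin-1")


# struct.pack() returns bytes in Python 3, so convert when a str is needed.
packed = struct.pack("<i", 300)
as_text = bytes_to_string(packed)

# Converting back gives the original bytes, which struct.unpack() can read.
(value,) = struct.unpack("<i", string_to_bytes(as_text))
```

This only works for characters up to U+00FF. A 16-bit character such as "\u1234" cannot be encoded with latin-1 (it raises UnicodeEncodeError); it needs a multi-byte encoding such as UTF-8, where it takes three bytes. That is one way to read the remark about bytes not holding the "exotic" characters directly.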