Message from discussion Python3 Protobufs

Date: Tue, 25 Sep 2012 13:47:09 -0700 (PDT)
From: Charles Law <charles....@openx.com>
To: protobuf@googlegroups.com
Message-Id: <9741b503-7a6f-4a38-b2b6-903d9e183ebf@googlegroups.com>
In-Reply-To: <cb1ddd05-7bd6-49e4-9be3-9739781efd9d@googlegroups.com>
References: <cb1ddd05-7bd6-49e4-9be3-9739781efd9d@googlegroups.com>
Subject: Re: Python3 Protobufs
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_171_17193664.1348606029775"

------=_Part_171_17193664.1348606029775
Content-Type: multipart/alternative; 
	boundary="----=_Part_172_3157304.1348606029775"

------=_Part_172_3157304.1348606029775
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

I thought about this a little and realized that both unicode and str values
are passed into fields that have cpp_type CPP_STRING and field_type
TYPE_STRING.  I know the 7-bit character limit is only imposed on str
values; all the extreme value tests use unicode strings.  In Python 3, all
strings are unicode, so should this limit only exist in Python 2.x?
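
One way to picture the question is the sketch below. It is not code from protobuf or from the python3-protobuf fork; the function name check_string_field_value is hypothetical, and it assumes that in Python 3 the bytes type takes over the role the Python 2 str type played, so the 7-bit check would apply to bytes only.

```python
def check_string_field_value(value):
    """Hypothetical checker for a TYPE_STRING field (not the protobuf API)."""
    if isinstance(value, str):
        # A Python 3 str is already Unicode text, so no 7-bit limit applies.
        return value
    if isinstance(value, bytes):
        # Assumption: bytes stand in for the Python 2 str type, so only
        # 7-bit ASCII content is accepted and converted to text.
        try:
            return value.decode("ascii")
        except UnicodeDecodeError:
            raise ValueError("%r is bytes but not 7-bit ASCII" % (value,))
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

Under these assumptions a value such as "a\u1234" passes unchanged, b"abc" is accepted as ASCII, and b"a\x80a" raises ValueError.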


On Friday, September 21, 2012 6:16:41 PM UTC-7, Charles Law wrote:
>
> I've made an attempt to create a Python 3 compatible version of protobufs.
> I have some code that passes pretty much all the unit tests, which I've
> posted here:
>
> https://github.com/openx/python3-protobuf
>
> I probably won't have a chance to look at this again for a couple of weeks,
> if not longer, so I want to get it out there.  In my attempt I decided to
> follow the advice in another post and treat Python 3 as a new language.  To
> get Python 3 working, you'll have to compile the C code.  There are also a
> few issues I ran into along the way:
>
>    - I decided to use strings where unicode is used in Python 2.  I was
>    originally going to try to use bytes/bytearrays, but they do not support
>    characters wider than 8 bits, and some of the setup.py tests use
>    "exotic" 16-bit characters.  (Warning: I might have something
>    conceptually wrong here.)
>    - There are places where byte data is stored as strings, then
>    converted to unicode.  I ended up converting these byte strings (I
>    called them bytestrs) to normal strings.  I'm not sure this is done
>    correctly everywhere though.
>    - Data is packed/unpacked using struct.pack/unpack, which work on
>    bytes instead of strings in Python 3.  I have simple string_to_bytes()
>    and bytes_to_string() functions to do this.
>
>
> What's left is:
>
>    - There are a couple of exceptions that I don't raise.  They are
>    supposed to be raised where the Python 2 code converts from unicode
>    strings to regular strings.  I am definitely missing something
>    conceptually here: I haven't figured out how Python 2.x supports strings
>    with "exotic" characters, but not strings like u'a\x80a'.  If someone
>    can solve this problem and figure out when to raise the exceptions,
>    Python 3 will be *fully* working.
>
>
> I might have small bits of time here or there but I don't think I can 
> devote the time I need to get this finished for several weeks, so if 
> someone wants to finish this up, feel free to fork this code.  If anyone 
> wants to see what I did, the best way to do this is to diff between the 
> latest commit and commit 49ccf5d8b3b688c335dc35bcb9f219eca78c7210.
> Thanks!
> Charles
>
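
The string_to_bytes() and bytes_to_string() helpers mentioned in the quoted message are not shown in the post. A minimal sketch of how such helpers could work is below; it assumes each character of the string stands for exactly one byte (code points 0 to 255), which is why it uses the latin-1 codec. The real functions in the fork may differ.

```python
import struct


def string_to_bytes(text):
    # Assumption: every character has a code point in 0-255, so latin-1
    # maps each character to exactly one byte.
    return text.encode("latin-1")


def bytes_to_string(data):
    # Inverse mapping: each byte becomes the character with that code point.
    return data.decode("latin-1")


# struct.pack returns bytes in Python 3; convert to str for code that still
# stores wire data in strings, then back to bytes before unpacking.
packed = struct.pack("<I", 300)
as_text = bytes_to_string(packed)
(value,) = struct.unpack("<I", string_to_bytes(as_text))
```

With this mapping the round trip is lossless for every byte value, but string_to_bytes() raises UnicodeEncodeError for any character above U+00FF, so it only suits strings that really hold byte data.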

------=_Part_172_3157304.1348606029775--

------=_Part_171_17193664.1348606029775--