In Felix's experimental Babel Py3 repo I just committed a py3 support
change.
It uses 2to3 when setup.py is run to convert almost everything; certainly
all of the syntactic stuff seems to be converted OK. The things that
remain are changes in functionality or features missing between 2.x and 3.x.
A few manual fixes identified by Felix are added, but to contain them
and minimise in-tree changes, I added a py2compat.py module which at the
moment holds DictMixin and a replacement for the 2.x builtin cmp, both
of which are missing from py3.
The idea was to keep the code as similar as possible, and to contain
these larger chunks together, to aid readability.
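For readers unfamiliar with the gap being filled: Python 3 removed the cmp builtin. A minimal sketch of a stand-in is below. It is the commonly used idiom and is only an assumption about what the compatibility module might contain, not a copy of it.

```python
def cmp(a, b):
    """Return -1, 0 or 1, like the Python 2 builtin cmp."""
    # Booleans subtract as integers: True - False == 1, False - True == -1.
    return (a > b) - (a < b)


print(cmp(1, 2), cmp(2, 2), cmp(3, 2))
```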
It all builds fine under 2.6, 2.7 and 3.2. It's late here and I haven't
run any tests (I'm not really sure how to run the tests, actually). But I
wanted to commit this so others could look at it and comment on whether this
is the right approach or not.
This is more of a structure patch than a "this will make it work for
sure on py3" patch.
Strontium
I haven't looked at the code yet, but have you guys added a local fixers
file for this? Distribute has a wonderful mechanism for this stuff.
>It all builds fine under 2.6, 2.7 and 3.2. Its late here and I haven't
>run any tests (im not really sure how to run the tests actually). But I
>wanted to commit this so others could look at it and comment on if this
>is the right approach or not.
Have you tested scripts/import_cldr.py? That was the problem I was trying to
locally solve, since it imports from the babel source repository directly.
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | GPG: 2EAC625B
The great man is he who does not lose his childlike heart...
Now I'm a complete Babel noob, so bear with me. It seems that
import_cldr is only used by people who check out the source from svn, correct?
Well, it seems we have two choices.
1. Spend a lot of time writing some custom "make a temporary py3 version
from import_cldr and its dependencies" code. OR
2. For the time being, specify that import_cldr and the other scripts are
Python 2 scripts, and that if you are using svn and need to run them you need
Python 2.x, which to my mind doesn't seem like much of an impost for
someone using the svn version.
From my messing around, this would not preclude one from then building
Babel using Python 3, as the scripts' purpose is only to massage a bunch of
data for Babel's later use.
Strontium
python setup.py test
> Have you tested scripts/import_cldr.py? That was the problem I was trying to
> locally solve, since it imports from the babel source repository directly.
What's the exact problem? Is it that distutils/distribute does not apply
2to3 on that file as it is not copied in the egg?
fs
Am 24.03.2011 15:59, schrieb Strontium:
> It all builds fine under 2.6, 2.7 and 3.2. Its late here and I haven't
> run any tests (im not really sure how to run the tests actually). But I
> wanted to commit this so others could look at it and comment on if this
> is the right approach or not.
Thank you very much for your commits. I really like most of these
changes :-)
For the cmp patch I'd like to propose that we add new-style comparison
methods like __eq__ which fall back to __cmp__. That way we don't have
to ship an implementation of cmp. Also (AFAIK) the __cmp__ protocol is
not used anymore in Python 3 so we need to add these methods anyway.
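A minimal sketch of that proposal, assuming the classes in question already define __cmp__. The names RichCmpMixin and Priority are hypothetical and only serve the illustration; this is not code from the Babel tree.

```python
class RichCmpMixin(object):
    """Derive the rich comparison methods from an existing __cmp__."""

    def __eq__(self, other):
        return self.__cmp__(other) == 0

    def __ne__(self, other):
        return self.__cmp__(other) != 0

    def __lt__(self, other):
        return self.__cmp__(other) < 0

    def __le__(self, other):
        return self.__cmp__(other) <= 0

    def __gt__(self, other):
        return self.__cmp__(other) > 0

    def __ge__(self, other):
        return self.__cmp__(other) >= 0


class Priority(RichCmpMixin):
    """Hypothetical example class that only knows the old __cmp__ protocol."""

    def __init__(self, value):
        self.value = value

    def __cmp__(self, other):
        # Same result as cmp(self.value, other.value) on Python 2.
        return (self.value > other.value) - (self.value < other.value)


ordered = sorted([Priority(3), Priority(1), Priority(2)])
print([p.value for p in ordered])
```

One thing to keep in mind with this approach: on Python 3 a class that defines __eq__ without also defining __hash__ becomes unhashable, so classes used as dict keys would need an explicit __hash__ as well.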
fs
It's a catch-22.
I started to look at the possibility of another script that 2to3'd the
scripts and then the parts of the main source that needed it (copying them
to a temp directory), but import_cldr expects to be executing from where
it is, because it tries to insert the processed CLDR data back into the
main tree using file system locations relative to the scripts
directory. It's quite a curly problem, and given it only affects
developers using SVN, I figured time was better spent actually getting
Babel to work under Python 3. I think a more elegant solution would be
for setup to auto-process the CLDR data if it isn't present, and 2to3
that source (somehow). But I haven't the foggiest how one would achieve
that. I also considered promoting the import_cldr functionality to be
part of pybabel, but the problem with that is that to properly install
pybabel you need the processed CLDR data, again a catch-22.
Strontium
    try:
        from UserDict import DictMixin
    except ImportError:
        from collections import UserDict as DictMixin
and then
    class LocaleDataDict(DictMixin, dict):
        """Dictionary wrapper that automatically resolves aliases to the actual
        values.
        """
        def __init__(self, data, base=None):
            dict.__init__(self, data)
            if sys.version_info >= (3, 0):
                DictMixin.__init__(self, data)
I'm wrangling with the test suite at the moment, and when I have some
confidence this is actually working properly I'll submit it.
Regarding the test suite, docstring tests are broken from py2 to py3,
and it's not easy or straightforward to fix them. Armin on the Jinja2
port said "There is a doctest converter in 2to3, but it does not give
you much. Error messages changed, reprs changed which it cannot properly
pick up, nested tracebacks cause a lot of grief and they are hard to debug."
I'd agree with that. I have started following his advice, namely
disabling docstring tests for Py3 (but leaving them for py2), and adding
actual unittest test cases to replace them (for both py2 and py3). I
spent most of the night wrangling with DictMixin, and seem to have that
sorted, so I should be able to get a few test cases knocked over tonight,
because that was causing heaps of problems. Once I have some confidence
that what I've got is good, I'll commit it, so you can critique the
direction. At the moment I've got lots of debug code in there to test
the test suite, and we don't need that, so I can't commit right now.
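A minimal sketch of the strategy just described: keep doctests in the suite on Python 2 only, and run plain unittest cases everywhere. The names build_suite, upper_name and UpperNameTestCase are hypothetical and not taken from the Babel test suite.

```python
import doctest
import sys
import unittest


def upper_name(name):
    """Hypothetical function that used to be covered by a doctest only."""
    return name.upper()


class UpperNameTestCase(unittest.TestCase):
    """Plain unittest replacement that runs on both Python 2 and 3."""

    def test_upper(self):
        self.assertEqual(upper_name('babel'), 'BABEL')


def build_suite(module, test_cases, python_major=sys.version_info[0]):
    """Collect unittest cases always, and the module's doctests on Python 2 only."""
    suite = unittest.TestSuite()
    if python_major < 3:
        suite.addTest(doctest.DocTestSuite(module))
    for case in test_cases:
        suite.addTest(unittest.defaultTestLoader.loadTestsFromTestCase(case))
    return suite


suite = build_suite(sys.modules.get(__name__), [UpperNameTestCase])
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```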
Strontium
There's also the reasoning that users don't expect
python setup.py …
to change the actual source code.
fs
Am 27.03.2011 05:14, schrieb Strontium:
> I also have an uncommitted patch
> on my tree that gets rid of the copied DictMixin code. So getting rid
> of cmp would get rid of the py2compat.py file I added, which I'm not too
> thrilled about. A better solution was to:
>
>     try:
>         from UserDict import DictMixin
>     except ImportError:
>         from collections import UserDict as DictMixin
If this works, I'd be really glad. I just remember that it didn't when I
ran the tests, but that might also have been because of a separate problem.
> Regarding the testsuite, docstring tests are broken from py2 to py3.
> And its not easy or straight forward to fix them. Armin on the Jinja2
> port said "There is a doctest converter in 2to3, but it does not give
> you much. Error messages changed, reprs changed which it cannot properly
> pick up, nested tracebacks cause a lot of grief and they are hard to
> debug."
The problem with the doctest converter is that it can't change any
"output", it only changes code. Therefore all u'' in doctests cause
problems in Python 3.
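To make the u'' problem concrete, here is a small sketch that runs a Python 2 style doctest under Python 3. The doctest text is invented for the illustration; the code it runs is valid on both versions, only the expected output differs.

```python
import doctest

# A doctest written for Python 2: the expected output carries the u'' prefix.
PY2_DOCTEST = """
>>> 'abc'.upper()
u'ABC'
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(PY2_DOCTEST, {}, 'py2_style_example', None, 0)

runner = doctest.DocTestRunner(verbose=False)
# Swallow the failure report; only the counts matter here.
result = runner.run(test, out=lambda text: None)
print(result)
```

Under Python 3 the example is attempted and fails, because repr() no longer emits the u prefix and no code converter touches the expected-output line.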
General (community) advice seems to be to get rid of doctests. I think
the only legitimate usage of doctests is to verify that example code in
documentation still works.
Testing functionality should be done in a proper unittest as a general rule.
Therefore I'm okay with disabling doctests for Python 3 if we mention
this limitation in the docs. We can fix that later; the main value is in
a working Python 3 version of Babel.
fs
I have committed a change that passes all unittests for core.py and adds
a bunch of new unit tests replacing the doctests that are now disabled
for Py3.
Core.py now passes all tests under Py2.6, 2.7 and 3.2. There's a bunch
of other tests that pass, but also some really hideous breakages, so
unless I've worked through a module fully and added back the tests that
were once doctests, I'm not declaring those other modules as passing.
Basically my strategy is to take it one module at a time: run tests
under 2.6, add unittests to replace doctests on the current module, make
sure it passes all tests under the 3 Pythons I'm testing with, then move on to
the next one. The next module I am going to tackle is dates.py.
I had a weird problem with repr(), which I am using in the replacement
doctests, in 2.6 and 2.7.
The line is:
    self.assertEqual(repr(Locale('en', 'US').currency_formats[None]),
                     '<NumberPattern %s\'\xa4#,##0.00\'>' % py2u)
py2u is just 'u' on Python 2 and '' on Python 3, to fix up the strings.
On Py2 the repr generates: "<NumberPattern u'\\xa4#,##0.00'>" which
seems wrong, and the assertEqual fails on Py2.6 and 2.7. However, it
passes OK on Py3.
I worked around it with a lesser test for Py2 of:
    self.assertEqual(Locale('en', 'US').currency_formats[None].pattern,
                     u'\xa4#,##0.00')
which works, but I am at a loss to explain why the first line doesn't
work for Py2, so if anyone can educate me on where I am going wrong, I'd
appreciate it.
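A likely explanation, offered as a sketch rather than a confirmed diagnosis: on Python 2, repr() of a unicode string escapes non-ASCII characters, so the result contains the four characters backslash, x, a, 4, while the expected literal in the test contains the single character \xa4. Python 3's repr() leaves printable non-ASCII characters alone, which is why the same assertion passes there. The Python 2 behaviour can be reproduced on Python 3 with ascii():

```python
pattern = '\xa4#,##0.00'          # one currency sign followed by the format

py3_style = repr(pattern)         # keeps the printable currency sign as-is
py2_style = 'u' + ascii(pattern)  # escapes it, as Python 2's repr(u'...') did

print(py3_style)
print(py2_style)
print(py3_style == py2_style[1:])
```

If that is the cause, an expected string with a doubled backslash on Python 2 (or comparing the .pattern attribute, as in the workaround) would avoid the mismatch.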
Strontium
util.odict did not run under py3.2, but py3.1+ has a native ordered dict
which, according to the PEP that defined it, gained some inspiration from
Babel's odict, so I changed util.py to subclass collections.OrderedDict
for odict on Python 3.1 and up. It seems to work fine. I subclassed
it in case some small tweaks were needed to the API to make it
fully compatible with the py2 odict implementation; so far that hasn't
proved necessary. If by the end of the port it proves that
collections.OrderedDict is a complete replacement for odict, then I will
change the imports for odict to be conditional on the Python version and
just "from collections import OrderedDict as odict", as it's more
straightforward. Unless of course people would prefer it kept the way I've done
it at the moment.
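A minimal sketch of the subclassing approach described above, assuming a Python version that ships collections.OrderedDict. The empty class body is the point: it keeps the old name and leaves room for compatibility tweaks if any turn out to be needed.

```python
from collections import OrderedDict


class odict(OrderedDict):
    """Ordered dict under the old name; a hook for any API differences."""


d = odict()
d['zebra'] = 1
d['apple'] = 2
d['mango'] = 3
print(list(d.keys()))
```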
I don't know if odict works on Python 3.0, but I remember reading that
Python 3.0 is considered broken by everyone anyway, so I don't know if
that's an issue or not.
So far the port is progressing smoothly. I am expecting the test cases
to be complete by the end of next week, unless something really horrible
crops up, or real life gets in the way.
Strontium
Thanks for your work. IMHO we should not bother supporting Python 3.0. I
don't think there are a lot of users for py3k anyway, so let's not
complicate our code here.
fs
So far, most porting changes have been mechanical in nature and pretty
straightforward once I wrap my head around them. I am on the hardest
bit now: PO files.
MO files are easy, because they have to be binary, and I seem to have
that working for py3k.
PO files look like text files, but to py3.x they act something like
binary files, because in theory their encoding can change mid-stream.
It seems to me you could start reading as (say)
utf-8 and then read the MIME header and change encodings after the
header has been read and "Content-Type:" processed. But I don't see how
Babel handles that, if it does.
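One way to picture the "binary first, decode later" idea is the sketch below: read the file as bytes, look for the charset in the Content-Type header, then decode everything with it. The function names are hypothetical and this is not how Babel's read_po is implemented; it also assumes the header itself is ASCII-compatible.

```python
import re

# The header line looks like: "Content-Type: text/plain; charset=utf-8\n"
CHARSET_RE = re.compile(br'charset=([\w.-]+)', re.IGNORECASE)


def sniff_po_charset(raw, default='utf-8'):
    """Return the charset named in the PO header, or default if none is found."""
    match = CHARSET_RE.search(raw)
    if match is None:
        return default
    return match.group(1).decode('ascii')


def decode_po(raw, default='utf-8'):
    """Decode raw PO bytes using the charset declared in their own header."""
    return raw.decode(sniff_po_charset(raw, default))


sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=iso-8859-1\\n"\n'
    '\n'
    'msgid "coffee"\n'
    'msgstr "caf\xe9"\n'
).encode('iso-8859-1')

print(sniff_po_charset(sample))
print('caf\xe9' in decode_po(sample))
```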
Now, if PO files are text files and opened with a particular encoding,
then things are easy, but at the moment no PO files are opened with any
particular encoding. So what I am asking is: should I make the PO file
handling binary-like and massage each and every line's encoding manually,
or do we say that for Py3 the fileobj must be opened with the particular
encoding required, and then I just add some check to the beginning of
read_po and write_po for py3 to ensure the file obj mode is correct AND
the encoding matches catalog.charset?
My preference is the latter (require po file objects to be opened with
the correct encoding), because it is easier. For the utf-8 encoded po files I
have, it all SEEMS to be working; it's breaking at the moment when handling
iso-8859-1 encoded po files in the test suite.
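The check mentioned for the second option could look roughly like the sketch below. The function name is hypothetical and the charset argument stands in for catalog.charset; codecs.lookup is used so that aliases such as 'UTF8' and 'utf-8' compare equal.

```python
import codecs
import io


def check_po_stream_encoding(fileobj, charset):
    """Raise if a text stream's encoding differs from the catalog charset."""
    encoding = getattr(fileobj, 'encoding', None)
    if encoding is None:
        raise TypeError('expected a text stream opened with an explicit encoding')
    # Normalise codec aliases before comparing.
    if codecs.lookup(encoding).name != codecs.lookup(charset).name:
        raise ValueError('stream is %s but catalog wants %s' % (encoding, charset))


stream = io.TextIOWrapper(io.BytesIO(), encoding='utf-8')
check_po_stream_encoding(stream, 'UTF-8')  # same codec, passes silently
```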
Some advice in this regard would be appreciated, before I blunder off
and make a huge mistake.
Strontium
Am 05.04.2011 18:23, schrieb Strontium:
> PO Files, look like text files, but to py3.x they act something like
> binary files, because in theory, their encoding can change mid stream.
> (it seems to me) Because it seems you could start reading as (say)
> utf-8 and then read the mime header and change encodings after the
> header has been read and "Content-Type:" processed. But I dont see how
> Babel handles that, if it does.
It does. See http://babel.edgewall.org/ticket/255
> Now, if PO Files are text files and opened with a particular encoding,
> then things are easy, but at the moment, no PO Files are opened with any
> particular encoding, so what I am asking is, should I make the PO File
> handling Binary like and massage each and every lines encoding manually,
> or do we say for Py3, the fileobj must be opened with the particular
> encoding required and then I just add some check to the beginning of
> read_po and write_po for py3 to ensure the file obj mode is correct AND
> the encoding matches catalog.charset.
>
> My preference is the latter (require po file objects to be opened with
> the correct encoding), because it easier.
While I think that most po files will be in UTF-8 now, in order to
support po files fully I think we should go the "right" (aka hard) way.
IMHO the limitation mentioned in #255 can stay until we fix it, but we
should support ISO-8859-1 po files. I thought I handled that in my old
patches already.
However if you like you can also do it the easy way first and let
someone else (e.g. me) fix the thing for real later.
Btw: I'm currently working on getting a Python 3 bitten build slave
running for another project of mine. When this is done, I'll continue
working on the Python3 version of Babel :-)
fs
On a bit of a tangent: I have had several problems with Babel breaking
PO-files during processing, either due to bugs in wrapping or escaping.
polib (http://pypi.python.org/pypi/polib) gave me much better results
and appears to see significant uptake, so I am wondering if Babel should
start using polib instead of having its own po/mo implementation.
Wichert.
--
Wichert Akkerman <wic...@wiggy.net> It is simple to make things.
http://www.wiggy.net/ It is hard to make things simple.
write_po and read_po are only used in anger by frontend.py, and the calls
are all variants of:
    outfile = open(self.output_file, 'w')
    try:
        write_po(outfile, catalog)
    finally:
        outfile.close()
whereas with my proposal above you would just have:
    write_po(self.output_file, catalog)
and
    read_po(infile, locale)
The biggest issue for the Babel codebase with doing this is the tests,
as they use StringIO, but I would prefer to remove StringIO from the
tests for write_po and read_po and have these functions work well for
py2.x and 3.x than to support the use of StringIO, as I struggle to see a
purpose for that outside the tests.
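The two styles need not exclude each other. A sketch of a helper that accepts either a file name or an already open stream is below; opened and write_payload are hypothetical names, and write_payload only stands in for write_po to keep the example self-contained.

```python
import contextlib
import io
import os
import tempfile


@contextlib.contextmanager
def opened(target, mode):
    """Yield a file object for target, opening and closing it if it is a path."""
    if isinstance(target, str):
        fileobj = open(target, mode)
        try:
            yield fileobj
        finally:
            fileobj.close()
    else:
        # Caller-owned stream: use it, but leave closing to the caller.
        yield target


def write_payload(target, payload):
    """Hypothetical stand-in for write_po: writes bytes to a path or stream."""
    with opened(target, 'wb') as fileobj:
        fileobj.write(payload)


buffer = io.BytesIO()
write_payload(buffer, b'msgid ""\n')

path = os.path.join(tempfile.mkdtemp(), 'messages.po')
write_payload(path, b'msgid ""\n')
```

With a helper like this the callers in frontend.py could pass a path, while tests and in-memory users could keep passing a stream.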
As for polib, that might be a better way to go eventually, but for this
effort (making Babel work on py3) polib itself does not support py3, so
it would have to be ported first. :(
polib works just like I propose here: it takes file names and handles
files internally.
> While I think that most po files will be in UTF-8 now, in order to
> support po files fully I think we should go the "right" (aka hard) way.
> IMHO the limitation mentioned in #255 can stay until we fix it, but we
> should support ISO-8859-1 po files. I thought I handled that in my old
> patches already.
If I'm going to mess with read_po I will try to fix this as well.
> However if you like you can also do it the easy way first and let
> someone else (e.g. me) fix the thing for real later.
Not a fan of doing things twice :)
> Btw: I'm currently working on getting a Python 3 bitten build slave
> running for another project of mine. When this is done, I'll continue
> working on the Python3 version of Babel :-)
cool.
strontium
The problem lies in the fact that you get yet another dependency for Babel.
On the other hand, holding onto Not Invented Here (NIH) is also not
productive.
Need to think some more on this.
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | GPG: 2EAC625B
Atone me to my throes curtail...
> -On [20110405 22:26], Wichert Akkerman (wic...@wiggy.net) wrote:
>> On a bit of a tangent: I have had several problems with Babel breaking
>> PO-files during processing, either due to bugs in wrapping or escaping.
>> polib (http://pypi.python.org/pypi/polib) gave me much better results
>> and appears to see significant uptake, so I am wondering if Babel should
>> start using polib instead of having its own po/mo implementation.
>
> The problem lies in the fact that you get yet another dependency for Babel.
> On the other hand, holding onto Not Invented Here (NIH) is also not
> productive.
>
> Need to think some more on this.
Does it support Python 3?
--
Philip Jenvey
No, I don't think so.
Also the docs say: "polib requires python 2.5 or higher."
To me at least Python 2.4 support is really important for the next 2-3
years (while RHEL 5 is still actively maintained). But I guess given
enough interest we could fix that.
fs
Disclaimer: I never worked with polib, just looked at the source for 10
minutes.
To me dependencies are not such a bad thing as long as the project
understands the difference between development and distribution:
- For developers all kinds of dependencies are fine. Even quite
complicated dependencies are ok.
- As a user I want to have as few dependencies as possible because I
don't know how to install them, things might go wrong etc.
Therefore you could include a dependency in your distributed files while
falling back to a system-wide lib if it is not included. This also helps
Linux distributions with their "no bundling" policy.
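In code, the "bundled copy first, system-wide library otherwise" idea usually comes down to an import fallback. The sketch below uses a hypothetical helper import_first and a made-up bundled package name; json stands in for the real dependency only so the example runs anywhere.

```python
import importlib


def import_first(*names):
    """Return the first module from names that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError('none of these modules could be imported: %s' % ', '.join(names))


# 'myapp_bundled.json' is a hypothetical bundled copy that does not exist here,
# so the lookup falls through to the system-wide module.
module = import_first('myapp_bundled.json', 'json')
print(module.__name__)
```

A distribution that strips the bundled copy then needs no patching, since the second name is picked up automatically.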
It's evil though to have your dependencies included in your source (like
Django, twill, …). Bundling dependencies is purely for distributions.
Don't know what to do about polib though. ;-)
fs
The reason I suggested polib is that the Babel po-file implementation
has too many bugs which kept breaking our po-files, and Babel
development was effectively stalled. I was about to start writing my own
thing when I ran into polib, which appears to be actively maintained and
did not suffer from any of the problems I have run into with Babel, so
switching was a simple choice for me.
I am not saying polib is ideal; its documentation certainly leaves a lot
to be desired. But my gut feeling is that an effort to improve polib is
more worthwhile than maintaining a separate po-implementation in Babel.
I am more thinking of use cases like the Trac project. There was already
some grumbling about pytz, so imagine if we add polib.
I guess asking David if it's ok to drop polib.py into Babel's source also
kind of defeats the whole purpose.
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | GPG: 2EAC625B
I believe because it is impossible...
Why are they so afraid of dependencies? Most of the applications I
install have dozens of external dependencies, and using standard tools
such as virtualenv or zc.buildout it really is trivial to manage them,
so I do not quite understand their worry. Especially if there is a
choice between having to do more work yourself when you already don't
have enough manpower versus getting things for free.
Wichert.
I think one of the reasons why Django is so successful is that they
don't have dependencies. It certainly contributes to Trac's popularity as
well.
When I worked for a company developing a popular Trac plugin, we had a
lot of issues with users not being able to install all dependencies.
These guys were not software developers or Linux admins; many of them
use Windows and all that crap.
> Most of the applications I
> install have dozens of external dependencies, and using standard tools
> such as virtualenv or zc.buildout it really is trivial to manage them,
I strongly disagree here. It might be easy to set up (not trivial: think
of pypi being down, incompatible requirements because one of your
dependencies was updated etc.) but IMHO it's a PITA to maintain:
- easy_install, pip etc. really suck compared to yum/aptitude
- no mirror network with auto-failover for pypi
- hard to get latest (security) updates for packages installed in
virtualenv
- no stable set of versions where you can get security fixes and
simple bug fixes but no API/ABI breakage for a certain time frame
(think CentOS/RHEL - 7+ years)
So virtualenv is nice for single deployments, but if you want to maintain
stuff on a bigger scale (e.g. dozens of virtualenvs, lots of servers,
unskilled users) with minimum admin resources, it is definitely not easy
to manage.
fs
Interestingly enough the tide is turning there, and Django is now
starting to be broken up in smaller pieces.
> I strongly disagree here. It might be easy to setup (not trivial, think
> of pypi being down, incompatible requirements because one of your
> dependencies was updated etc) but IMHO it's a PITA to maintain:
> - easy_install, pip etc really suck compared to yum/aptitude
> - no mirror network with auto-failover for pypi
> - hard to get latest (security) updates for packages installed in
> virtualenv
> - no stable set of versions where you can get security fixes and
> simple bug fixes but no API/ABI breakage for a certain time frame
> (think CentOS/RHEL - 7+ years)
All of these are just as problematic for a single package as for a
dozen. If you need to deploy to multiple machines tools like pip bundles
or zc.buildout allow you to completely lock down all versions and give
you an easily reproducible deployment.
Wichert.
I can see plenty of purpose for using StringIO for po files, and have done so fairly often: you may be passing around an in-memory copy of a PO file and want to change it; you may be grabbing it off the web; etc.
Just thought I'd mention it :)
Cheers
David
FWIW polib uses a slightly odd pattern for this: you pass a string to
its pofile method, and if os.path.exists(input) returns True it assumes
it to be a filename, and otherwise it assumes it is the raw content of a
PO or MO file.
Wichert.
I think this is a classic case of the Python 2 unicode problem (no real
separation between binary and ASCII string data). In Python 3 all of these
methods should take BytesIO streams.
fs
Not sure if I understand you completely, but yes, I think as far as Babel
is concerned, we should treat all po files as binary until we have parsed the
encoding. In Python 2 we only have StringIO for that, but otherwise it
really should be BytesIO.
fs
Python 2.6 and later do have BytesIO.
I apologise for the size of the patch, but when I got to frontend.py
everything unravelled as it was all interconnected, and I've only just
gotten it all back together by fixing pofile and mofile handling.
I need to go through and make sure all doctests are ported to unit
tests, but at this stage it looks like it should all work, at least as
far as the unit tests are concerned.
I made a BIG upgrade to the functionality of pofile reading,
AND I closed http://babel.edgewall.org/ticket/255
PO files for Python 2.x can be read from:
Files. Alternatively, a file name passed in will automatically be opened
and read, or if it's not a file but a string containing a pofile's
contents, that too will be read and processed.
It will automatically read the content type encoding from the file and use
that, falling back to the encoding set in the catalogue if it's not set.
PO files for Python 3.x can be read from:
Text files/TextIO - limited to the encoding the text file is opened in
(usually utf-8). This is a Py3 limitation, as I can't find any way to
re-open a file with a different encoding. Well, actually for text files I
could do something like:
    filename = getattr(fileobj, 'name', '')
    fileobj = open(filename, 'rb')
but that seems a little perverse. If the consensus is that it would be a
good thing, I can implement it easily enough. Then text files will behave
just like binary files and only TextIO will be limited to utf-8.
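On the question of re-reading an already opened text stream with another encoding: Python 3.1 and later offer detach() on text streams, which hands back the underlying binary buffer so it can be wrapped again. The sketch below shows that as a possible alternative to re-opening by name; rewrap_with_encoding is a hypothetical helper, and it only applies to streams backed by a binary buffer (io.StringIO has none and raises on detach()).

```python
import io


def rewrap_with_encoding(text_stream, encoding):
    """Return a new text stream over the same bytes, decoded with encoding."""
    # After detach() the original text_stream must not be used any more.
    raw = text_stream.detach()
    raw.seek(0)
    return io.TextIOWrapper(raw, encoding=encoding)


payload = 'caf\xe9'.encode('iso-8859-1')
wrong = io.TextIOWrapper(io.BytesIO(payload), encoding='utf-8')
right = rewrap_with_encoding(wrong, 'iso-8859-1')
content = right.read()
print(content == 'caf\xe9')
```

The caveat from the re-open trick still holds: the caller's original handle is no longer usable afterwards, so this would need to be documented behaviour.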
Binary files/BytesIO/binary string - detects the encoding and uses that,
or defaults to the catalog encoding if not found.
Text string - if it's a file name, that file is automatically opened and
read, like a binary file. Otherwise the text string is processed like a
text file.
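The Python 3 input rules listed above amount to a dispatch on the type of the source. A rough sketch of that dispatch follows; classify_po_source and its return labels are hypothetical and only mirror the description, not the committed code.

```python
import io
import os
import tempfile


def classify_po_source(source):
    """Say how a read_po-style function would treat source on Python 3."""
    if isinstance(source, bytes):
        return 'binary content'   # detect the charset from the header
    if isinstance(source, str):
        # A string naming an existing file is opened; anything else is content.
        return 'file name' if os.path.exists(source) else 'text content'
    if isinstance(source, io.TextIOBase):
        return 'text stream'      # limited to the encoding it was opened with
    if isinstance(source, (io.RawIOBase, io.BufferedIOBase)):
        return 'binary stream'    # detect the charset from the header
    raise TypeError('unsupported PO source: %r' % (source,))


path = os.path.join(tempfile.mkdtemp(), 'messages.po')
with open(path, 'wb') as handle:
    handle.write(b'msgid ""\nmsgstr ""\n')

print(classify_po_source(path))
print(classify_po_source('msgid ""\nmsgstr ""\n'))
print(classify_po_source(io.BytesIO(b'')))
```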
Writing is a little more restricted.
PO file writing for Py2.x is essentially unchanged.
For Py3.x:
Text files/TextIO are forced to use the encoding specified when the file
was opened (usually utf-8).
Binary files/BytesIO are encoded in the encoding set in the catalog.
Again, I could possibly use the get-file-name-and-re-open trick above to
coerce the text file into binary mode, but that seems even more
perverse, given the file is already open for write and we don't really
know what the caller will do with the original file handle after the call.
For MO files, on py3 the fileobj must be a binary file. I could make
read_mo more forgiving using the re-open trick above, but I don't think
it's worth it, and it seems very hacky.
So at the moment, I am very confident this is a working Babel for Py3,
and it is exactly the same source tree for both versions. I added a
couple of custom fixers to fix up some declarations. It's possible that
some of my "if py 3 do this, else do that" paths could be replaced with
custom fixers, but I'm hardly an expert at writing fixers (which seems
like a black art) and I'm not sure it would be worth it. I implemented
the ones I've added because there were a few of them and the manual
change was hugely messy.
So at this stage, I'd love some constructive criticism, because I think
the work is close to done.
Strontium