Python 3 (again)


Mikhail Korobov

May 18, 2012, 8:48:48 AM
to nltk...@googlegroups.com
Hello NLTK developers,

Since nltk 2 final is out, I've started a combined Python 2+3 port of nltk. See: https://github.com/kmike/nltk/commits/2and3

The idea is to support Python 2.6, 2.7 and 3.2 from a single codebase. A bit of background: some major libraries (django, webob, pyramid) were ported using this approach; I have ported several small Python libraries this way over the past year and found it works well. The NLTK port is not ready yet and doesn't run under Python 3 at the moment, but the Python 2 tests have the same failures as in the master branch, so I hope I didn't break anything for Python 2.

The plan is to update the existing 2.x code to use more recent idioms and then add Python 3 support. Python 2.5 support is already dropped in this branch to make the transition easier and the code better. The first several commits are "cowboy commits" - I was fixing random things in random modules, sorry for this. Later commits take a better approach: each one fixes a certain aspect of the NLTK code, e.g. the 'reduce' builtin that is removed in py3k, or the changed iteration protocol.
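To illustrate the two aspects mentioned above, here is a minimal sketch of code that runs unchanged under Python 2.6+ and Python 3. It is not taken from the NLTK branch; the `Countdown` class is a hypothetical example. `functools.reduce` and the `next()` builtin both exist since Python 2.6, which is one reason dropping 2.5 makes this easier.

```python
# 'reduce' is no longer a builtin in Python 3, but functools.reduce
# is available on both Python 2.6+ and Python 3.
from functools import reduce


class Countdown(object):
    """Hypothetical iterator that supports both iteration protocols."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 calls __next__() to advance an iterator.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # Python 2 calls .next() instead, so alias it to the same method.
    next = __next__


# Sum the values 4, 3, 2, 1 with the portable reduce.
total = reduce(lambda a, b: a + b, Countdown(4))

# Use the next() builtin rather than calling iterator.next() directly.
first = next(Countdown(3))
```

Calling the `next()` builtin instead of the `.next()` method is the part that usually has to be changed across a codebase; the class-level alias only matters for iterators defined in the library itself.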

I'm using the excellent python-modernize utility (https://github.com/mitsuhiko/python-modernize) to find and fix some 2+3 issues, running its fixers one by one. It doesn't make code 2+3 compatible automatically, doesn't find all incompatibilities and is not always correct, but it is still a very useful helper for automating tedious search/replace tasks.

I don't see how the porting work can be parallelized right now. Please wait a couple of days :) But once the library is updated to the point where the py32 tests run, a lot of help will be needed to make sure NLTK actually works under both Python 3 and Python 2. Code review, extra test coverage, fixing failing tests and checking whether the updated NLTK works with your codebase are all very welcome!

Lars Buitinck

May 18, 2012, 8:53:13 AM
to nltk...@googlegroups.com
2012/5/18 Mikhail Korobov <kmi...@googlemail.com>:
> The idea is to support python 2.6, 2.7 and 3.2 using the same codebase. A
> bit of background: some major libraries (django, webob, pyramid) were ported
> using this approach; I myself have ported several small python libraries
> over this year using this approach and found it good. The NLTK port is not
> ready yet and it doesn't run under python 3 at the moment, but python 2
> tests have the same failures as in master branch so I hope I didn't break
> anything for python 2. The plan is to update existing 2.x code to use more
> recent idioms and then add python 3 support. Python 2.5 support is already
> dropped in this branch to make transition easier and code better. First
> several commits are "cowboy commits" - I was fixing random things in random
> modules, sorry for this. Later commits use better approach: they are fixing
> certain aspects of NLTK code, e.g. 'reduce' builtin that is removed in py3k
> or the changed iteration protocol.

Did you consider using six [1]? That's a small enough library to
package with NLTK.

Lars Buitinck

May 18, 2012, 8:59:26 AM
to nltk...@googlegroups.com
2012/5/18 Lars Buitinck <lars...@gmail.com>:
> Did you consider using six [1]? That's a small enough library to
> package with NLTK.

... where [1] was supposed to refer to http://packages.python.org/six/

Mikhail Korobov

May 18, 2012, 9:20:36 AM
to nltk...@googlegroups.com
I've copied a small part of six to nltk.compat ( https://github.com/kmike/nltk/blob/2and3/nltk/compat.py ). The whole library seems unnecessary (it contains some extra utilities, e.g. for 2.4 and 2.5 compatibility); also, if I'm not mistaken, 'moves' from six doesn't work if six is bundled as a subpackage (e.g. nltk.six) because of implementation details.
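As a rough idea of what such a small compatibility module can look like, here is a sketch. It is not the actual content of nltk/compat.py: the names `PY3`, `string_types` and `text_type` follow six's conventions, and `is_string` is a hypothetical helper added only for illustration.

```python
import sys

# True when running under Python 3 or later.
PY3 = sys.version_info[0] >= 3

if PY3:
    # Python 3 has a single text type.
    string_types = (str,)
    text_type = str
    from io import StringIO
else:
    # Python 2: basestring covers both str and unicode.
    string_types = (basestring,)  # noqa: only defined on Python 2
    text_type = unicode           # noqa: only defined on Python 2
    from StringIO import StringIO


def is_string(obj):
    """Hypothetical helper: True for any text-like string on 2 and 3."""
    return isinstance(obj, string_types)
```

Library code can then write `from nltk.compat import string_types` (assuming the module exposes that name) instead of testing against `basestring` directly, so the version check lives in one place.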

On Friday, May 18, 2012, 18:59:26 UTC+6, Lars Buitinck wrote:

Mikhail Korobov

May 18, 2012, 9:25:44 AM
to nltk...@googlegroups.com
... and packaging it as a top-level module seems wrong because it may conflict with a six installed from PyPI.

On Friday, May 18, 2012, 19:20:36 UTC+6, Mikhail Korobov wrote:

Mikhail Korobov

May 18, 2012, 4:41:21 PM
to nltk...@googlegroups.com
Update: nltk passes some tests under Python 3 now. Nose is unable to run the full test suite under Python 3 (it raises a strange exception), but individual tests can be executed. This:

tox -- nltk.internals nltk.misc.sort nltk.util internals.doctest util.doctest simple.doctest

passes under Python 2.6, 2.7, 3.2 and PyPy 1.8. There is a lot of work remaining; unicode handling seems to be the major issue and may require backwards-incompatible changes (the current changes should be backwards compatible, except for dropping Python 2.5 support). I won't be able to continue work on this until next week; feel free to pick up the development.

On Friday, May 18, 2012, 18:48:48 UTC+6, Mikhail Korobov wrote:

Steven Bird

May 19, 2012, 1:07:35 AM
to nltk...@googlegroups.com
Mikhail et al,

Thanks for returning to this topic -- support for Python 3 is long overdue!  I'm happy for the approach to be decided by those who are doing the work.  I'm in PNG running a workshop (http://www.boldpng.info/iwlp) followed by fieldwork, and not able to do much on NLTK until late June.

-Steven

Mikhail Korobov

May 23, 2012, 3:01:17 PM
to nltk...@googlegroups.com
Thanks for the carte blanche :)

Update: the NLTK test suite now runs under Python 3 on my computer, with this outcome:

Ran 157 tests in 8.187s
FAILED (errors=17, failures=65)

Run it with 

tox -e py32 -- -v --exclude=parse.doctest

because parse.doctest either takes ages to complete or has an infinite loop under Python 3. The exact error/failure counts may depend on installed software (mallet binary, etc.). 47% passing tests seems like good progress, but NLTK has plenty of new bugs under Python 3 and there is a lot more to do: many of the passing tests pass only because they are trivial, and the remaining failures are harder; there are parts of NLTK that are not covered by tests, and there are skipped tests (under both Python 2 and 3). Help is appreciated as always.

For reference, the Python 2.6 tests on the same machine (I have some components missing, and there are existing bugs in NLTK under Python 2):

Ran 148 tests in 147.908s
FAILED (errors=1, failures=31)

PyPy 1.8 results:

Ran 156 tests in 56.829s
FAILED (errors=15, failures=35) 
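For comparing the three runs, the pass rates can be worked out from the reported counts alone. This is just a small sketch of the arithmetic; it treats every test that is neither an error nor a failure as passing, which is an assumption, since skipped tests are not reported separately in the summaries above.

```python
# (tests run, errors, failures) as reported in the summaries above.
runs = {
    "py32": (157, 17, 65),
    "py26": (148, 1, 31),
    "pypy": (156, 15, 35),
}


def pass_rate(total, errors, failures):
    """Percentage of tests that neither errored nor failed."""
    passed = total - errors - failures
    # 100.0 forces float division on Python 2 as well.
    return round(100.0 * passed / total, 1)


rates = dict((name, pass_rate(*counts)) for name, counts in runs.items())

for name in sorted(rates):
    print("%s: %.1f%% passing" % (name, rates[name]))
```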


On Saturday, May 19, 2012, 11:07:35 UTC+6, Steven Bird wrote: