NLTK now on GitHub (with Python 3 branch). Hopefully rewriting Python 2 code for 2to3.

Lucas

Sep 23, 2011, 11:46:06 AM
to nltk...@googlegroups.com
Hey guys,

Just so you know, I've created and populated an NLTK GitHub repo at https://github.com/bobobo1618/NLTK. All my work on porting NLTK to Python 3 is on a branch there, and that's where I'll mainly be updating. I'll still watch the Google Code issue tracker and occasionally commit to the normal SVN branch, but the GitHub branch is where I'll do most of my work. If you also like GitHub and want to submit pull requests, I'll commit the results to SVN as well.
If you prefer Mercurial, I recommend the Hg-Git plugin, which is what I'm using for the GitHub repo anyway.

Also, as part of the Python 3 branch, I'm intending to rewrite a large portion of the Python 2 code so that 2to3 can run over it without problems. This will allow seamless distribution with Distribute with only one code base. I'll probably make a separate branch so that I don't interfere with anyone else.


Anyone have any ideas or comments?

Lucas

Sep 23, 2011, 11:52:54 AM
to nltk...@googlegroups.com
Oh, one last thing: tomorrow I'm going to make some major changes to the GitHub repo (bringing in SVN history and branches), so it would be best if people left it alone until then.

Steven Bird

Sep 23, 2011, 3:52:13 PM
to nltk...@googlegroups.com
On 23 September 2011 05:46, Lucas <bobob...@gmail.com> wrote:
> Just so you know, I've created and populated an NLTK GitHub repo located
> at https://github.com/bobobo1618/NLTK.

Can I suggest that you instead use this repo? It can be the official
github-hosted NLTK repository, assuming we find a way to regularly
synchronize it with the svn repository.

https://github.com/nltk/

Let me know offline what

> Also, as part of the Python 3 branch, I'm intending to rewrite a large
> portion of the Python 2 code so that 2to3 can run over it without problems.

This is a great idea. What kinds of changes do you anticipate needing to make?

> This will allow seamless distribution with Distribute with only one code base.

The current distribution setup isn't great, so I would welcome this,
particularly if it streamlines the process of building distributions
for different platforms.

Another systemic issue we might be able to deal with now is the
cascading imports / slow loading problem:
http://code.google.com/p/nltk/issues/detail?id=378
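
One common way to attack cascading imports is to defer each heavy import until it is first used. The following is only a minimal sketch of that idea, not what NLTK does or what the linked issue proposes; the class name `LazyModule` is hypothetical.

```python
import importlib


class LazyModule(object):
    """Stand-in object that imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # filled in by _load() on first use

    def _load(self):
        # Import once, then reuse the cached module object.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Only called for names not found on the stand-in itself,
        # i.e. the attributes of the wrapped module.
        return getattr(self._load(), attr)
```

A package could then write something like `yaml = LazyModule("yaml")` at module level, so that importing the package does not pull in the dependency until a function actually touches it.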

-Steven Bird

Lucas

Sep 23, 2011, 7:56:40 PM
to nltk...@googlegroups.com
Could you make me a contributor/admin of the official repo? It seems a bit empty right now...
It turns out that I can import (and push to) SVN repositories with Mercurial, which also works with Git. So I should be able to import the entire SVN repo with branches and history. If not, I'll at least have trunk with history.

The main changes I'll have to make are small things, I think, like print statements and string handling.
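
To illustrate the kind of change meant here, below is a small sketch of idioms that run unchanged on Python 2.6+ and Python 3, which leaves 2to3 little to rewrite. It is an assumption about what such code might look like, not code from NLTK; the function names are made up for the example.

```python
# Sketch only: print as a function and explicit text handling.
from __future__ import print_function, unicode_literals

import io
import os
import tempfile


def write_tokens(path, tokens):
    """Write one token per line, encoding the text explicitly as UTF-8."""
    # io.open behaves like Python 3's open() on both versions.
    with io.open(path, "w", encoding="utf-8") as handle:
        for token in tokens:
            handle.write(token + "\n")


def read_tokens(path):
    """Read the tokens back as text strings rather than bytes."""
    with io.open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def demo():
    """Round-trip a few non-ASCII tokens through a temporary file."""
    tokens = ["na\u00efve", "caf\u00e9", "NLTK"]
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_tokens(path, tokens)
        result = read_tokens(path)
    finally:
        os.remove(path)
    # With the __future__ import, print() is a function on Python 2 as well.
    print(len(result), "tokens round-tripped")
    return result
```

The point of the explicit encoding is that text and bytes are kept apart on both versions, which is where most of the string-handling differences show up.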

And yeah... The current distribution setup doesn't even include the PyYAML dependency, so all PyPI installs are dead unless someone just happens to have it installed.

Something else I was wondering, why is all the version, author and overall package information in the nltk __init__.py file? It makes sense to have the version info there along with some other basic stuff but I think a lot of what's there would be better off in the setup.py script. I also think it isn't such a good idea to have the setup script import the package... I might be completely wrong though, I'm quite new to distributing Python modules.
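
One way a setup script can avoid importing the package is to read the version string out of the source file as plain text. This is a sketch under assumptions: it presumes the package defines a `__version__ = "..."` line, which may not match how NLTK's __init__.py is actually laid out, and the helper names are hypothetical.

```python
import re

# Matches a line such as: __version__ = "2.0.1"
_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def parse_version(source_text):
    """Return the __version__ value found in source_text."""
    match = _VERSION_RE.search(source_text)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def read_version(init_path):
    """Read the version from a file without importing or executing it."""
    with open(init_path) as handle:
        return parse_version(handle.read())
```

A setup.py could then pass `version=read_version("nltk/__init__.py")` to `setup()`, and list PyYAML under `install_requires` so that PyPI installs pull it in.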

anand jeyahar

Sep 24, 2011, 2:46:01 AM
to nltk...@googlegroups.com

> Can I suggest that you instead use this repo? It can be the official
> github-hosted NLTK repository, assuming we find a way to regularly
> synchronize it with the svn repository.
>
> https://github.com/nltk/

I went through this Q&A and think the simplest approach (the second-to-last answer) is to just create a git repo in the svn checkout folder.
The metadata clash can be avoided by having .gitignore ignore .svn and an svn ignore entry for the .git files.

The only catch is that you have to do both svn commit and git commit + git push.

And the admin* work might be painful initially, with the GitHub pull requests and replicating them to svn with the right commit messages.



> Another systemic issue we might be able to deal with now is the
> cascading imports / slow loading problem:
> http://code.google.com/p/nltk/issues/detail?id=378


I will look into this one and figure out a way.


*- I would volunteer, but I don't know much of the developer community and haven't made any contributions, so I shouldn't.

Thanks and Regards
Anand Jeyahar
https://github.com/anandjeyahar
==============================================
The man who is really serious,
with the urge to find out what truth is,
has no style at all. He lives only in what is.
                  ~Bruce Lee

Love is a trade with lousy accounting policies.
                 ~Aang Jie

Joel Nothman

Sep 24, 2011, 5:42:37 AM
to nltk...@googlegroups.com, Steven Bird
If git is in the future of NLTK, it would be best to import the existing SVN
commit history using git svn:

$ mkdir $gitrepo
$ cd $gitrepo
$ git svn init $svnurl --no-metadata
$ git svn fetch

You should then be able to rebase your more recent commits atop the SVN
history, Lucas.

And 2to3 stuff should be done in a branch, I presume...?

~J

Lucas

Sep 24, 2011, 10:29:41 AM
to nltk...@googlegroups.com, Steven Bird
I'm using hgsubversion and hggit. All the history is coming through, and the metadata works too. If Steven exports the user list with emails, I can even keep author data intact.
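
For keeping author data intact, SVN conversion tools usually take a mapping file from SVN login to name and email. git svn's authors file uses lines of the form `login = Full Name <email>`, and as far as I know hgsubversion's author map is similar. The sketch below renders such a file from a user list; the function name and the sample user are hypothetical, and the exact export format of the user list is an assumption.

```python
def format_author_map(users):
    """Render {svn_login: (full_name, email)} as 'login = Name <email>' lines."""
    lines = []
    for login in sorted(users):
        full_name, email = users[login]
        lines.append("%s = %s <%s>" % (login, full_name, email))
    # One entry per line, with a trailing newline so the file ends cleanly.
    return "\n".join(lines) + "\n"
```

For example, `format_author_map({"jdoe": ("Jane Doe", "jane@example.org")})` gives a one-line file that can be saved and handed to the conversion tool.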

Lucas

Sep 24, 2011, 10:34:15 AM
to nltk...@googlegroups.com, Steven Bird
The idea of the 2to3 stuff is that the main code base runs under Python 2, but is programmed in such a way that 2to3 can easily convert it to code that runs flawlessly under Python 3.

I think I might start in a branch but it will be merged with master very soon.

Morten Minde Neergaard

Sep 24, 2011, 11:11:29 AM
to nltk...@googlegroups.com
At 09:52, Fri 2011-09-23, Steven Bird wrote:
> On 23 September 2011 05:46, Lucas <bobob...@gmail.com> wrote:
> > Just so you know, I've created and populated an NLTK GitHub repo located
> > at https://github.com/bobobo1618/NLTK.
>
> Can I suggest that you instead use this repo? It can be the official
> github-hosted NLTK repository, assuming we find a way to regularly
> synchronize it with the svn repository.
>
> https://github.com/nltk/

I have the following list of suggestions:

* Delete the NLTK _user_ and recreate it as an _organization_
- https://github.com/account/organizations/new
* Create a repo called nltk under nltk, at least with a first commit
- That way people can use the github «Fork» button to create forks
* All forks submit code to this repo using the «Pull request» feature
- Admin members of the organization accept or reject the requests
* The «Issues» feature should be deactivated on github
* Initial checkin should be git-svn imported
* All further commits and merges should also be handled using git-svn

I'd be more than happy to discuss or help with any part of this process.
The easiest way to reach me would be this email or on IRC - nick xim on
#nltk on Freenode.

Kind regards,
--
Morten Minde Neergaard

Lucas

Sep 24, 2011, 5:53:29 PM
to nltk...@googlegroups.com
Sounds good :)

We have an IRC channel? o.O

Steven Bird

Sep 27, 2011, 8:03:32 PM
to nltk...@googlegroups.com
Thanks for the suggestion Morten. I will go ahead and change the nltk
user into an organisation.

On 25 September 2011 01:11, Morten Minde Neergaard <x...@8d.no> wrote:
>
> I have the following list of suggestions:
>
>  * Delete the NLTK _user_ and recreate it as an _organization_
>   - https://github.com/account/organizations/new
>  * Create a repo called nltk under nltk, at least with a first commit
>   - That way people can use the github «Fork» button to create forks
>  * All forks submit code to this repo using the «Pull request» feature
>   - Admin members of the organization accept or reject the requests
>  * The «Issues» feature should be deactivated on github
>  * Initial checkin should be git-svn imported
>  * All further commits and merges should also be handled using git-svn
>
> I'd be more than happy to discuss or help with any part of this process.
> The easiest way to reach me would be this email or on IRC - nick xim on
> #nltk on Freenode.
>
> Kind regards,
> --
> Morten Minde Neergaard
>
