I did some initial tests of NLTK on PyPy, and it's faster, a lot faster! At least on feature structure unification and parsing. Below is a test I did on the big Alvey grammar (nltk.data.load('grammars/large_grammars/alvey.fcfg')). It tries two different parsing strategies on five sentences. "Nr.edges" is the number of edges in the final chart, and times are in seconds.
As you can see, PyPy is around 4x faster than CPython. I think it's quite impressive!
/Peter
PS. Yes, as you can see, feature unification is very slow on either platform. But the grammar is quite complex too.
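For reference, here is a sketch of what a harness like parsetest.py might look like. This is a hypothetical reconstruction, not the script from the post: the helper names (per_edge_ms, benchmark, run_alvey_test) are my own, but the NLTK entry points (nltk.data.load, FeatureChartParser, FeatureIncrementalChartParser, chart_parse, num_edges) are the real API.

```python
import time

def per_edge_ms(total_seconds, n_edges):
    """Average parse cost per chart edge, in milliseconds."""
    return 1000.0 * total_seconds / n_edges

def benchmark(parser, tokenized_sentences):
    """Time parser.chart_parse() on each sentence; return (seconds, edges) pairs."""
    results = []
    for tokens in tokenized_sentences:
        start = time.perf_counter()
        chart = parser.chart_parse(tokens)  # chart parsers return the filled chart
        results.append((time.perf_counter() - start, chart.num_edges()))
    return results

def run_alvey_test():
    """Drive the benchmark with the Alvey grammar.

    Requires NLTK and its 'large_grammars' data package; imports are local
    so the timing helpers above stay usable without NLTK installed.
    """
    import nltk
    from nltk.parse.featurechart import FeatureChartParser
    from nltk.parse.earleychart import FeatureIncrementalChartParser

    grammar = nltk.data.load('grammars/large_grammars/alvey.fcfg')
    sentences = ["he doesn't help", "don't help him", "apologize to him",
                 "she helps busily", "she busily helps"]
    tokenized = [s.split() for s in sentences]
    for cls in (FeatureChartParser, FeatureIncrementalChartParser):
        print('*', cls)
        parser = cls(grammar)
        for sent, (secs, edges) in zip(sentences, benchmark(parser, tokenized)):
            print('%-22s %7.2f %6d %8.2f'
                  % ('"%s"' % sent, secs, edges, per_edge_ms(secs, edges)))
```

Running the same script under both interpreters (pypy parsetest.py vs. python parsetest.py) gives the comparison below.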
$ pypy parsetest.py
* <class 'nltk.parse.featurechart.FeatureChartParser'>
Sentence              Time(s)  Nr.edges  Time/edge(ms)
"he doesn't help"        3.92       428           9.16
"don't help him"         4.40       622           7.08
"apologize to him"       3.84       690           5.57
"she helps busily"       1.24       213           5.83
"she busily helps"       0.98       176           5.55
- TOTAL                 14.39      2129           6.76
* <class 'nltk.parse.earleychart.FeatureIncrementalChartParser'>
Sentence              Time(s)  Nr.edges  Time/edge(ms)
"he doesn't help"        2.90       428           6.79
"don't help him"         3.74       622           6.01
"apologize to him"       2.77       690           4.02
"she helps busily"       0.91       213           4.26
"she busily helps"       0.86       176           4.87
- TOTAL                 11.17      2129           5.25
$ python parsetest.py
* <class 'nltk.parse.featurechart.FeatureChartParser'>
Sentence              Time(s)  Nr.edges  Time/edge(ms)
"he doesn't help"       13.45       428          31.44
"don't help him"        17.32       622          27.84
"apologize to him"      13.13       690          19.02
"she helps busily"       4.63       213          21.73
"she busily helps"       4.33       176          24.62
- TOTAL                 52.86      2129          24.83
* <class 'nltk.parse.earleychart.FeatureIncrementalChartParser'>
Sentence              Time(s)  Nr.edges  Time/edge(ms)
"he doesn't help"       13.26       428          30.97
"don't help him"        17.12       622          27.53
"apologize to him"      12.24       690          17.74
"she helps busily"       4.54       213          21.32
"she busily helps"       4.24       176          24.08
- TOTAL                 51.40      2129          24.14
_______________________________________________________________________
peter ljunglöf
department of computer science and engineering
university of gothenburg and chalmers university of technology
Maybe you should try parsing more sentences to warm up the JIT; PyPy will probably be even faster.
--
Dima
I already did that: I parsed one warm-up sentence (different from all of the others) before measuring, since that first sentence took almost twice as long to parse. But after that, the gain apparently levels off.
/Peter