FCFG Parsing Efficiency

Max

Dec 28, 2010, 12:26:46 PM
to nltk-users
Hello nltk-users,

I'm writing an FCFG grammar for a subset of English as part of a small
DRT-based semantic analysis project. NLTK has allowed me to quickly
get started and build a fairly large grammar. However, now that the
grammar is pushing ~450 ambiguous, non-lexical rules, the NLTK feature
chart parser seems to be struggling - short sentences can take
anywhere from 2 to 30 seconds to parse.

Lexical rules do not seem to affect parsing performance at all:
switching from 1K to 500K rules containing no nonterminals on the
right hand side hardly made any difference (although loading such a
grammar takes a whole minute).

The grammar in question uses features quite sparingly, only having a
single level and only using them for enforcing number/verb form
matching and such rather than building up semantic representation
(that is done in a separate procedural step afterwards).
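
For illustration, the kind of rule I mean looks roughly like this in NLTK's FCFG notation (a made-up fragment, not taken from my actual grammar):

    % start S
    S -> NP[NUM=?n] VP[NUM=?n]
    NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
    VP[NUM=?n] -> V[NUM=?n]
    Det[NUM=sg] -> 'a' | 'the'
    Det[NUM=pl] -> 'the'
    N[NUM=sg] -> 'dog'
    N[NUM=pl] -> 'dogs'
    V[NUM=sg] -> 'barks'
    V[NUM=pl] -> 'bark'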

Now I'm looking for ways to improve the parsing performance, either by
optimizing the grammar, improving the NLTK parser or switching to
another one. Any pointers towards any of these goals would be quite
helpful.

I've posted a similar question on StackOverflow
(http://stackoverflow.com/questions/4543008/efficient-context-free-grammar-parser-preferably-python-friendly),
but so far that hasn't produced many useful results.

Thanks!

Max

Peter Ljunglöf

Dec 29, 2010, 5:08:33 PM
to nltk-...@googlegroups.com
Hi Max,

On 28 Dec 2010, at 18:26, Max wrote:

> I'm writing an FCFG grammar for a subset of English as part of a small
> DRT-based semantic analysis project. NLTK has allowed me to quickly
> get started and build a fairly large grammar. However, now that the
> grammar is pushing ~450 ambiguous, non-lexical rules, the NLTK feature
> chart parser seems to be struggling - short sentences can take
> anywhere from 2 to 30 seconds to parse.

I'm responsible for most of the parsing algorithms in NLTK. I have optimized the context-free implementations as much as I can, and they are quite fast now. But the feature grammar parser is 100x-1000x slower, and the bottleneck is unification.

I have tried to optimize the unification implementation, and by stripping away almost everything I can get it twice as fast, but it's still about 100x too slow. If you want it faster, you have to do some tweaking yourself.
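
If you want to confirm where the time goes with your own grammar, a quick profile is enough. Something like this (a rough sketch: the grammar file and sentence are placeholders, and the exact parse call differs a bit between NLTK versions):

    import cProfile
    import nltk

    # Placeholder grammar file and sentence; substitute your own.
    grammar = nltk.data.load('file:grammar.fcfg')
    parser = nltk.parse.FeatureChartParser(grammar)
    tokens = 'the dog barks'.split()

    # Sort by cumulative time; the unification functions should dominate.
    # (Older NLTK versions use parser.nbest_parse(tokens) instead.)
    cProfile.run('list(parser.parse(tokens))', sort='cumulative')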

> Lexical rules do not seem to affect parsing performance at all:
> switching from 1K to 500K rules containing no nonterminals on the
> right hand side hardly made any difference (although loading such a
> grammar takes a whole minute).

As I'm suggesting below, try to use the left-hand sides as context-free nonterminals (i.e. strings), not feature structures. Also, try to save the resulting grammar as a pickled object; it might speed up loading.
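
Roughly something like this (an untested sketch; the file names are placeholders):

    import pickle
    import nltk

    # One-time conversion: parse the grammar source, then pickle the object.
    grammar = nltk.data.load('file:grammar.fcfg')   # the slow text parse
    with open('grammar.pickle', 'wb') as f:
        pickle.dump(grammar, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Later runs unpickle instead of re-parsing the grammar source.
    with open('grammar.pickle', 'rb') as f:
        grammar = pickle.load(f)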

> The grammar in question uses features quite sparingly, only having a
> single level and only using them for enforcing number/verb form
> matching and such rather than building up semantic representation
> (that is done in a separate procedural step afterwards).
>
> Now I'm looking for ways to improve the parsing performance, either by
> optimizing the grammar, improving the NLTK parser or switching to
> another one. Any pointers towards any of these goals would be quite
> helpful.

I think you have four options:

1. Re-implement the unification algorithm in C (or Cython). I guess you need to be a good C hacker for this, as well as have good knowledge about Robinson's unification algorithm.

2. Inline unification in the chartparser. For this you need to know a lot about the recent research on unification-based parsing. And be a real Python hacker.

3. Use an existing parser written in another language, and write a wrapper. I don't know of any free C parsers (yacc/bison cannot handle features and are not good for ambiguous grammars), but you could have a look at LKB (written in Lisp) or ALE (written in Prolog). Note that both of these systems use a different grammar format from NLTK's, so you would have to translate your grammar.

If LKB and ALE are too big for your purposes, it's easy to implement an efficient unification parser in Prolog (e.g., SWI-Prolog) in 10-20 lines of code. Then you can use pyprolog or pyswip to interface between Python and SWI-Prolog.

4. Translate your grammar to context-free form, by expanding all features. This might lead to an explosion of the number of rules, but since you write that you don't use features that much, I guess the problem won't be that big.

The basic algorithm is quite straightforward: first you find out which values each feature can have, and then you simply loop through each production in the grammar and instantiate each unification variable with all its possible values, yielding a new production for every instantiation. The nonterminals in the new productions should be strings and not feature structures (e.g., you can use the __repr__ of the instantiated nonterminal as the new context-free nonterminal). See the sketch below.

If the resulting grammar explodes, there are some techniques you can try. See, e.g., M. Rayner, J. Dowding and B.A. Hockey (2001). "A Baseline Method for Compiling Typed Unification Grammars into Context Free Language Models". Available from here:

http://www.issco.unige.ch/en/research/projects/regulus/publications.shtml
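
To make option 4 a bit more concrete, here is a rough sketch of that expansion loop, using plain tuples and dicts instead of NLTK's feature structure objects (the feature value table and the production encoding are invented for illustration, and reentrancy and nested features are ignored):

    from itertools import product

    # Possible values of each feature (assumed to be known up front).
    FEATURE_VALUES = {'NUM': ['sg', 'pl'], 'PERS': ['1', '2', '3']}

    def is_variable(value):
        return isinstance(value, str) and value.startswith('?')

    def variables_in(production):
        """Collect the unification variables occurring in a production."""
        lhs, rhs = production
        return sorted({value
                       for _, feats in [lhs] + list(rhs)
                       for value in feats.values()
                       if is_variable(value)})

    def feature_of(var, production):
        """Find which feature a variable occurs under, to get its value range."""
        lhs, rhs = production
        for _, feats in [lhs] + list(rhs):
            for feat, value in feats.items():
                if value == var:
                    return feat
        raise ValueError(var)

    def mangle(symbol, binding):
        """Render an instantiated symbol as a plain string nonterminal."""
        name, feats = symbol
        inst = sorted((feat, binding.get(value, value))
                      for feat, value in feats.items())
        if not inst:
            return name
        return '%s[%s]' % (name, ','.join('%s=%s' % pair for pair in inst))

    def expand(production):
        """Yield one context-free production per instantiation of the variables."""
        lhs, rhs = production
        variables = variables_in(production)
        ranges = [FEATURE_VALUES[feature_of(v, production)] for v in variables]
        for values in product(*ranges):
            binding = dict(zip(variables, values))
            yield mangle(lhs, binding), tuple(mangle(s, binding) for s in rhs)

    # Example: S -> NP[NUM=?n] VP[NUM=?n]
    production = (('S', {}), [('NP', {'NUM': '?n'}), ('VP', {'NUM': '?n'})])
    for cf_production in expand(production):
        print(cf_production)

Doing this for the whole grammar is then just a loop over its productions, collecting the expanded rules plus the (unchanged) lexical rules into one big context-free grammar.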


So, whichever option you choose, it will involve some NLTK hacking. If you do any of these things, please sign up to nltk...@googlegroups.com and share your experience and your code.

> I've posted a similar question on StackOverflow (http://
> stackoverflow.com/questions/4543008/efficient-context-free-grammar-
> parser-preferably-python-friendly), but so far that hasn't produced
> many useful results.

I'm not surprised: it's mostly hackers who answer questions on StackOverflow, and most hackers know nothing about NLP problems (which can be seen from the answers you got).

best,
/Peter Ljunglöf

PS. Option 4 has been on my TODO list for a while now, but I haven't had the time to actually implement it yet... it would be great if you happened to do something about it. :)

Max

Dec 29, 2010, 11:00:08 PM
to nltk-...@googlegroups.com
Hi Peter,

Thanks for your detailed reply.

> I'm responsible for most of the parsing algorithms in NLTK. I have optimized the context-free implementations as much as I can, and they are quite fast now. But the feature grammar parser is 100x-1000x slower, and the bottleneck is unification.

That is indeed the bottleneck I'm experiencing. For benchmarking, I tried blindly removing all features and that made the parsing about 40x faster (6s -> 150ms) but of course the result is nowhere near equivalent.

> Also, try to save the resulting grammar as a pickled object; it might speed up loading.

I had tried doing that, but there was no difference in loading time - I believe that most of the overhead is simply the creation of a huge number of tiny objects for each nonterminal, production, variable, etc.

> 1. Re-implement the unification algorithm in C (or Cython). I guess you need to be a good C hacker for this, as well as have good knowledge about Robinson's unification algorithm.

I'm hoping to avoid that until every other possibility is exhausted.

> 2. Inline unification in the chartparser. For this you need to know a lot about the recent research on unification-based parsing. And be a real Python hacker.

Given that I need only a very limited version of unification, this might be a viable way. Still, I expect that will require more work than the FCFG->CFG conversion if I manage to get the latter to work.

> If LKB and ALE are too big for your purposes, it's easy to implement an efficient unification parser in Prolog (e.g., SWI-Prolog) in 10-20 lines of code. Then you can use pyprolog or pyswip to interface between Python and SWI-Prolog.

I had considered using Prolog behind the scenes, but hadn't realized there were libraries like pyswip that make interacting with it a breeze. If I fail at #4, I will give this a whirl. I'm not sure how much faster calling out on each unification might be, but it's worth a shot.
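
Then again, it would probably make more sense for the whole parse to happen on the Prolog side, with Python only sending the token list and getting trees back. The pyswip glue would look something like this (untested; parse/2 is a hypothetical predicate that a parser.pl would define):

    from pyswip import Prolog

    prolog = Prolog()
    prolog.consult('parser.pl')   # hypothetical file defining parse(Tokens, Tree)

    # Each solution is a dict mapping Prolog variable names to their bindings.
    for solution in prolog.query('parse([the, dog, barks], Tree)'):
        print(solution['Tree'])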

> 4. Translate your grammar to context-free form, by expanding all features. This might lead to an explosion of the number of rules, but since you write that you don't use features that much, I guess the problem won't be that big.

Alas, the explosion did happen. I wrote a naive single-level flattener and the first non-trivial rule I had exploded into 11K variations.

> If the resulting grammar explodes, there are some techniques you can try. See, e.g., M. Rayner, J. Dowding and B.A. Hockey (2001). "A Baseline Method for Compiling Typed Unification Grammars into Context Free Language Models". Available from here:

That paper and its references are very promising - one of the optimizations they recommend will fix the particular problem that resulted in the explosion mentioned above. I'll try implementing it and see if the result is usable.

I'll update this thread once I find a solution or rule out any of these possibilities.

Regards,
Max