Generating sentences from an fcfg?

938 views
Skip to first unread message

Philip Leclerc

unread,
Nov 24, 2014, 5:03:16 PM11/24/14
to nltk-...@googlegroups.com
Hi there!

I posted this question initially to StackOverflow, but this list seems maybe more appropriate. The StackOverflow link is here:


To briefly summarize, I'm trying to generate random sentences from a feature-based context-free grammar in nltk, but I can't seem to find any appropriate methods in the nltk modules, or an example, to help me do so. When I use the nltk.parse.generate.generate method that is used in examples for regular cfg's, the featural information is ignored, and so, for example, phrases like 'this boys' are produced, ignoring agreement (gender/number) information stored in the fcfg's features.

I have attempted to modify the nltk.parse.generate.generate method used in examples for producing random sentences from regular cfg's, but so far have not had much luck identifying the source of the problem.

Has this issue been noted by anyone else? Are there any obvious problems that might've caused it on my installation?

Thanks for your help!
-Philip Leclerc

Peter Ljunglöf

unread,
Nov 25, 2014, 3:13:04 AM11/25/14
to nltk-...@googlegroups.com
Hej Philip,

things become more difficult when you start using feature structure grammars instead of context-free. What you need to do is to use unification every time you select a new production. But unification sometimes fails, and then you have to cope with that in some way.

Also, the current generate() function does not generate a random sentence, but it generates all possible sentences up to a certain depth. In your StackOverflow question you select a sentence randomly from the 100 first generated sentences, which won't work well if you have a slightly larger grammar.

Anyway, if you want to modify the generation algorithm to generate from a feature grammar, then you have to introduce a filter in the function nltk.parse.generate._generate_one() which unifies the lhs of the new production with the given argument "item". BUT, you also need to keep track of the bindings of all previous unifications, and remember the new bindings for the unification of the children and siblings.

The simpler solution (which can take a long time) is to generate all possible sentences using the existing algorithm, and then use the FeatureChartParser as a filter, only keeping the sentences that can be parsed. Something like this:

>>> parser = nltk.FeatureChartParser(grammar)
>>> for sentence in generate(grammar):
... if parser.parse_one(sentence):
... print sentence

Good luck,
/Peter
> --
> You received this message because you are subscribed to the Google Groups "nltk-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to nltk-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Kel Reiser

unread,
Dec 23, 2019, 10:10:02 AM12/23/19
to nltk-users
Hi there!

I basically have the problem that Philip already explained. My goal is to generate random sentences from a fcfg.

Generating all sentences and filtering afterwards would be horribly inefficient and take ages if the grammar is a little more complex, so that's not an option.
And rewriting the nltk.parse.generate._generate_one() method seems much too complex (if it was easy, nltk would already handle it without asking, right?)! I understood that I would have to introduce feature unification of any new production and the item given; but I couldn't get it to work, nor do I understand exactly what additional work would have to be done and how (keeping all the bindings and such) ...

I should mention that I'm doing this as a hobby, I have no background in NLP or working with nltk. I've been struggling with this problem for months now, even wrote specific people involved in the nltk implementation (sadly, they could not help). Since I somehow got convinced that the feature variables were the culprit, I wrote a script that translates a grammar with feature variables to a grammar without any, substituting each line containing variables with as many lines as there are terminal combinations for all the variables contained. I finally got it to work, just to see that nltk's generate() is still ignoring the features. :-/
I'm so happy to have found this thread, because I was thinking that I'm doing something wrong the whole time.

Is it really possible that no one has implemented some construct to get nltk to generate sentences from a fcfg? Is it really as difficult as I believe it is?

Hmmm ... Maybe nltk is not even the right path to go down: My ultimate goal is to implement a twitter bot generating funny sentences in German using a very specific set of terminals. I started out with http://tracery.io/editor, but wanted to get more sophisticated than that, especially because I thought I'd be better off using features when dealing with German grammar rules. Any advice as to what I should do? Is there a tool that fits my problem much better than nltk? Are feature-based grammars the wrong decision for me? Any help or advice is appreciated! :)

Thanks,
Kel

Am Dienstag, 25. November 2014 09:13:04 UTC+1 schrieb peter ljunglöf:
Hej Philip,

things become more difficult when you start using feature structure grammars instead of context-free. What you need to do is to use unification every time you select a new production. But unification sometimes fails, and then you have to cope with that in some way.

Also, the current generate() function does not generate a random sentence, but it generates all possible sentences up to a certain depth. In your StackOverflow question you select a sentence randomly from the 100 first generated sentences, which won't work well if you have a slightly larger grammar.

Anyway, if you want to modify the generation algorithm to generate from a feature grammar, then you have to introduce a filter in the function nltk.parse.generate._generate_one() which unifies the lhs of the new production with the given argument "item". BUT, you also need to keep track of the bindings of all previous unifications, and remember the new bindings for the unification of the children and siblings.

The simpler solution (which can take a long time) is to generate all possible sentences using the existing algorithm, and then use the FeatureChartParser as a filter, only keeping the sentences that can be parsed. Something like this:

>>> parser = nltk.FeatureChartParser(grammar)
>>> for sentence in generate(grammar):
...     if parser.parse_one(sentence):
...         print sentence

Good luck,
/Peter


> 24 nov 2014 kl. 23:03 skrev Philip Leclerc <philip...@gmail.com>:
>
> I posted this question initially to StackOverflow, but this list seems maybe more appropriate. The StackOverflow link is here:
>
> http://stackoverflow.com/questions/27109025/producing-random-sentences-from-an-nltk-fcfg
>
> To briefly summarize, I'm trying to generate random sentences from a feature-based context-free grammar in nltk, but I can't seem to find any appropriate methods in the nltk modules, or an example, to help me do so. When I use the nltk.parse.generate.generate method that is used in examples for regular cfg's, the featural information is ignored, and so, for example, phrases like 'this boys' are produced, ignoring agreement (gender/number) information stored in the fcfg's features.
>
> I have attempted to modify the nltk.parse.generate.generate method used in examples for producing random sentences from regular cfg's, but so far have not had much luck identifying the source of the problem.
>
> Has this issue been noted by anyone else? Are there any obvious problems that might've caused it on my installation?
>
> Thanks for your help!
> -Philip Leclerc
>
> --
> You received this message because you are subscribed to the Google Groups "nltk-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to nltk-...@googlegroups.com.

Peter Ljunglöf

unread,
Feb 4, 2020, 2:22:44 AM2/4/20
to nltk-...@googlegroups.com
Hi Kei,
I'm sorry for the delayed answer, but it seems like noone else has given your question a try.

It was a while I was using NLTK's grammars, but as I recall the FCFGs are not implemented with generation in mind, which means that it's inefficient and probably doesn't work correctly. FCFG is a unification-based formalism, and sine Python doesn't have unification built-in, NLTK has its own implementation. It's really cool: feature structures are subclasses of standard Python dictionaries, and unification even works for dict types, but it's slow (because it's written in Python), and the generator does not make use of unificationn (as I recall).

Your options are:

(1) Rewrite the generator to use unification as early as possible, not as a post-filter; this requires some coding skills, as well as reading NLTK source code.

(2) Use a programming language with built-in unification, such as Prolog which has both built-in unification and backtracking, and nice syntax for writing grammars. It doesn't have feature structures built-in, but instead terms which work perfectly fine for medium-sized grammars. I suggest to use XSB Prolog, which uses tabling/memoisation that stops it from falling into infinite left-recursion loops. (Which for grammar means that you can write left-recursive grammars without worrying)

http://xsb.sourceforge.net

(3) Use another formalism, such as Grammatical Framework (GF). It's a functional formalism, not unification-based: instead of unification, you apply functions to parameters. It's also multilingual and there is a quite large resource grammar (a kind of API) of 30+ langauges, including German. So as soon as you have learned how to thing GF, it should be fairly straight-forward to use the resource grammar to implement your own domain-specific grammar.

http://www.grammaticalframework.org

Good luck!
/Peter

PS. Now I see that it was me who you replied to... didn't notice until now:)
> To unsubscribe from this group and stop receiving emails from it, send an email to nltk-users+...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/nltk-users/2b2ebbf0-2d66-4a9c-9463-5e1148240cbf%40googlegroups.com.

------- ------ ----- ---- --- -- - - - - -
peter ljunglöf
peter.l...@gu.se
data- och informationsteknik, och språkbanken
göteborgs universitet och chalmers tekniska högskola
-------------- --------- -------- ------- ------ ----- ---- --- -- - - - - -


Luke

unread,
Apr 23, 2020, 8:53:17 PM4/23/20
to nltk-users
Hey Kel-

I came across this post with a similar goal- to generate text with a fcfg. I am struggling to find any answers. Have you found any good resources? 

Thanks,

Luke

Wilson

unread,
Dec 3, 2022, 5:12:16 AM12/3/22
to nltk-users
Hello, everyone.

This is an old question. I wonder if any of you managed to solve this?

Kind regards,
Wilson

Reply all
Reply to author
Forward
0 new messages