The first prototype for a large scale GF parser

Krasimir Angelov

May 17, 2012, 7:08:23 AM
to Grammatical Framework
Dear GF friends,

I assembled a simple web service that makes it easier for you to test
the first prototype of the robust statistical parser. It is a
combination of the English resource grammar and a statistical model
that I trained on the Penn Treebank. The work is still at a very early
stage, but at least something is working. The web interface is
here:

http://www.grammaticalframework.org/demos/robust-parser/

A few of the known limitations:

- The tokenizer and the named-entity recognizer are very primitive.
For better results, start the sentence with a lowercase letter
unless the first word is a proper name or one of the English adjectives
that are written with a capital letter (English, Swedish, South, etc.).
Do not end the sentence with a dot.

- The statistical model is not lexicalized, so you should not
expect the right PP attachment. Currently a PP is always
attached to the verb, since this is the more common case in
the treebank.

- The parser is still slow and memory-hungry for sentences that
are outside the scope of the grammar. Be prepared for slow responses and
occasionally even failures.

- It is still very new and may have a lot of bugs.
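For anyone who wants to script the input preparation described in the first limitation, here is a minimal Python sketch. The function `prepare_sentence` and the `CAPITALIZED_WORDS` set are hypothetical; they are not part of the demo or of GF. The word list only holds the examples named above plus the pronoun "I" (an assumption added here), and proper names have to be supplied by the caller.

```python
from collections.abc import Set

# Hypothetical list: the capitalized adjectives mentioned above, plus "I"
# (assumption: the pronoun should also keep its capital letter).
CAPITALIZED_WORDS = {"English", "Swedish", "South", "I"}


def prepare_sentence(sentence: str, proper_names: Set[str] = frozenset()) -> str:
    """Hypothetical helper that applies the tokenizer advice before parsing."""
    text = sentence.strip()
    # Drop trailing dots, since the sentence should not end with one.
    text = text.rstrip(".").rstrip()
    if not text:
        return text
    first, sep, rest = text.partition(" ")
    # Lowercase the first letter unless the word must stay capitalized.
    if first not in proper_names and first not in CAPITALIZED_WORDS:
        first = first[0].lower() + first[1:]
    return first + sep + rest


if __name__ == "__main__":
    print(prepare_sentence("The cat sleeps."))
    print(prepare_sentence("John sleeps.", frozenset({"John"})))
```

Only the first word is touched, because the advice concerns sentence-initial capitalization; capitals later in the sentence are left for the named-entity recognizer.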

Best Regards,
Krasimir

Aarne Ranta

May 17, 2012, 7:23:33 AM
to gf-...@googlegroups.com
Hello Krasimir,

Great demo and interface!

My first example was "what is this". The result had many metavariables, although this phrase is recognized by the RGL. It is as if you used S as the start category, although it seems to be Utt, which does cover questions. 

Some declarative sentences worked fine, but others failed with no feedback, e.g. nothing saying whether the system was busy munching the input or just not responding.

  Aarne.

Krasimir Angelov

May 17, 2012, 7:26:12 AM
to gf-...@googlegroups.com
2012/5/17 Aarne Ranta <aa...@chalmers.se>:
> My first example was "what is this". The result had many metavariables,
> although this phrase is recognized by the RGL. It is as if you used S as
> start category, although it seems to be Utt, which does cover questions.

The module for questions from the resource grammar is not imported,
since the treebank doesn't contain any questions. This could of course
be changed.

> Some declarative sentences worked fine, but some failed with no feedback
> e.g. saying whether the system was busy munching the input or just not
> responding.

Yes. This could happen.

Krasimir Angelov

May 18, 2012, 3:45:52 AM
to gf-...@googlegroups.com
I switched off the robustness feature because it was consuming far too
much memory. Now you can parse only sentences that are in the scope
of the grammar, and the output will be the best tree according to the
statistical model. Currently you don't see any error if the sentence
is not parseable; you just get the browser's icon for a missing
picture. I will enable the feature again when I find a way to control
the memory size.

Regards,
Krasimir


Shafqat Virk

May 18, 2012, 9:51:18 AM
to gf-...@googlegroups.com
Hi Krasimir,
Great. I was wondering whether we could also linearize the trees into other languages in the demo, since the Hindi RG is now available with DictHin parallel to DictEng. There will be errors in the translations, as well as word-sense issues, but at least we could see the translations. 
BR 

Krasimir Angelov

May 18, 2012, 3:34:00 PM
to gf-...@googlegroups.com
2012/5/18 Shafqat Virk <virk.s...@gmail.com>:
> Hi Krasimir,
> Great. I was wondering if we can also linearize the trees to other languages
> in the demo. Since Hindi RG is now available with DictHin parallel to
> DictEng. Even though there will be errors in the translations, and also the
> word-sense issues. But, at least we can see the translations.
> BR

We can try it, but I am afraid that this might be too heavy for the
server. I remember that the compiled English<->Hindi grammar is about
16 MB, while the English one alone is only 6 MB. I know that you can load
the grammar on your own computer, but there are many other processes on the
server and we have to check that this doesn't slow everything down.
The fact that you can compile and load this grammar at all is quite
remarkable in itself, since this was unthinkable a few years ago.

Regards,
Krasimir

Erel Segal Halevi

May 31, 2012, 9:18:14 AM
to gf-...@googlegroups.com
Excellent, thank you! I think it should be linked from the demo page.