Concrete not parsing its own linearization


Laurette Marais

Mar 4, 2020, 5:18:06 AM
to Grammatical Framework
Hello,

I've gotten stuck on an error that I am struggling to recreate. Basically, I have compiled a PGF called Symptoms.pgf using --optimize-pgf (not sure if that's relevant to my problem) and now, when using the Python bindings to the C runtime, the following happens:

In [1]: import pgf

In [2]: gr = pgf.readPGF("Symptoms.pgf")

In [3]: zul = gr.languages["SymptomsZul"]

In [4]: afr = gr.languages["SymptomsAfr"]

In [5]: expr = pgf.readExpr("Q7_PointToPainR")

In [6]: afr.linearize(expr)
Out[6]: 'Wys asseblief na die pyn.'

In [7]: zul.linearize(expr)
Out[7]: 'Ngicela ukhombe lapho kunobuhlungu khona.'

In [8]: afr.parse(afr.linearize(expr))
Out[8]: <pgf.Iter at 0x7fcc902decb0>

In [9]: zul.parse(zul.linearize(expr))
---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
<ipython-input-9-64da6cae2013> in <module>()
----> 1 zul.parse(zul.linearize(expr))

ParseError: The sentence is incomplete

In [10]: zul.parse("btw, this error message seems to be misleading...")
---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
<ipython-input-15-2e4d4dc73eee> in <module>()
----> 1 zul.parse("btw, this error message seems to be misleading...")

ParseError: The sentence is incomplete

I have been trying to reproduce this kind of error (where a concrete syntax cannot parse its own linearization) with a demo grammar that I could easily share here, but I have not succeeded, mainly because I have no idea which aspects to try to isolate. I have looked at the use of the case_sensitive flag, the use of BIND in the Zulu grammar and underscores in some of the linearizations, but none of them seems to cause the error, at least not in my attempts to reproduce it.

Any help on how I might debug this further would be appreciated!
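One way to narrow such a problem down is to run the linearize-then-parse round trip over many trees at once and collect the failures. The sketch below assumes the Python bindings to the C runtime as used in the session above (`concr.linearize`, `concr.parse` yielding `(probability, tree)` pairs, and `pgf.ParseError`); the helper name `round_trip_failures` and the `max_parses` cut-off are hypothetical. Trees are compared by their printed form so as not to rely on equality of `Expr` objects.

```python
import itertools

try:
    from pgf import ParseError  # exception raised by the C-runtime bindings
except ImportError:
    class ParseError(Exception):
        """Stand-in so the helper can be tried without the bindings."""


def round_trip_failures(concr, exprs, max_parses=50):
    """Linearize each tree with `concr`, parse the string back with the same
    concrete syntax, and return (tree, sentence, reason) for every tree that
    is not among the first `max_parses` parses."""
    failures = []
    for expr in exprs:
        sentence = concr.linearize(expr)
        try:
            # parse() yields (probability, tree) pairs; only look at a few.
            parses = [tree for _prob, tree in
                      itertools.islice(concr.parse(sentence), max_parses)]
        except ParseError as err:
            failures.append((str(expr), sentence, str(err)))
            continue
        if str(expr) not in {str(tree) for tree in parses}:
            failures.append((str(expr), sentence,
                             "original tree not among parses"))
    return failures
```

For example, `round_trip_failures(gr.languages["SymptomsZul"], [pgf.readExpr("Q7_PointToPainR")])` should report the failing case from the session above, together with the sentence that could not be parsed.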

Regards,
Laurette

Krasimir Angelov

Mar 5, 2020, 1:14:35 AM
to Grammatical Framework
Hi Laurette,

Can you share the original Symptoms.pgf grammar that caused the problem? I can't help if I cannot test it.

Best Regards,
Krasimir



Laurette Marais

Mar 5, 2020, 1:51:50 AM
to Grammatical Framework
Hi Krasimir,

It's quite big, so I've shared the PGF with you via Google Drive. Would you need to look at the code as well?

Regards,
Laurette


Krasimir Angelov

Mar 5, 2020, 4:28:42 AM
to Grammatical Framework
Hi Laurette,

Note that you can successfully parse "ngicela ukhombe lapho kunobuhlungu khona.", i.e. with a lower-case first letter. Rule Q7_PointToPainR actually compiles to the sequence:

CAPIT"ngi"BIND"cel"BIND"a""u"BIND"khomb"BIND"e""lapho""ku"BIND"no"BIND"buhlungu""khona"SOFT_BIND"."

CAPIT causes the linearizer to turn the "n" into upper case. The parser, however, just ignores the CAPIT token. This token was supposed to work in combination with the flag case_sensitive=off, which would let the parser accept both "ngicela" and "Ngicela", but that was not fully implemented until recently. It is implemented now, but I cannot test it without recompiling your grammar. I have, however, tested it with the WordNet grammar.
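To make the mechanism concrete, here is a toy model (not GF's actual implementation) of how a linearizer could turn the token sequence above into a string. It assumes BIND and SOFT_BIND both mean "no space before the next token" (treating SOFT_BIND exactly like BIND is a simplification) and CAPIT means "upper-case the first letter of the next token". Dropping CAPIT, as the parser does, yields the lower-case string that does parse.

```python
BIND, SOFT_BIND, CAPIT = "BIND", "SOFT_BIND", "CAPIT"


def render(tokens):
    """Join a linearized token sequence into a string (toy model)."""
    out = []
    glue = False        # True when the next token attaches without a space
    capitalize = False  # True when the next token gets an upper-case first letter
    for tok in tokens:
        if tok in (BIND, SOFT_BIND):
            glue = True
        elif tok == CAPIT:
            capitalize = True
        else:
            if capitalize:
                tok = tok[:1].upper() + tok[1:]
                capitalize = False
            if out and not glue:
                out.append(" ")
            out.append(tok)
            glue = False
    return "".join(out)


# The compiled sequence for Q7_PointToPainR in the Zulu concrete syntax.
Q7_ZUL = [CAPIT, "ngi", BIND, "cel", BIND, "a", "u", BIND, "khomb", BIND, "e",
          "lapho", "ku", BIND, "no", BIND, "buhlungu", "khona", SOFT_BIND, "."]

print(render(Q7_ZUL))  # Ngicela ukhombe lapho kunobuhlungu khona.
# What a parser that skips CAPIT effectively expects:
print(render([t for t in Q7_ZUL if t != CAPIT]))  # ngicela ukhombe lapho kunobuhlungu khona.
```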

You can do two things. If all the tokens in the grammar are in lower case, then you can just lower-case the input as well. If that is not the case, then you should recompile the grammar with the development version of the compiler and also use the latest runtime. At least for Zulu I can only see lower-case letters, so you can just take the first option if you don't want to reinstall and recompile everything.
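The first option can be wrapped in a small helper. This is a sketch, not part of the bindings: `parse_lowercased` is a hypothetical name, and it tries the input unchanged first, falling back to the lower-cased input only on a parse error, so that it does no harm for concrete syntaxes that do contain upper-case tokens. It assumes, as in the sessions in this thread, that `parse()` raises `ParseError` immediately when called.

```python
try:
    from pgf import ParseError  # exception raised by the C-runtime bindings
except ImportError:
    class ParseError(Exception):
        """Stand-in so the helper can be tried without the bindings."""


def parse_lowercased(concr, sentence):
    """Parse `sentence`, retrying with a lower-cased copy on a parse error.

    Works around the parser ignoring CAPIT when all of the grammar's own
    tokens are lower case."""
    try:
        return concr.parse(sentence)
    except ParseError:
        lowered = sentence.lower()
        if lowered == sentence:
            raise  # lower-casing cannot help; re-raise the original error
        return concr.parse(lowered)
```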

Best Regards,
Krasimir



Laurette Marais

Mar 5, 2020, 5:05:00 AM
to Grammatical Framework
Hi Krasimir,

Thank you very much! I will try to reinstall and recompile, because some other grammars I'm using do have upper-case letters in Zulu.

I have found another example that gives a similar problem, but it doesn't seem to originate from CAPIT. The grammar has a "SymptomsGui" concrete, which just produces a string representation of the tree with all parentheses removed (which is used to simplify "construction" of trees by the client application).
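A tree printed in prefix order with its parentheses dropped can still be read back unambiguously, as long as the number of arguments of each function is known. A minimal sketch of both directions follows, assuming the function names are separated by spaces (the thread only shows single-function Gui output); `flatten`, `rebuild`, `show` and the hard-coded `ARITY` table are hypothetical, with arities taken from the usual RGL types of the functions in the sample tree. A real client would derive the arities from the abstract syntax instead.

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, list]  # (function name, list of argument trees)

# Hypothetical arity table covering the sample tree below.
ARITY: Dict[str, int] = {
    "PhrUtt": 3, "NoPConj": 0, "UttS": 1, "UseCl": 3, "TPresTemp": 0,
    "PPos": 0, "PredVP": 2, "UsePron": 1, "i_Pron": 0, "UseComp": 1,
    "CompNP": 1, "MassNP": 1, "UseN": 1, "ntu_1_2_N": 0, "NoVoc": 0,
}


def flatten(tree_str: str) -> List[str]:
    """Drop all parentheses, keeping the function names in prefix order."""
    return tree_str.replace("(", " ").replace(")", " ").split()


def rebuild(tokens: List[str], arity: Dict[str, int]) -> Tree:
    """Read a parenthesis-free prefix sequence back into a nested tree."""
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("sequence ends before the tree is complete")
        fun = tokens[pos]
        pos += 1
        return (fun, [node() for _ in range(arity[fun])])

    tree = node()
    if pos != len(tokens):
        raise ValueError("extra tokens after a complete tree")
    return tree


def show(tree: Tree, nested: bool = False) -> str:
    """Print a tree with non-atomic arguments in parentheses."""
    fun, args = tree
    if not args:
        return fun
    text = " ".join([fun] + [show(a, nested=True) for a in args])
    return f"({text})" if nested else text


SAMPLE = ("PhrUtt NoPConj (UttS (UseCl TPresTemp PPos (PredVP (UsePron i_Pron) "
          "(UseComp (CompNP (MassNP (UseN ntu_1_2_N))))))) NoVoc")
print(" ".join(flatten(SAMPLE)))
# PhrUtt NoPConj UttS UseCl TPresTemp PPos PredVP UsePron i_Pron UseComp CompNP MassNP UseN ntu_1_2_N NoVoc
print(show(rebuild(flatten(SAMPLE), ARITY)) == SAMPLE)  # True
```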

Now, this concrete doesn't use CAPIT, but the same thing happens. As expected, using lowercase here doesn't work, because CAPIT isn't involved.

In [1]: import pgf

In [2]: g = pgf.readPGF("Symptoms.pgf")

In [3]: gui = g.languages["SymptomsGui"]

In [4]: expr = pgf.readExpr("Q7_PointToPainR")

In [5]: gui.linearize(expr)
Out[5]: 'Q7_PointToPainR'

In [6]: gui.parse(gui.linearize(expr))

---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
<ipython-input-6-9974cc576e53> in <module>()
----> 1 gui.parse(gui.linearize(expr))


ParseError: The sentence is incomplete

I will be reinstalling and recompiling anyway, but if you could point me towards a solution for the error above before I do that, I would be very grateful!

Regards,
Laurette


Laurette Marais

Mar 10, 2020, 4:51:30 AM
to Grammatical Framework
Hi Krasimir,

I have now reinstalled and recompiled from the latest source. It seems like the case_sensitive flag works for Zulu (as far as I have been able to test). But I have now actually managed to reduce my problem with the "Gui" language to a demo. I have attached some GF files.

I compiled the PGF using the following command:

$ gf --make --optimize-pgf SymptomsGui.gf

Then, when using the Python runtime bindings:

In [1]: import pgf

In [2]: g = pgf.readPGF("Symptoms.pgf")

In [3]: gui = g.languages["SymptomsGui"]

In [4]: e = pgf.readExpr("Q1_SymptomQuestion")

In [5]: gui.linearize(e)
Out[5]: 'Q1_SymptomQuestion'

In [6]: gui.parse(gui.linearize(e))
[0-0; CStart -> F5(linref Utt)[C2]; 0 :  . <0,0>; 0.000000+0.000000=0.000000]
[0-0; C2 -> F10(Q2_SymptomQuestion)[C0]; 0 :  . "Q2_SymptomQuestion"<0,0>; 1.098612+0.000000=1.098612]
[0-0; C2 -> F11(Q3_SymptomQuestion)[C0,C1]; 0 :  . "Q3_SymptomQuestion"<0,0><1,0>; 1.098612+0.000000=1.098612]

---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
<ipython-input-6-7eea4ef008f1> in <module>()
----> 1 gui.parse(gui.linearize(e))


ParseError: The sentence is incomplete

Having played around with the grammar, it seems like the underscores in the concrete grammar may be playing a role, but I am not sure. Also, if I compile without the --optimize-pgf flag, the problem seems to disappear (I've attached both compiled PGFs). I really do need the optimization, though, because once I add Zulu and Xhosa it makes a massive difference to the size of the PGF.
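To pin down which trees are affected by the optimization, the same round-trip test can be run against both compiled files and the results compared. This is a sketch assuming the C-runtime bindings as used above; `round_trips` and `differing_trees` are hypothetical helpers, and the list of trees to test has to be supplied by the caller (for instance the question functions such as `Q1_SymptomQuestion`). Note that `round_trips` stops at the first matching parse but otherwise walks through every parse.

```python
try:
    from pgf import ParseError  # exception raised by the C-runtime bindings
except ImportError:
    class ParseError(Exception):
        """Stand-in so the helpers can be tried without the bindings."""


def round_trips(concr, expr):
    """True if `concr` parses its own linearization of `expr` back to `expr`."""
    sentence = concr.linearize(expr)
    try:
        # parse() yields (probability, tree) pairs; compare printed trees.
        return any(str(tree) == str(expr)
                   for _prob, tree in concr.parse(sentence))
    except ParseError:
        return False


def differing_trees(concr_a, concr_b, exprs):
    """Trees that round-trip in exactly one of the two builds, mapped to
    (result in build a, result in build b)."""
    result = {}
    for expr in exprs:
        ok_a, ok_b = round_trips(concr_a, expr), round_trips(concr_b, expr)
        if ok_a != ok_b:
            result[str(expr)] = (ok_a, ok_b)
    return result
```

With the two attached files one might compare `pgf.readPGF("Symptoms.pgf").languages["SymptomsGui"]` against `pgf.readPGF("SymptomsNonOpt.pgf").languages["SymptomsGui"]`, passing trees built with `pgf.readExpr`.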

Best regards,
Laurette



Symptoms.gf
Time.gf
TimeGui.gf
SymptomsGui.gf
Symptoms.pgf
SymptomsNonOpt.pgf

Krasimir Angelov

Mar 13, 2020, 5:19:51 AM
to Grammatical Framework
Hi Laurette,

I know what causes the problem. I made some optimizations in the runtime but then didn't update the --optimize-pgf optimization in the compiler to match.

I was planning to fix the problem today, but I also have to reschedule the exam for my course because of the coronavirus. I hope to manage both.



Krasimir Angelov

Mar 15, 2020, 3:00:02 PM
to Grammatical Framework
Hi Laurette,

The problem is now fixed. The C runtime is unchanged; you only need to rebuild the compiler.

Best Regards,
Krasimir

Laurette Marais

Mar 16, 2020, 9:09:25 AM
to Grammatical Framework
Hi Krasimir,

Many thanks for fixing it so quickly!

Best regards,
Laurette

Laurette Marais

Aug 28, 2020, 7:33:53 AM
to Grammatical Framework
Hi Krasimir,

I have run into this same issue again, but this time I can share the code of the grammar with you.

My mother and I are working on a Zulu RG, and I'm trying to see how much of a specific corpus can be parsed with it in its current state. Some sentences that use BIND parsed without issues, but others didn't. As far as I know, I have the latest version of gf-core installed. The code for the RGL is at https://github.com/LauretteM/gf-rgl-zul (still very much WIP).

Below is what happened in the Python interpreter. Am I doing something wrong that you can see?

Kind regards,
Laurette

Python 3.8.2 (default, Jul 16 2020, 14:00:26)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pgf

In [2]: g = pgf.readPGF("NguniLangAbs.pgf")

In [3]: zul = g.languages["NguniLangZul"]

In [4]: expr = pgf.readExpr("PhrUtt NoPConj (UttS (UseCl TPresTemp PPos (PredVP (UsePron i_Pron) (UseComp (CompNP (MassNP (UseN ntu_1_2_N))))))) NoVoc")

In [5]: zul.linearize(expr)
Out[5]: 'ngingumuntu'

In [6]: zul.parse(zul.linearize(expr))
Out[6]: <pgf.Iter at 0x7f46c95ebd70>

In [7]: expr = pgf.readExpr("PhrUtt NoPConj (UttS (UseCl TPresTemp PNeg (PredNP (MassNP (UseN ntu_1_2_N))))) NoVoc")

In [8]: zul.linearize(expr)
Out[8]: 'ngumuntu'

In [9]: zul.parse(zul.linearize(expr))
---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
<ipython-input-9-64da6cae2013> in <module>
----> 1 zul.parse(zul.linearize(expr))

ParseError: The sentence is incomplete