Hints for debugging new RGL?


Roman Suzi

Jul 20, 2020, 3:20:54 AM
to Grammatical Framework
Hi!

I've implemented most of the next-gen Russian RGL ( https://github.com/rnd0101/gf-rgl-russian2 ). Verb morphology is not there yet (just a few verbs), and relative clauses are a known problem (I am trying to figure out whether to switch RP to tables, as records are clumsy in that particular case), plus some other problems I am aware of. Ordinals are also missing from the numerals (same as in the old Russian RG).

However, the last parts I did were more mechanical, guessed from the types, so they require some debugging, especially with respect to word order. The old Russian RG is not always useful, as it lacks certain constructs and is sometimes plain wrong.

The question is: are there any articles, blog posts or anything else on how to massively debug a "theoretically" written grammar? The difficulty is that in Russian many sentences make sense, but I found that sometimes it's only a coincidence. My guess is that I need some reliable (and maybe hand-checked?) treebanks. Any ideas? The main point is syntax. The lexicon is pretty straightforward to debug.

In some cases I failed to understand what meaning the RG is supposed to convey. The first such example is few_Det. I checked English and Finnish, but I am still not quite sure. Or take CleftAdv (here Finnish helped: it gives the almost forbidden "se on Porissa kun Matti asuu"). But that still does not increase my confidence, as it's in the grey area of my language intuition. As a dev suggestion, there could be some "semantically universal" description of each construct, so developers can check the results in their language. Maybe some semantic pseudo-language could explain things in simple English or something.

Also, I have not understood the division between the Morpho and Res modules. In what I did, ResRus.gf is bloated but MorphoRus is very small, plus I have a separate module for a special kind of Morpho. Any advice here?

One more question: is it possible to have embedInCommas work at the beginning of the sentence as well? At the moment it does not "sense" being the first thing, only punctuation. Maybe it's trivial to add to the RGL, or is a new term for "beginning" needed?

Then there is also the question of what the acceptance criteria are for inclusion in the official RGL. Now that there already is a Russian, what is the way to upgrade? Will it be "russian2", or will it replace the old "russian"? I am aiming at total replacement, but it will not be backwards compatible, because I streamlined things and some obscure cases should be served differently (in what I see as a much less hackish way). I see something like exper under Fre/Romance, but in the Rus case the two grammars are very different. I see no point in adapting one to the other.

With best regards,
Roman

Inari Listenmaa

Jul 20, 2020, 9:44:45 AM
to gf-...@googlegroups.com
Hi,

Great to hear that the work with the new Russian is going well!

The question is: Are there any articles or blogs or whatever on how to massively debug "theoretically" written grammar? The difficulty is in Russian many sentences make sense, but I found that sometimes it's only a coincidence. My guess is I need some reliable (and maybe hand-checked?) treebanks. Any ideas? The main point is syntax. Lexicon is pretty straightforward to debug.

Have you tried the trees from this directory? They're not many, but it's a start. https://github.com/GrammaticalFramework/gf-rgl/tree/master/treebanks

For a more theoretical approach, you can read/skim through Chapter 5 of my thesis https://gupea.ub.gu.se/bitstream/2077/59037/1/gupea_2077_59037_1.pdf I know you're already familiar with gftest, and that semantic coherence is not its strong suit, but the thesis talks about testing on a more general level, and gives examples in a few different languages.

Finally, usually a good way to test a resource grammar is to use it in an application grammar. You could compare the Russian Phrasebook https://github.com/GrammaticalFramework/gf-contrib/tree/master/phrasebook with the new and the old Russian resource grammar. For Phrasebook, you can even generate a full set of trees with gftest; for such a small application grammar it should only be some hundreds of trees. They will still not make sense always, but now we're talking more about "is your grandfather pregnant" style nonsense than "therefore how much please" style of nonsense.

In some cases I failed to understand what meaning RG is supposed to convey. The first such example is few_Det. I checked English and Finnish, but still not quite sure. Or take CleftAdv. (here Finnish helped:  This gives the almost forbidden "se on Porissa kun Matti asuu". But does not increase confidence still as it's in the grey area of my language intuition. As a dev suggestion, there could be some "semantically universal" description of each construct  way so developers can check results in their language. Maybe, some /semantic pseudo-language can explain things in simple English or something.

RGL is a library of syntactic constructions, some of which make sense for languages A, B and C, and others which make sense for languages X, Y and Z. Especially the Extra (deprecated but kept for backwards compatibility) and Extend modules, they just contain stuff that someone needed for some application grammar, and instead of writing it as an oper in that particular application grammar, they added it in Extend. So you can't really ask for semantic descriptions from a resource that wasn't meant to have much semantics to start with. :-P The purely syntactic descriptions could be better and more multilingual, especially in the more obscure cases like PPartNP. 

If something is totally nonsense in Russian, feel free to not implement it, or implement it badly. That's the case for most of the RGs. (I would say for all, the authors who claim otherwise just haven't tested thoroughly enough to discover all the nonsense.)

Also, I've not understood clear division between Morpho and Res modules. In what I did ResRus.gf is bloated, but MorphoRus is very small, plus I have a separate module for special kind of Morpho. Any advice here?

The resource modules can be however grammarians like to split the division of labour. I think the convention has been born out of copying and pasting, with each person adding a bit of own innovation. So no advice really, just do what feels reasonable. (You can innovate even more and give new names to the resource modules, apart from Res and Morpho and MakeStructural. :-D)

One more question is whether it's possible to have embedInCommas for the beginning of the sentence as well? At the moment it does not "sense" being the first thing, only punctuation. Maybe, it's trivial to add to RGL or new term for "beginning" is needed?

Do you have an example of where you would need that? The pre construct doesn't handle that, so a better solution would be to add a parameter in the grammar, or add the comma from some function that uses the category where you now want to have the comma, like the ExtAdv* functions. 

Then also a question what are acceptance criteria for inclusion in official RGL? Now that there already is Russian, what is the way upgrade? Will it be "russian2" or replace old "russian"? I am aiming at total replacement, but it will not be backwards compatible because I streamlined things and some obscure cases should be served differently (in what I see as much less hackish way). I see something like exper under Fre/Romance, but in Rus case they are very different. I see no point in adapting one to another.

I think there's no question here: we aren't technically committed to maintaining the internals of the RGL modules as is, only the API. The applications that depend on the internals will need to update their code, or use an older version of the RGL (and thus not benefit from the bugfixes).

Cheers,
Inari

Roman Suzi

Jul 25, 2020, 10:31:06 AM
to Grammatical Framework
Hi!
Thanks for the hints.

Apart from some bugs, I think the new grammar has the following problems:

1. It seems that applications use ResRus internal functions a lot, so I added more functions. However, regV can't be reliably redone, because the old grammar used a suboptimal set of 5 forms (the new grammar uses at most three for verbs: inf, SgP1, SgP3, which is enough for most verbs). The old grammar does not have SgP3, so there is no good way to emulate the old regV... The new grammar has a built-in conjugation checker: I found 2-3 conjugation mistakes in my project with it.

2. Phrasebook is a complete disaster, and I am trying to figure out why. For some reason numerals linearise to empty strings. Maybe I have not connected the numeral construction functions somewhere; strange.

3. I have a hard time understanding why something as simple as "I eat an apple" is parsed into a monster like:

UseCl (TTAnt TPast ASimul) PPos (PredVP (AdvNP (UsePron i_Pron) (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg))))
                                        (AdvVP (ComplSlash (AdvVPSlash (VPSlashPrep (ComplSlash (SlashV2a eat_V2) (DetNP (DetQuant DefArt NumSg))) part_Prep)
                                                                       (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg))))
                                                           (AdvNP (DetNP (DetQuant DefArt NumSg)) (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg)))))
                                               (PrepNP part_Prep (AdvNP (MassNP (ApposCN (UseN apple_N) (AdvNP (DetNP (DetQuant IndefArt NumPl))
                                                                                                               (PrepNP possess_Prep (DetNP (DetQuant DefArt NumPl))))))
                                                                        (PrepNP possess_Prep (DetNP (DetQuant DefArt NumSg)))))))


It seems to me that making adverbs from everything gives too much freedom and muddies everything.

(When I lin that with Finnish I am getting some poetry:

Lang> l UseCl (TTAnt TPast ASimul) PPos (PredVP (AdvNP (UsePron i_Pron) (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg))))  (AdvVP (ComplSlash (AdvVPSlash (VPSlashPrep (ComplSlash (SlashV2a eat_V2) (DetNP (DetQuant DefArt NumSg))) part_Prep)  (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg))))  (AdvNP (DetNP (DetQuant DefArt NumSg)) (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg)))))  (PrepNP part_Prep (AdvNP (MassNP (ApposCN (UseN apple_N) (AdvNP (DetNP (DetQuant IndefArt NumPl))  (PrepNP possess_Prep (DetNP (DetQuant DefArt NumPl))))))  (PrepNP possess_Prep (DetNP (DetQuant DefArt NumSg)))))))
minä sitä söin sitä sitä sitä sitä omenaa yhdet niiden sen

). So my guess is I have an effectively "empty" construct in the grammar...

But then when I get hint from Finnish I can find the short way (probably even shortest):

AllRusAbs> l UseCl (TTAnt TPast ASimul) PPos (PredVP (UsePron i_Pron) (ComplSlash (SlashV2a eat_V2) (MassNP (UseN apple_N))))
я кушал яблоко

I am wondering how to deal with parsing. Maybe, I should remove some functions from parsing.

My idea is to make a test set of simple phrases and see that they round-trip to abstract syntax and back just fine.
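A round-trip check of this kind could be run in the GF shell roughly as follows. This is only a sketch: phrases.txt is a hypothetical file with one test phrase per line, and AllRus is assumed to be the concrete syntax loaded in the shell.

```
AllRusAbs> rf -file=phrases.txt -lines | p -lang=AllRus | l -lang=AllRus -treebank
```

Each phrase is parsed and every resulting tree is linearised back next to its tree, so a phrase that comes back with many trees points at an overgenerating function or an empty linearisation.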

-Roman

Inari Listenmaa

Jul 25, 2020, 3:20:47 PM
to Grammatical Framework
Hi,

1. Seems like applications use ResRus internal functions a lot, so I added more functions, however, regV can't be reliably redone because old grammar used suboptimal set of 5 forms (new grammar uses max three for verbs - inf, SgP1, SgP3, and it's enough for most of verbs). Old grammar does not have SgP3, so there is no good way to emulate old regV... New grammar has built-in conjugation checker: I found 2-3 conjugation mistakes in my project with it.

That's unfortunate. Which applications are you talking about? 

If you want to be kind to the old applications, you could leave the internal opers with the same name and type signature, but just not use the arguments. If the old oper is called regV and it takes 5 forms a,b,c,d,e, and your new version takes some arguments a,b,f, then you could do something along the lines

goodRegV : (a,b,f : Str) -> Verb = … -- your new implementation

regV : (a,b,c,d,e : Str) -> Verb = \a,b,c,d,e ->
  let f = guessF a b … ;
  in goodRegV a b f ;


3. I have hard time to understand why something as simple as "I eat an apple" is parsed into a monster like:

UseCl (TTAnt TPast ASimul) PPos (PredVP (AdvNP (UsePron i_Pron) (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg))))
                                        (AdvVP (ComplSlash (AdvVPSlash (VPSlashPrep (ComplSlash (SlashV2a eat_V2) (DetNP (DetQuant DefArt NumSg))) part_Prep)
                                                                       (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg))))
                                                           (AdvNP (DetNP (DetQuant DefArt NumSg)) (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg)))))
                                               (PrepNP part_Prep (AdvNP (MassNP (ApposCN (UseN apple_N) (AdvNP (DetNP (DetQuant IndefArt NumPl))
                                                                                                               (PrepNP possess_Prep (DetNP (DetQuant DefArt NumPl))))))
                                                                        (PrepNP possess_Prep (DetNP (DetQuant DefArt NumSg)))))))

It seems to me that making adverbs from everything gives too much freedom and muddies everything. 

Allowing empty NPs is the root of the problem here, rather than adverbs. Since Russian doesn't have articles, it's understandable that DetNP for DefArt and IndefArt is just empty string.

DetNP makes sense for determiners like "this", "that" or "mine", "yours". It makes no sense for most other dets. Some RGs try to follow the principle "let's try to make most of the RGL trees make sense", and linearise DetNP some_Det into "something", DetNP (DetQuant IndefArt NumSg) into "one", DetNP (DetQuant DefArt NumSg) into "this", and so on. Other languages try to follow the principle "as long as there is some way in the RGL to say what I want, I don't care if some other trees linearise into nonsense".

Now, due to the ambiguity problems, I would recommend including some non-empty string in all Dets. It should be separate from the s field, because you don't want "an apple" to be translated into "одно яблоко", but just "яблоко". But when (DetQuant IndefArt NumSg) is given to DetNP, then the non-empty string in the other field should be chosen. Same treatment for the definite article. The downside of picking words like "one" and "this" is that DetNP (DetQuant this_Quant NumSg) would become ambiguous. If you can find a word that makes some sort of sense and doesn't make it ambiguous, that'd be an improvement. But just introducing any string, even if it is "this" or "one", is better than getting the monstrous parses that you are getting now.
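The two-field idea can be sketched in a toy grammar. Everything here is hypothetical and far simpler than the real RGL types: the module names Toy and ToyRus, the field name sp, and the choice of "одно" as the standalone form are assumptions made only for illustration.

```gf
-- Toy.gf (hypothetical toy abstract syntax, not the RGL)
abstract Toy = {
  flags startcat = NP ;
  cat NP ; CN ; Det ;
  fun
    DetCN : Det -> CN -> NP ;   -- determiner + noun
    DetNP : Det -> NP ;         -- determiner used on its own
    a_Det, this_Det : Det ;
    apple_CN : CN ;
}

-- ToyRus.gf (hypothetical toy concrete syntax)
concrete ToyRus of Toy = {
  flags coding = utf8 ;
  lincat
    NP, CN = {s : Str} ;
    Det = {s  : Str ;    -- form used before a noun (may be empty)
           sp : Str} ;   -- standalone form, never empty
  lin
    DetCN det cn = {s = det.s ++ cn.s} ;   -- uses the possibly empty s
    DetNP det    = {s = det.sp} ;          -- uses the non-empty sp
    a_Det    = {s = []    ; sp = "одно"} ; -- no article before the noun
    this_Det = {s = "это" ; sp = "это"} ;
    apple_CN = {s = "яблоко"} ;
}
```

In this toy grammar DetNP a_Det no longer linearises to the empty string, so it cannot be inserted invisibly into a parse of "яблоко".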

But then when I get hint from Finnish I can find the short way (probably even shortest):

AllRusAbs> l UseCl (TTAnt TPast ASimul) PPos (PredVP (UsePron i_Pron) (ComplSlash (SlashV2a eat_V2) (MassNP (UseN apple_N))))
я кушал яблоко

I am wondering how to deal with parsing. Maybe, I should remove some functions from parsing.

The most important thing is that determiners have some non-empty string that DetNP can choose. That should get rid of most of the problems.

But some other constructions are also very overgenerating, like ApposCN. For internal purposes only, I sometimes insert some dummy string when I don't want ApposCN to pollute my parses. Like this:

ApposCN cn np = cn ** {s=\\n,cas => cn.s ! n ! cas ++ "_" ++ np.s ! cas} ;

A concrete use case: I'm writing an application grammar, and I want to find some construction that I know is in ExtendEng, but I don't remember what it is. I parse a sentence in AllEng, then I get tons of things that are garbage. I can't comment out the overgenerating functions, because they are used by the API, so instead add some string to their linearisations so that they won't show up in my parses. If I do want a tree with the function I normally don't want, I can just insert that character, like "city _ Paris".

Inari


Roman Suzi

Jul 26, 2020, 2:50:36 AM
to Grammatical Framework
Hi!

On Saturday, July 25, 2020 at 10:20:47 PM UTC+3, Inari wrote:
Hi,

1. Seems like applications use ResRus internal functions a lot, so I added more functions, however, regV can't be reliably redone because old grammar used suboptimal set of 5 forms (new grammar uses max three for verbs - inf, SgP1, SgP3, and it's enough for most of verbs). Old grammar does not have SgP3, so there is no good way to emulate old regV... New grammar has built-in conjugation checker: I found 2-3 conjugation mistakes in my project with it.

That's unfortunate. Which applications are you talking about? 

True. There are not so many Russian RG applications: mgl and phrasebook in gf-contrib at least. So it probably does not make sense to think too much about backwards compatibility.
 

If you want to be kind to the old applications, you could leave the internal opers with the same name and type signature, but just not use the arguments. If the old oper is called regV and it takes 5 forms a,b,c,d,e, and your new version takes some arguments a,b,f, then you could do something along the lines

goodRegV : (a,b,f : Str) -> Verb = … -- your new implementation

regV : (a,b,c,d,e : Str) -> Verb = \a,b,c,d,e ->
  let f = guessF a b … ;
  in goodRegV a b f ;


The problem is that there is no SgP3 among those arguments :-( So either the old guessing code needs to be used to guess SgP3 (and I am not sure it even does it correctly), or I can just leave it behind.


3. I have hard time to understand why something as simple as "I eat an apple" is parsed into a monster like:

UseCl (TTAnt TPast ASimul) PPos (PredVP (AdvNP (UsePron i_Pron) (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg))))
                                        (AdvVP (ComplSlash (AdvVPSlash (VPSlashPrep (ComplSlash (SlashV2a eat_V2) (DetNP (DetQuant DefArt NumSg))) part_Prep)
                                                                       (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg))))
                                                           (AdvNP (DetNP (DetQuant DefArt NumSg)) (PrepNP part_Prep (DetNP (DetQuant DefArt NumSg)))))
                                               (PrepNP part_Prep (AdvNP (MassNP (ApposCN (UseN apple_N) (AdvNP (DetNP (DetQuant IndefArt NumPl))
                                                                                                               (PrepNP possess_Prep (DetNP (DetQuant DefArt NumPl))))))
                                                                        (PrepNP possess_Prep (DetNP (DetQuant DefArt NumSg)))))))

It seems to me that making adverbs from everything gives too much freedom and muddies everything. 

Allowing empty NPs is the root of the problem here, rather than adverbs. Since Russian doesn't have articles, it's understandable that DetNP for DefArt and IndefArt is just empty string.

DetNP makes sense for determiners like "this", "that" or "mine", "yours". It makes no sense for most other dets. Some RGs try to follow the principle "let's try to make most of the RGL trees make sense", and linearise DetNP some_Det into "something",  DetNP (DetQuant IndefArt NumSg) into "one", DetNP (DetQuant DefArt NumSg) into "this", and so on. Other languages try follow the principle "as long as there is some way in the RGL to say what I want, I don't care if some other trees linearise into nonsense".

Now, due to the ambiguity problems, I would recommend to include some nonempty string in all Dets. It should be separate from the s field, because you don't want "an apple" to be translated into "одно яблоко", but just "яблоко". But when (DetQuant IndefArt NumSg) is given to DetNP, then the non-empty string in the other field should be chosen. Same treatment for the definite article. Downside for picking words like "one" and "this" is that (DetNP (DetQuant this_Quant NumSg) would become ambiguous. If you can find a word that makes sort of sense and doesn't make it ambiguous, that'd be an improvement. But just introducing any string, even if it is "this" and "one", is better than getting such monstrous parses that you are getting.

I should have taken a look at more languages. Arabic has a nifty solution of a quant/det "type", so the article can be translated in DetNP (and DetCN) instead of bringing the whole inflection table through quants and dets.
 

But then when I get hint from Finnish I can find the short way (probably even shortest):

AllRusAbs> l UseCl (TTAnt TPast ASimul) PPos (PredVP (UsePron i_Pron) (ComplSlash (SlashV2a eat_V2) (MassNP (UseN apple_N))))
я кушал яблоко

I am wondering how to deal with parsing. Maybe, I should remove some functions from parsing.

The most important thing is that determiners have some non-empty string that DetNP can choose. That should get rid of most of the problems.

But some other constructions are also very overgenerating, like ApposCN. For internal purposes only, I sometimes insert some dummy string when I don't want ApposCN to pollute my parses. Like this:

ApposCN cn np = cn ** {s=\\n,cas => cn.s ! n ! cas ++ "_" ++ np.s ! cas} ;

A concrete use case: I'm writing an application grammar, and I want to find some construction that I know is in ExtendEng, but I don't remember what it is. I parse a sentence in AllEng, then I get tons of things that are garbage. I can't comment out the overgenerating functions, because they are used by the API, so instead add some string to their linearisations so that they won't show up in my parses. If I do want a tree with the function I normally don't want, I can just insert that character, like "city _ Paris".

Good hint! Thanks!

-Roman
 

Inari


Roman Suzi

Jul 31, 2020, 4:37:41 PM
to Grammatical Framework
Hi!

One more question. So these Russian paradigms, advertised here:


Should the new grammar support the old way? For example, I would like to deprecate the Conjugation type and all those funny first, firstE, mixed, etc., and also regV (on the grounds that there are no regular verbs in Russian, or at least there are many understandings of which verbs are regular). Also, handling of Voice is not in the paradigms any more. Of course, some specific helper functions can still be added to that list (mostly mkV, mkN, and such).


-Roman

Inari Listenmaa

Aug 1, 2020, 9:25:28 AM
to gf-...@googlegroups.com
Yes, that's a good question! First a bit of general background on Paradigms modules.

Look at a file like ParadigmsAra. On line 66 onwards, we declare the overloaded mkN -- only type signatures, no implementation yet. The comments, like "-- Compound noun with invariant attribute." are those that appear in the synopsis.

Then at some point in every well-formed Paradigms module, there is the magical part that starts with --. (https://github.com/GrammaticalFramework/gf-rgl/blob/master/src/arabic/ParadigmsAra.gf#L312-L316)

--.
--2 Definitions of paradigms
-- The definitions should not bother the user of the API. So they are hidden from the document.

Everything under --. is not shown in the synopsis, but is still available to use, for any module that opens ParadigmsXxx. Usually the parts under --. just implement all the things declared previously, but you can also implement more stuff.
In the Arabic mkN https://github.com/GrammaticalFramework/gf-rgl/blob/master/src/arabic/ParadigmsAra.gf#L364-L389 , I have included the mkN with type signature NTable -> Gender -> Species -> N only in the hidden part: not visible in the synopsis, but still available, so that old code doesn't break.

If you can do something similar with the old Russian paradigms, that would be polite to the old applications. If it's too much hassle to make them good, at least you could include them with the same name and type signature, but just add some bad guesses if the forms given in the old type signature are a bad choice.
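The pattern described above could look roughly like this in a toy Paradigms module. This is a sketch under assumptions: ParadigmsToy, Noun, mkNoun and the three-argument "old" signature are all hypothetical names and shapes, not taken from ParadigmsAra or from the Russian grammar.

```gf
resource ParadigmsToy = {

param Number = Sg | Pl ;

oper
  Noun : Type = {s : Number => Str} ;

  -- Public part: only these signatures and comments reach the synopsis.
  mkN : overload {
    mkN : (sg : Str) -> Noun ;      -- Regular noun, plural formed with -s.
    mkN : (sg,pl : Str) -> Noun ;   -- Noun with an irregular plural.
  } ;

--.
--2 Definitions of paradigms
-- Everything below is hidden from the synopsis but still usable.

  mkN = overload {
    mkN : (sg : Str) -> Noun = \sg -> mkNoun sg (sg + "s") ;
    mkN : (sg,pl : Str) -> Noun = mkNoun ;
    -- Old-style signature kept only so that old code still compiles;
    -- the third form is accepted but ignored.
    mkN : (sg,pl,old : Str) -> Noun = \sg,pl,old -> mkNoun sg pl ;
  } ;

  -- Worst-case constructor used by all the overload instances.
  mkNoun : (sg,pl : Str) -> Noun = \sg,pl ->
    {s = table {Sg => sg ; Pl => pl}} ;
}
```

Old applications calling the three-argument mkN keep compiling, while new users only see the two documented instances.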

Inari



Roman Suzi

Aug 1, 2020, 1:59:29 PM
to Grammatical Framework
Hi Inari,

Ok. That double-overload approach is clear. I will try to provide backwards compatibility where possible.

One more problem is with the word "like", which new grammar defines like this:

like_V2 = mkV2 (mkV imperfective intransitive "нравиться" "нравлюсь" "нравится") Dat ;

Note that this verb does not have the non-reflexive form that the old grammar tries to use to mimic English "like". So in the new grammar the meaning of the verb is the opposite.

For example in here:

> gt UseCl (TTAnt TPast ASimul) PPos (PredVP (UsePron he_Pron) (ComplSlash (SlashV2a like_V2) (MassNP (UseN woman_N))))  | l -treebank
AllRusAbs: UseCl (TTAnt TPast ASimul) PPos (PredVP (UsePron he_Pron) (ComplSlash (SlashV2a like_V2) (MassNP (UseN woman_N))))
AllRus: он нравил &+ ся женщине    * = woman like him
AllEng: he liked woman

and there is no other verb in Russian general enough to replace "нравиться". (The closest analogue, "симпатизировать" (to take a liking to somebody), is too specialised: one can't use it for liking a page in the Facebook sense, and the verb is not even transitive, as it requires the dative.)
What can be done about this? It's certain that the old grammar does not render it correctly anyway. And to support the same semantics, the subject would have to be swapped with the object, which is not quite trivial in the abstract grammar.
My inclination is to break the meaning, because naive translation via the abstract grammar is not that accurate anyway and some transformations are needed, but maybe I can still get some advice?

Maybe, due to the importance of "X likes Y", there could be a common abstract constructor that hides the implementation details, as is done with have_name_Cl and the even less important cup_of_CN.

Now that I have compared the old and new grammars, the new one seems to be way more accurate (even though there could still be some bugs in it).
One problematic point is that infinitive sentences (via SC?) are not that advanced, even for English.

For example, this does not work:

p -cat=Text "how to talk ?"

and phrase like
p -cat=Text "to live is to fly ."

is supported in English via some obscure extension (InOrderToVP), even though SC promises to be subject and object. So now I am thinking about how to simplify forming indefinite sentences in Russian without adding new tenses, or even how to introduce a new mood for that, or whether something useful can be found in Extend.

I also do not quite understand what a "well-formed" grammar should have at minimum. For example, for documentation: Finnish uses a functor for that, the old Russian does not. I may need help with those once the new grammar is ready.

With best regards,
Roman

Inari Listenmaa

Aug 1, 2020, 3:14:45 PM
to Grammatical Framework
Hi,

One more problem is with the word "like", which new grammar defines like this:

like_V2 = mkV2 (mkV imperfective intransitive "нравиться" "нравлюсь" "нравится") Dat ;

Note, that this verb does not have non-reflexive form old grammar tries to use to mimic English like. So in new grammar the meaning of the verb is opposite.

Yes, this happens also in Romance languages that have the same dative behaviour with like. For example Spanish:
Lang> p "I like grammars" | l
I like grammars
yo gusto gramáticas -- should be "me gustan gramáticas"

This is not a massive problem per se, the application grammarian just needs to know this is the case. If your application grammar has a function like the following:

fun Like : NP -> NP -> Cl ;

then you just need to linearise it as follows:

lin Like subj obj = mkCl obj like_V2 subj ;
You can document such things in the README of Russian.
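Put together, a minimal application grammar using this argument swap might look like the sketch below. The module names Likes, LikesEng and LikesRus and the categories Statement and Person are hypothetical; the sketch assumes the RGL API modules SyntaxEng/SyntaxRus and LexiconEng/LexiconRus are available and that like_V2 in the new Russian grammar takes the liked thing as its subject, as described above.

```gf
-- Likes.gf (hypothetical application grammar)
abstract Likes = {
  flags startcat = Statement ;
  cat Statement ; Person ;
  fun
    Like : Person -> Person -> Statement ;  -- Like x y: "x likes y"
    He, She : Person ;
}

-- LikesEng.gf: English keeps the abstract argument order
concrete LikesEng of Likes = open SyntaxEng, LexiconEng in {
  lincat Statement = S ; Person = NP ;
  lin
    Like x y = mkS (mkCl x like_V2 y) ;
    He  = mkNP he_Pron ;
    She = mkNP she_Pron ;
}

-- LikesRus.gf: the liked person becomes the grammatical subject
concrete LikesRus of Likes = open SyntaxRus, LexiconRus in {
  flags coding = utf8 ;
  lincat Statement = S ; Person = NP ;
  lin
    Like x y = mkS (mkCl y like_V2 x) ;     -- arguments swapped
    He  = mkNP he_Pron ;
    She = mkNP she_Pron ;
}
```

The abstract syntax stays language-neutral, and only the Russian concrete syntax needs to know about the swapped argument order.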


Maybe, due to importance of "X likes Y" there maybe common abstract constructor, which will hide implementation details, as it is with have_name_Cl , and even less important cup_of_CN .

If you feel like adding more things to Constructions or Extend, feel free to make a pull request! Could even add a functor with default implementations for Constructions, like Extend has already.

Now that I compared old and new grammars, new one seems to be way more accurate (even though there could be some bugs in it still).
One problematic moment is that infinitive sentences (via SC?) are not that advanced, even for English.

For example, this does not work:

p -cat=Text "how to talk ?"

There's an Extend function for that

AllEngAbs> l PredIAdvVP how_IAdv (UseV walk_V)
how to walk

and phrase like
p -cat=Text "to live is to fly ."

is supported in English via some obscure extension (InOrderToVP),
[…]
So now I am thinking how to simplify forming indefinite sentences in Russian without adding new tenses, or even how to introduce a new mood for that, or whether something useful can be found in Extend.

PurposeVP would be a better fit for that construction. In English, InOrderToVP accepts both "to" and "in order to", but generates first "in order to".

AllEngAbs> l PredSCVP (EmbedVP (UseV live_V)) (AdvVP UseCopula (PurposeVP (UseV fly_V)))
to live is to fly

1 msec
AllEngAbs> l PredSCVP (EmbedVP (UseV live_V)) (AdvVP UseCopula (InOrderToVP (UseV fly_V)))
to live is in order to fly

even though SC promises to be subject and object.

Where does it say that? In abstract, SC is only used in the following functions

Adjective.gf:  SentAP    : AP -> SC -> AP ;  -- good that she is here
Noun.gf:       SentCN    : CN -> SC -> CN ;  -- question where she sleeps
Sentence.gf:   PredSCVP  : SC -> VP -> Cl ;  -- that she goes is good

More can of course be added, just not to the core RGL, but to Extend.

I also do not quite understand what "well-formed" grammar should have at minimum.

Yes, these things are not well documented at all. It's good that you ask, it also has the benefit that it creates written documentation about writing a resource grammar.

 For example, for documentation. Finnish uses functor for that, old Russian does not. I may need help with those once the new grammar is ready.

Documentation is the grammar that produces tables like in https://cloud.grammaticalframework.org/wordnet/ 
Search for some word, and click on some of the languages. For example, I search for "cat" in English, results in Bulgarian, and I get a table

 Съществително (ж.р.)

ед.ч. нечленувано котка
членувано котката
пълен член котката
мн.ч. нечленувано котки
членувано котките
звателна форма котко
бройна форма котки

Those terms are defined in the DocumentationBul module.  

Whether to use functor or not is just a matter of where you copypaste it from. :-P For instance, Bulgarian doesn't use the functor and the Terminology module.
Here's how the Finnish constructs "preesensin indikatiivi" for verb tables, using present_Parameter and indicative_Parameter from TerminologyFin: https://github.com/GrammaticalFramework/gf-rgl/blob/master/src/finnish/DocumentationFinFunctor.gf#L146
And here's the Bulgarian version of verb inflection table, all strings defined there in DocumentationBul: https://github.com/GrammaticalFramework/gf-rgl/blob/master/src/bulgarian/DocumentationBul.gf#L296-L395

The benefits of using a functor are that it is perhaps a bit cleaner, with vocabulary and rules separated, and you can make it multilingual easily. DocumentationFin.gf is describing Finnish in Finnish, and DocumentationFinEng is describing Finnish in English. The hard work is all the stuff that looks like this, and is in the functor:

         tr (th "1.p"  ++ td (vfin (Presn Sg P1)) ++ td (vfin (Presn Pl P1))
             ++ intagAttr "td" "rowspan=3" (vfin (PassPresn True))) ++
         tr (th "2.p"  ++ td (vfin (Presn Sg P2)) ++ td (vfin (Presn Pl P2))) ++
         tr (th "3.p"  ++ td (vfin (Presn Sg P3)) ++ td (vfin (Presn Pl P3))) ++

So when you want to describe Russian in yet another language, you just need to duplicate the Terminology module and translate its strings.
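
To make the vocabulary/rules split concrete, here is a toy sketch of the same pattern. It is not the actual RGL code: all module and oper names below (TermToy, DocToy, PresInd and so on) are made up for illustration, and each commented file name marks a separate .gf file.

```
-- TermToy.gf: the vocabulary, as an interface (hypothetical names)
interface TermToy = {
  oper
    present_Term    : Str ;
    indicative_Term : Str ;
}

-- TermToyEng.gf: one instance per description language
instance TermToyEng of TermToy = {
  oper
    present_Term    = "present" ;
    indicative_Term = "indicative" ;
}

-- TermToyFin.gf
instance TermToyFin of TermToy = {
  oper
    present_Term    = "preesensin" ;
    indicative_Term = "indikatiivi" ;
}

-- DocToy.gf: abstract syntax of the toy documentation grammar
abstract DocToy = {
  flags startcat = Heading ;
  cat Heading ;
  fun PresInd : Heading ;
}

-- DocToyFunctor.gf: the rules, written once against the interface
incomplete concrete DocToyFunctor of DocToy = open TermToy in {
  lincat Heading = {s : Str} ;
  lin PresInd = {s = present_Term ++ indicative_Term} ;
}

-- DocToyEng.gf: the headings in English
concrete DocToyEng of DocToy = DocToyFunctor with (TermToy = TermToyEng) ;

-- DocToyFin.gf: the same headings in Finnish
concrete DocToyFin of DocToy = DocToyFunctor with (TermToy = TermToyFin) ;
```

Adding a third description language to this toy means writing one more TermToy instance and a one-line instantiation; the functor itself stays untouched.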

Inari

Roman Suzi

Aug 2, 2020, 3:43:35 PM8/2/20
to Grammatical Framework
Hi!

I still don't fully understand Extend, it seems. Here is my ExtendRus.gf ( https://github.com/rnd0101/gf-rgl/blob/mew-rus-rg/src/russian/ExtendRus.gf ), and the problem I can't get rid of is:

> i AllRus.gf
linking ... unknown identifier AdvIsNP
> i -retain AllRus.gf
593 msec
> l mother_N2
linking ... unknown identifier AdvIsNP

I used to have AllRus.gf like:

concrete AllRus of AllRusAbs = LangRus, ExtraRus ** open ExtendRus in {flags coding=utf8;}

but then ExtendRus was not available at all (by that I mean that e.g. iFem_Pron could not be used in the shell).

Now with:

concrete AllRus of AllRusAbs = LangRus, ExtraRus, ExtendRus ** {flags optimize=all ; coding=utf8;}

there is that linking problem (it happens both when I am in the src/russian directory and when I compile everything and install).

I've tried defining AdvIsNP in ExtendRus and excluding it ("minus") in the inheritance part, in various combinations. Comparing with other grammars (and how they inherit that functor), I do not really see what is so special about the new Russian RG. Maybe the problem is somewhere else and just does not surface? (I had such a case when I used PredIAdvVP in Construction.)

Please help.

-Roman

Inari Listenmaa

Aug 4, 2020, 3:10:02 PM8/4/20
to gf-...@googlegroups.com
Hi,

I still don't fully understand Extend, it seems.

I just tried pulling your branch and it works for me, I assume that you fixed it on your own! I'm answering this just for the sake of documentation :)

I used to have AllRus.gf like:

concrete AllRus of AllRusAbs = LangRus, ExtraRus ** open ExtendRus in {flags coding=utf8;}

but then ExtendRus was not available at all (by that I mean that e.g. iFem_Pron could not be used in the shell).

Yes, if you only open something and don't inherit it, then its functions are usable in the module where you open it, but not visible outside. (Tangentially related: RGL (or any) opers can be made visible for playing around in the GF shell with the -retain flag, see https://inariksit.github.io/gf/2018/08/28/gf-gotchas.html#re-export-rgl-opers-in-application-grammar)

So if your concrete syntax AllRus was defined as LangRus and ExtraRus, then the abstract syntax AllRusAbs must be defined as Lang, ExtraRusAbs. But really it needs to be Lang, ExtraRusAbs, Extend, or just Lang, Extend.

With the pair

abstract AllRusAbs = Lang, ExtraRusAbs ;
concrete AllRus of AllRusAbs = LangRus, ExtraRus ** open ExtendRus in {} ;

you could use any definitions in ExtendRus to linearise the functions of Lang and ExtraRusAbs. But that'd be silly, because those functions are already linearised in LangRus and ExtraRus.


The next option was to inherit ExtendRus instead of opening it, like this:

abstract AllRusAbs = Lang, ExtraRusAbs ;
concrete AllRus of AllRusAbs = LangRus, ExtraRus, ExtendRus ;

But without changing the abstract syntax, it caused errors, as you noticed. 

So the correct combo is

abstract AllRusAbs = Lang, ExtraRusAbs, Extend ;
concrete AllRus of AllRusAbs = LangRus, ExtraRus, ExtendRus ;
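
The difference between opening and inheriting can also be tried out on a toy grammar. This is only a sketch with made-up module names (Hello, Bye, AllToy), unrelated to the actual Russian files; each commented file name marks a separate .gf file.

```
-- Hello.gf
abstract Hello = {
  flags startcat = Greeting ;
  cat Greeting ;
  fun hello : Greeting ;
}

-- Bye.gf
abstract Bye = {
  cat Farewell ;
  fun bye : Farewell ;
}

-- HelloEng.gf
concrete HelloEng of Hello = {
  lincat Greeting = {s : Str} ;
  lin hello = {s = "hello"} ;
}

-- ByeEng.gf
concrete ByeEng of Bye = {
  lincat Farewell = {s : Str} ;
  lin bye = {s = "bye"} ;
}

-- AllToy.gf: the abstract syntax inherits both modules ...
abstract AllToy = Hello, Bye ** {} ;

-- AllToyEng.gf: ... and so does the concrete syntax,
-- so `bye` is part of the grammar and can be linearised in the shell
concrete AllToyEng of AllToy = HelloEng, ByeEng ** {} ;

-- Open-only variant, for comparison (it would replace the two files above):
--   abstract AllToy = Hello ** {} ;
--   concrete AllToyEng of AllToy = HelloEng ** open ByeEng in {} ;
-- ByeEng still gets compiled together with AllToyEng,
-- but `bye` is not a function of AllToy.
```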

Inari

Roman Suzi

Aug 5, 2020, 12:30:22 AM8/5/20
to Grammatical Framework
Yes, thanks, Inari! The explanation is important in this case. But in Finnish, for example, I see:

concrete AllFin of AllFinAbs =
  LangFin - [SlashV2VNP,SlashVV, TFut], ---- to speed up linking; to remove spurious parses
  ExtraFin - [ProDrop, ProDropPoss, S_OSV, S_VSO, S_ASV, AdvExistNP] -- to exclude spurious parses
  ** open ExtendFin in {} --- to make it compile by default

I have not analysed it further, but there certainly are reasons why it was done this way. The comment "to make it compile by default" does not help me understand it. Fortunately, the new Russian RG is quite light (or so it feels when loading it, compared to Finnish), thanks to the Basque-glueing approach and the use of records.

I am thinking about how to introduce Russian-specific word-order variations, a new part of speech (the transgressive), wider possibilities for impersonal and infinitive VPs (Russian is really rich in those), and some minor tweaks to prepositions that vary according to "pre" (this can partly be left to applications, because the rules are sometimes semantic). To my mind, the current abstract RG does not give enough room to those VPs, or is too English-centric, but I have not yet checked every Extend entry. One specific problem I have is the "impolite" or "order" imperative, which grammatically is literally just the infinitive, "to do something". It is the main mode used in computer interfaces (as well as with dogs and in other "immediate imperative" situations). I can bind it to Imp Sg P1 (the nearest slot, usable when someone gives an order to herself/himself/itself), but I have not found an API path for that yet.

I am not sure how to use gftest for the main test suite, as it gets stuck at ListNP for a long time (is it supposed to run for hours?), but the "lexicon-only" cases look good.

DocumentationRusFunctor is also nearing completion. Another question here: will it be visible on the web, or is it just a resource used (how?) in applications?

-Roman

Inari Listenmaa

Aug 5, 2020, 3:00:38 AM8/5/20
to gf-...@googlegroups.com
Hi,

 But for example in Finnish I see 

concrete AllFin of AllFinAbs =
  LangFin - [SlashV2VNP,SlashVV, TFut], ---- to speed up linking; to remove spurious parses
  ExtraFin - [ProDrop, ProDropPoss, S_OSV, S_VSO, S_ASV, AdvExistNP] -- to exclude spurious parses
  ** open ExtendFin in {} --- to make it compile by default

I have not analysed it further, but there certainly are reasons why it was done this way. The comment "to make it compile by default" does not help me understand it. Fortunately, the new Russian RG is quite light (or so it feels when loading it, compared to Finnish), thanks to the Basque-glueing approach and the use of records.

I understand that following other languages is confusing! I don't know why Aarne chose to make AllFinAbs only Lang, ExtraFinAbs, but it's not all that important: if you want to use ExtendFin in applications, you just open ExtendFin, no need to bother with AllFin. Opening Syntax brings most of Lang into scope under the standard overloaded API names. So if you opened AllFin and SyntaxFin in an application, you'd have e.g. both mkNP and DetCN in scope. (I have a feeling that sometimes when I do things along those lines, I get the "Warning: atomic term X, conflict module1.X, module2.X", but this happens so rarely that my memory may be off. :-P)

I guess that "to make it compile by default" just means that whenever AllFin is compiled, ExtendFin is also compiled. Otherwise anyone could implement new stuff in Extend without CI ever checking whether it actually compiles.

I am thinking about how to introduce Russian-specific word-order variations, a new part of speech (the transgressive), wider possibilities for impersonal and infinitive VPs (Russian is really rich in those), and some minor tweaks to prepositions that vary according to "pre" (this can partly be left to applications, because the rules are sometimes semantic). To my mind, the current abstract RG does not give enough room to those VPs, or is too English-centric, but I have not yet checked every Extend entry.

Yes, it's very likely that many of these don't exist yet in Extend. Some of the VP things might be reasonable to implement in PurposeVP, InOrderToVP, ByVP and Gerund{CN,NP,Adv}. If not, you can define new ones that are helpful for developers who work in Russian. 
Imagine that there is no way to say "by doing X" in Russian, and that there is no Extend or RGL construction for something that you say often in Russian. The solution is to skip ByVP for Russian (leave it as variants {} in ExtendFunctor) and define a new abstract syntax function in Extend that applies only to that Russian case. That is, we don't try to repurpose ByVP for the new construction; that would just create confusion.

There are other language-specific things in Extend as well, for example

-- Romance 
  UseComp_estar : Comp -> VP ; -- (Cat, Spa, Por) "está cheio" instead of "é cheio"

-- German
  UttAccNP : NP -> Utt ; -- him (accusative)
  UttDatNP : NP -> Utt ; -- him (dative)
  UttAccIP : IP -> Utt ; -- whom (accusative)
  UttDatIP : IP -> Utt ; -- whom (dative)

If you define a new one, for Russian in this case, the languages that don't have the corresponding syntactic construction can just ignore it. They will never need to use it, so it doesn't matter that it's implemented as just variants {} in ExtendFunctor. It is good practice to have everything compile for all languages, so that copypasting from one language immediately produces stuff that compiles, but eventually the grammarians for the other languages would anyway want to change that construction to something else that exists in their language.
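
As a toy illustration of that mechanism (an empty default in the functor, overridden in just one language), here is a sketch. All names in it (ExtToy, LexToy, Shared, RusOnly) are hypothetical and only mimic how Extend and ExtendFunctor are organised; each commented file name marks a separate .gf file.

```
-- ExtToy.gf
abstract ExtToy = {
  flags startcat = Phrase ;
  cat Phrase ;
  fun
    Shared  : Phrase ;  -- exists in every language
    RusOnly : Phrase ;  -- hypothetical Russian-specific construction
}

-- LexToy.gf
interface LexToy = {
  oper shared_Str : Str ;
}

-- LexToyEng.gf
instance LexToyEng of LexToy = {
  oper shared_Str = "shared" ;
}

-- LexToyRus.gf
instance LexToyRus of LexToy = {
  oper shared_Str = "общее" ;
}

-- ExtToyFunctor.gf: default implementations for all languages
incomplete concrete ExtToyFunctor of ExtToy = open LexToy in {
  lincat Phrase = {s : Str} ;
  lin
    Shared  = {s = shared_Str} ;
    RusOnly = variants {} ;  -- no linearisation unless a language overrides it
}

-- ExtToyEng.gf: takes everything from the functor; RusOnly stays empty
concrete ExtToyEng of ExtToy = ExtToyFunctor with (LexToy = LexToyEng) ;

-- ExtToyRus.gf: excludes RusOnly from the functor and defines it directly
concrete ExtToyRus of ExtToy = ExtToyFunctor - [RusOnly] with (LexToy = LexToyRus) ** {
  lin RusOnly = {s = "только по-русски"} ;
}
```

The English module compiles without knowing anything about RusOnly, which is the point: only the language that has the construction needs to touch it.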

One specific problem I have is the "impolite" or "order" imperative, which grammatically is literally just the infinitive, "to do something". It is the main mode used in computer interfaces (as well as with dogs and in other "immediate imperative" situations). I can bind it to Imp Sg P1 (the nearest slot, usable when someone gives an order to herself/himself/itself), but I have not found an API path for that yet.

Yes, there is no constructor that takes a Sg P1 imperative. The alternatives are:

- Sg/Pl P2, plus an additional polite form, with UttImp{Sg,Pl,Pol}
- Sg P3 and Pl P1 in Idiom. Pl P1 is exported in the API as lets_Utt (https://www.grammaticalframework.org/lib/doc/synopsis/index.html#Utt), but ImpP3 isn't even exported in the API.

So you need to define your own function for that. You can let the forms reside in the Imp Sg P1 slot of the inflection table, which is a good strategy for reusing space, and call the Extend function whatever you find most descriptive.


Side note: how to handle things that aren't exported in the API? Say I want to use PartNP  : CN -> NP -> CN ; -- glass of wine in my grammar. I grep for that in api/Constructors.gf, to see if it's exported to some API function:

% grep PartNP Constructors.gf
      = PPartNP  ; --%

That's only PPartNP, not the PartNP that I want. So I would open NounXxx qualified in my grammar, i.e. (SomePrefix=NounXxx):

concrete MyGrammarXxx of GrammarXxx = Foo, Bar ** open
 SyntaxXxx,
 ParadigmsXxx,
 (N=NounXxx) in {

lin
  MyFunction a b c d = … (N.PartNP a (mkNP b)) … ; 

}

I am not sure how to use gftest for the main test suite, as it gets stuck at ListNP for a long time (is it supposed to run for hours?), but the "lexicon-only" cases look good.

Yes, that unfortunately happens for big grammars. (It's, to put it politely, academic software. :-P) If you leave it running overnight and it hasn't produced anything, it's probably not going to make it. But I've gotten results after 3 hours, so there is hope!

I've had better luck with using gftest on an application grammar that uses the resource grammar that I want to test, or just testing a single function. Or in the worst case, commenting out the worst offenders.

DocumentationRusFunctor is also nearing completion. Another question here: will it be visible on the web, or is it just a resource used (how?) in applications?

No, it won't be visible anywhere by default. If someone adds Russian to the wordnet interface, then they'd use DocumentationRus to create inflection tables similar to the ones I showed for Bulgarian.

Inari


Roman Suzi

Aug 9, 2020, 2:47:06 AM8/9/20
to Grammatical Framework
Hi Inari,
Great things above for documentation! Maybe this correspondence can make a good blog entry. I have a new bunch of questions as the Russian v2 grammar nears completion.

1. Is there any better mechanism to see how the API maps to low-level functions? When debugging the phrasebook, it's very painful to trace down constituents... In the new Russian RG I've added type comments to (almost) every function, so thanks to a consistent code style I can find e.g. all functions returning VP just by grepping for " -> VP ;" (I found this very useful). But then in the phrasebook, confronted with: PSentence (SHaveNoMass YouPlurPolFemale Cheese)  I need to look up SHaveNoMass, then look up each API function through its overloads, then repeat recursively... I know about cc -trace, but it is so verbose (by default?) that I can only get a rough idea of which functions were touched. After some time I arrived at:

UseCl (TTAnt TPres ASimul) PNeg (ImpersCl (ComplSlash (Slash3V3 have_V3 (UsePron youPl_Pron)) (MassNP (UseN woman_N))))

which is (I am still not 100% sure) what the above phrasebook entry is...

2. Sometimes parsing helps, but the new Russian grammar has got so many extensions that parsing sometimes does not work (actually, in this it's no different from the old one). Maybe there is some way to switch parsing from depth-first to breadth-first search? (I don't know what search algorithm is behind it, but for some reason very obscure and long trees with piled-up adverbs always come up first. Sometimes I wait till the end, and the useful and shortest ones are there.) ProDrop is so especially malign that I am wondering whether it should be removed by default.

3. The Russian grammar now has ordinals up to 999999, except for the XXX000 cases (which are so different in Russian that extra effort is needed to support them).

4. Symbol. Have I understood correctly that it's not included in AllXXX and applications need to import it explicitly? The problem is that I do not know how to test it with something meaningful. Luckily, it's quite simple.

5. Here is the current list of problems:


I still plan to deal with a very visible problem: the direct-object negation issue (plus transgressive placement, see below). This will probably cause substantial shifts, because negation needs to be propagated to the case selection for the direct object; maybe I will need to make the direct complement into its own field, I am not sure yet. (To add to the controversy, Russian has for the last 200 years been moving towards using both Gen and Acc under negation, so Gen should be the safe default, and I guess pseudo-prep selection can be used so that the user can insist on Acc when needed.) Then, if nothing else is found, I will open a pull request. The phrasebook will need very minimal changes, but

6. For some reason I am not getting linearizations for e.g. Action in the phrasebook, even though Cl has a linref in Russian v2... Sentences aren't always linearized, which makes debugging difficult.

Commenting on your "PurposeVP, InOrderToVP, ByVP and Gerund{CN,NP,Adv}": yes, I need to take a look. I am afraid there is no gerund in Russian, so things like:

GerundCN : VP -> CN ; -- publishing of the document (can get a determiner)
GerundNP : VP -> NP ; -- publishing the document (by nature definite)
GerundAdv : VP -> Adv ; -- publishing the document (prepositionless adverb)

can't really be formed from a VP, because that would require forming a new lexical entry (a noun) out of the verb, which, to put it mildly, is not a regular process. The nearest things I see (and have implemented) are converbs, i.e. transgressives. But those may be too out of place as gerunds... ByVP can also be covered with transgressives.

7. Related: I've added the following:

mkAdv : overload {
  mkAdv : Str -> Adv ;
  mkAdv : Temp -> Pol -> VPSlash -> Adv ; -- introduce transgressive: "делая что-то ,"    "(was) (not) doing smth, "
} ;

which is hackish and unusable. I need to either move it to a Russian-only function or adapt something from Extend, but I've not found anything of the sort: transgressives have almost the same power in Russian as verbs, hence VPSlash.

Do you think I should rather implement Gerund / ByVP with transgressives? The problem is how to add the tense and polarity, and also all three of the above would make quite awkward CNs and NPs. Advice needed.

With best regards,
Roman

Inari Listenmaa

Feb 15, 2021, 8:42:12 AM
to gf-...@googlegroups.com
Hi Roman,

I realise that I never answered this! I don't have answers to all of your questions, and maybe you have solved some problems already in the past 6 months, but I just thought of tying some loose ends.



Great things above for documentation! Maybe this correspondence can make a good blog entry.



1. Is there any better mechanism to see how the API maps to low-level functions? 

Unfortunately, none that I'm aware of, beyond grepping or the RGL source browser ( http://www.grammaticalframework.org/~john/rgl-browser/ ). I agree that cc -trace is way too verbose.


2. Sometimes parsing helps, but the new Russian grammar has got so many extensions that parsing sometimes does not work (actually, in this it's no different from the old one). Maybe there is some way to switch parsing from depth-first to breadth-first search? (I don't know what search algorithm is behind it, but for some reason very obscure and long trees with piled-up adverbs always come up first. Sometimes I wait till the end, and the useful and shortest ones are there.) 

I don't know about the search algorithms, maybe someone else reading this could answer. Anecdotally, I've also found that the useful trees are at the end.

This obviously doesn't work for all situations, but sometimes it's surprisingly useful to just use GF as a morphological analyser. There's a command morpho_analyse in the GF shell, ma for short: https://inariksit.github.io/gf/2018/08/28/gf-gotchas.html#ma 

Another unexpected use case for ma is if your grammar is extremely ambiguous. Say you have a few empty strings in strategic places, or are using a large lexicon with tons of synonyms–you’ll easily get hundreds or even thousands of parses for a short sentence. Example:

Lang> p "작은 고양이가 좋습니다" | ? wc
     359   11232   86068

Lang> ma "작은 고양이가 좋습니다"
작은
small_A : s (VAttr Pos)
short_A : s (VAttr Pos)

고양이가
cat_N : s Subject

좋습니다
good_A : s (VF Formal Pos)

This is much more readable than 359 trees. The subject is a small or short cat, and the predicate is that the cat is good. Just by seeing the morphological parameters from the inflection tables, we can infer that small is attributive and good is predicative.

Not quite the same as getting just the (single) tree PredVP (MassNP (AdjCN (PositA small_A) (UseN cat_N))) (UseComp (CompAP (PositA good_A))), but you can almost reconstruct the information. :-P


4. Symbol. Have I understood right that it's not included in AllXXX and applications need to import it explicitly? The problem is I do not know how to test it with something meaningful. Luckily, it's quite simple.

That's right, Symbol isn't included in All. The intended way is to use it through the API module Symbolic (http://www.grammaticalframework.org/~john/rgl-browser/#!api/Symbolic). If you want inspiration for sentences to test, you can take the examples from Symbolic, like "advanced level at least four". 
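
A minimal way to poke at it through the API could look like the sketch below. The module and function names SymToy, Item and VarX are made up; the sketch assumes that symb from Symbolic has a Str -> NP overload (the "x" case in the API documentation), and it uses English only because I'm sure SymbolicEng exists. Swap in the Russian Syntax and Symbolic modules if your branch provides them.

```
-- SymToy.gf (hypothetical test grammar)
abstract SymToy = {
  flags startcat = Item ;
  cat Item ;
  fun VarX : Item ;
}

-- SymToyEng.gf
concrete SymToyEng of SymToy = open SyntaxEng, SymbolicEng in {
  lincat Item = NP ;
  lin VarX = symb "x" ;  -- assumed overload: symb : Str -> NP
}
```

From there you can add one function per symb overload that you want to exercise and linearise them in the shell.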

Inari

Roman Suzi

Feb 16, 2021, 8:49:05 AM
to Grammatical Framework
Hi Inari,
Thanks! I hope those hints will be useful for readers. Grepping is of course an answer. The "ma" hint did not work so nicely for me, producing a lot of:

InflectionV2 : s2
InflectionV2 : s2
...
InflectionV2 : s2
InflectionV : s2
InflectionV : s2
...
InflectionV : s2
when I tried  ma "когда не бы &+ ло яблока"
(not quite sure why... maybe something becomes empty and causes a lot of matches)

I have no idea how much the Russian RG is in use, but I hope no bugs have been found in it so far?

With best regards,
Roman