GF seminar #2: vote for date & time


Inari Listenmaa

Feb 1, 2022, 9:37:46 PM
to Grammatical Framework
Hi all!

We had the first GF seminar back in November, and now it's been 3 months, so time for a second one! If you weren't around for the first one, the gist of it is:

In addition to GF summer schools every other year, we have quarterly online seminars, where we gather to hear each other's updates and chat informally for 1.5-2 hours. It's nice to have some kind of presentation, but it can just be live coding or a spontaneous demo. Of course, if you have a presentation ready that you want to give, that's perfectly fine too!

Vote for a time in this Doodle poll, https://doodle.com/poll/2b9a7kr8y9zztn2k, by Saturday 5th February. I will announce the seminar date & time on Sunday 6th February.

Also email me if you want to give a longer talk than a 5-10-min demo of something. I will update the agenda in http://www.grammaticalframework.org/~inari/gf-seminar/.

If you have any questions, email me!

Cheers,
Inari

Alexandre Rademaker

Feb 2, 2022, 5:29:11 PM
to gf-...@googlegroups.com
Hi GFers,

The DELPH-IN community usually constructs grammars with the support of sets of sentences (positive and negative examples). For small grammars we usually start with simple text files with one sentence per line (using * for negative examples), but later we have tools like https://github.com/delph-in/docs/wiki/ItsdbTop or https://github.com/delph-in/docs/wiki/FftbTop for profiling and treebanking. What would be the equivalent tools in the GF universe?

Best,

--
Alexandre Rademaker
http://arademaker.github.io
http://researcher.ibm.com/person/br-alexrad


Aarne Ranta

Feb 22, 2022, 2:20:05 AM
to gf-...@googlegroups.com
Hi Alexandre,

Sorry for the late response to your relevant question! The main way of testing GF grammars is with treebanks, consisting of abstract syntax trees together with their translations into chosen languages. A treebank entry looks as follows:

ResourceDemo: PhrUtt (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (DetCN this_Det (UseN grammar_N)) (ComplV2 know_V2 (DetCN (numeralDet (num (pot2as3 (pot1as2 (pot1 n2))))) (UseN language_N))))))

ResourceDemoAfr: hierdie grammatika ken twintig tale

ResourceDemoAra: يَعْرَفُ هَذَا [grammar_N] [grammar_N] عِشرِينَ [language_N] [language_N]

ResourceDemoBul: тази граматика знае двадесет езика

ResourceDemoCat: aquesta gramàtica sap vint llengües

ResourceDemoChi: 这 个 语 法 知 道 二 十 种 语 言

ResourceDemoDan: denne grammatik kender tyve sprog

ResourceDemoDut: deze grammatica kent twintig talen

ResourceDemoEng: this grammar knows twenty languages

ResourceDemoEst: see grammatika tunnab kaks &+ kümmend keelt

ResourceDemoEus: gramatik &+ a honek hogei hizkuntz &+ ak ditu

ResourceDemoFin: tämä kielioppi tuntee kaksi &+ kymmentä kieltä

ResourceDemoFre: cette grammaire connaît vingt langues

ResourceDemoGer: diese Grammatik kennt zwanzig Sprachen

ResourceDemoGre: αυτή η γραμματική ξέρει είκοσι γλώσσες

ResourceDemoHin: यह [grammar_N] बीस [language_N] जानता है

ResourceDemoIce: þessi málfræði veit tuttugu tungumál

ResourceDemoIta: questa grammatica conosce venti lingue

ResourceDemoJpn: この 文法 は 二 十 語 を 知ります

ResourceDemoLav: šī gramatika zina divdesmit valodas

ResourceDemoMlt: din il- &+ grammatika taf għoxrin lingwa

ResourceDemoMon: энэ хэл зүйн нэг хорь хэлийг таньдаг нь

ResourceDemoNep: यो व्याकरण बीस भाषाहरु चिन्छ

ResourceDemoNno: denne grammatikken kjenner tjue språk

ResourceDemoNor: denne grammatikken kjenner tjue språk

ResourceDemoPes: این دستور زبان بیست زبان را می‌ &+ شناسد

ResourceDemoPnb: اے [grammar_N] وی بولیاں جاندا اے

ResourceDemoPol: ta gramatyka wie dwadzieścia języków

ResourceDemoPor: esta gramática conhece vinte linguagens

ResourceDemoRon: această gramatică cunoaşte douăzeci de limbi

ResourceDemoRus: эта грамматика знает двадцать языки

ResourceDemoSnd: ھي گردان ويھ ٻوليون ڄاڻی ٿو

ResourceDemoSpa: esta gramática conoce veinte lenguas

ResourceDemoSwe: den här grammatiken känner tjugo språk

ResourceDemoTha: ไวยกรณ์ กำ ลัง นี้ รู้ ภาษา ยี่ สิบ กำ ลัง

ResourceDemoUrd: یہ گردان بیس زبانیں جانتا ہے
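
An entry in this layout is easy to check mechanically. The following is a minimal Python sketch, not part of GF: it assumes that every line has the form `Name: text`, and it reports which modules produce different text in a stored ("gold") entry and a freshly generated one. The function names `parse_entry` and `changed_languages` are hypothetical, and the changed German line in the example is made up for illustration.

```python
def parse_entry(lines):
    """Turn 'Module: text' lines into a dict keyed by module name.

    Assumption: the first ': ' on a line separates the module name
    from the tree or linearization that follows it.
    """
    entry = {}
    for line in lines:
        name, sep, text = line.strip().partition(": ")
        if not sep:
            continue  # blank line or a line without a 'Name: ' prefix
        entry[name] = text
    return entry


def changed_languages(gold, new):
    """Return the sorted module names whose text differs between two entries."""
    return sorted(name for name in gold.keys() | new.keys()
                  if gold.get(name) != new.get(name))


gold_lines = [
    "ResourceDemoEng: this grammar knows twenty languages",
    "",
    "ResourceDemoGer: diese Grammatik kennt zwanzig Sprachen",
]
new_lines = [
    "ResourceDemoEng: this grammar knows twenty languages",
    "",
    # made-up changed output, standing in for a regression
    "ResourceDemoGer: diese Grammatik weiß zwanzig Sprachen",
]

print(changed_languages(parse_entry(gold_lines), parse_entry(new_lines)))
# ['ResourceDemoGer']
```

A module that appears in only one of the two entries is also reported, since its text is missing on the other side.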


Such treebank entries can be used as both regression and unit tests. The most systematic unit test treebank is the one displayed in the RGL synopsis, 


with source code in


meant to address each RGL structure individually. The most common practice is to create a treebank per application project, consisting typically of examples from some legacy texts in a domain. There is also a way to generate optimal sets of test cases, where both the abstract syntax and language-specific variations are taken into account and a minimal set is produced for each language:

https://github.com/GrammaticalFramework/gftest

Traditional tests with positive and negative examples are less common, to my knowledge, because the primary interest has not been in defining what is grammatical and what is not, but meaning-preserving translations. However, some systematic work has been done at least in German:

https://github.com/GrammaticalFramework/gf-rgl/tree/master/tests/german

where we have also used Stefan Müller's HPSG test sets.

All these things are a bit scattered and not widely used or even known, so I think we could be more explicit, systematic, and extensive when it comes to testing grammars. 

Everyone, feel free to continue this discussion! In particular, as I have probably not mentioned all initiatives about testing, please correct me and give your pointers!

Regards

  Aarne




 

--

---
You received this message because you are subscribed to the Google Groups "Grammatical Framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gf-dev+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gf-dev/94329617-373C-4C3A-85EA-E49C24D18DF8%40gmail.com.

Peter Ljunglöf

Feb 23, 2022, 3:13:50 AM
to gf-...@googlegroups.com
Hej,
just to add one option - there is a simple unittest script in the GF-RGL, with some explanations on how to use it: https://github.com/GrammaticalFramework/gf-rgl/tree/master/unittest 

/Peter

Hans Leiss

Feb 26, 2022, 4:38:35 PM
to gf-...@googlegroups.com, gf-...@googlegroups.com
As Aarne mentioned some test files I wrote 3 years ago,

> Traditional tests with positive and negative examples are less common, to my knowledge, because the
> primary interest has not been in defining what is grammatical and what is not, but meaning-preserving
> translations. However, some systematic work has been done at least in German:

> https://github.com/GrammaticalFramework/gf-rgl/tree/master/tests/german

> where we have also used Stefan Müller's HPSG test sets.

I would like to add some comments. The .gfs-scripts object-order.gfs and
passive.gfs use files examples.txt and passive.txt to do `traditional
tests with positive and negative examples'. The .txt-files annotate
examples by comment "-- accept", "-- reject" or "-- dubious".

The .gfs-script object-order.gfs splits examples.txt into separate files
examples.pos.txt, .. examples.dub.txt, lets gf parse these from Ger and
linearize them (with -treebank to Eng,Ger) to files examples.pos.tmp, ..
examples.dub.tmp and does a regression test via a diff with previously
generated files examples.pos.out, ... examples.dub.out. The other script
passive.gfs uses passive.txt similarly. Such scripts are clumsy, but it
is doable. (It would be a bit less clumsy if a .gfs could read a file
linewise and skip a "-- comment" at the end of lines.)
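
The splitting step could also be done outside the GF shell. Below is a minimal Python sketch under stated assumptions: each line of the examples file ends in one of the comments "-- accept", "-- reject" or "-- dubious" as described above, and the function name `split_examples` and the sample sentences are hypothetical. It is not the code used in tests/german.

```python
import re
from collections import defaultdict

# Labels named in the post; any other trailing comment is left unlabelled.
LABELS = ("accept", "reject", "dubious")
_ANNOTATION = re.compile(r"\s*--\s*(\w+)\s*$")


def split_examples(lines):
    """Group sentences by their trailing '-- label' comment.

    Returns a dict mapping each label to its sentences, with the comment
    stripped. Lines without a recognised label are collected under None.
    """
    groups = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        match = _ANNOTATION.search(line)
        if match and match.group(1) in LABELS:
            groups[match.group(1)].append(line[:match.start()].strip())
        else:
            groups[None].append(line.strip())
    return dict(groups)


# Placeholder sentences, not taken from examples.txt
sample = [
    "example sentence A -- accept\n",
    "example sentence B -- reject\n",
    "example sentence C -- dubious\n",
    "example sentence D -- accept\n",
]
print(split_examples(sample))
```

Each group could then be written to its own file, parsed and linearized with GF, and the output compared against a stored copy with `diff`, as the scripts described above do.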

I wrote these files to test an implementation of pronoun-switch and some
special passive forms for ternary verbs in German. These scripts (and
some others like infinitives.gfs that linearize trees of
infinitives.trees to test control verbs and reflexive vps) use an
*extension* TestLang.gf of Lang.gf by ternary and quaternary verbs and
Slash*V4- and Pass*V3-rules. To improve the accepted/generated word
order in German clauses, I had to change the VP category and mkClause,
which resulted in a *HUGE INCREASE IN THE COMPILED GRAMMAR LangGer.gfo*
and a corresponding slowdown of grammar compilation, up to the point
that loading the compiled grammar kills the gf-process on my x86_64.

The reason for the complexity is the combination of
(i) the table np.s: PCase => Str with 9 values for PCase (4 Cases and >=
5 CPreps, i.e. glued preposition and article),
(ii) the Prep category with |PCase * Bool| = 18 parameters,
(iii) a field subjc:Prep in VP and additional fields c2:Prep and
objCtrl:Bool in VPSlash, giving 9*9*2 = 162 parameters for VPSlash,
(iv) a distinction over 24 cases to insert an object-np into a vps:VPSlash
(v) a distinction over 32 cases in mkClause to get the ordering of
verbal parts and nominal and infinitival objects right, in
particular for nested uses of VV and V2V verbs.
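
The counts in (ii) and (iii) are products of the sizes of the parameter types involved, so every additional parameter-valued field multiplies the number of combinations. A small Python illustration of that arithmetic only: the value names are placeholders, since the post gives only the counts, and the sketch says nothing about how the GF compiler actually expands the tables.

```python
from itertools import product

# Placeholder labels standing in for the 9 PCase values mentioned in (i).
pcase = [f"PCase{i}" for i in range(1, 10)]
boolean = [False, True]

# (ii): one PCase value combined with one Bool
prep_params = list(product(pcase, boolean))

# (iii): two PCase-valued choices and one Bool, matching the 9*9*2 in the post
vpslash_params = list(product(pcase, pcase, boolean))

print(len(prep_params))     # 18
print(len(vpslash_params))  # 162
```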

I had hoped to be able to improve the implementation of the changes and
reduce the complexity, but I hadn't had time for this since September
2019. I wondered that nobody complained about the complexity increase,
so: IS ANYBODY USING LangGer FOR PARSING at all?

Three weeks ago I started to pick up LangGer again and see what can be
improved. A major problem is knowing what is an improvement and what is
making things worse. I have some small treebanks that I intend to
combine and extend systematically, but first I have to reduce the
complexity of the compiled grammar to be able to do testing at all.

Unfortunately, I have forgotten details of my modifications of LangGer
concerning infinitives, passives, etc., so I have begun to write down my
understanding of German grammar and to which extent it is implemented in
LangGer, using the comments in gf-rgl/abstract/Lang.gf as documentation
of the intentions of Aarne and Harald Hammerström. My hope is to arrive
at a documentation of Lang and LangGer that gives some explanations of
the scope and limitations of Lang(Ger). The plan is to support this by a
systematic treebank and some .gfs-scripts for regression testing.

But this has been on my agenda for quite a while without much progress,
so I'm not promising anything -- except that, while I am waiting for a
referee report on a submitted theory paper, I will continue working on a
document on German grammar and LangGer; as soon as I get my referee
reports, I will return to the theory paper.

Hans

Inari Listenmaa

Feb 27, 2022, 3:31:53 AM
to Grammatical Framework
Hi Hans,

> I had hoped to be able to improve the implementation of the changes and
> reduce the complexity, but I hadn't had time for this since September
> 2019. I wondered that nobody complained about the complexity increase,
> so: IS ANYBODY USING LangGer FOR PARSING at all?

I've seen two people mention the slowness/impossibility of parsing with German on GF discord (https://discord.gg/EvfUsjzmaz). Probably there are more people than that, but if they only started trying after 2019, they don't know that it used to be faster. :-P

Also to add on Peter's comment on the unittest script (https://github.com/GrammaticalFramework/gf-rgl/tree/master/unittest), there are tests for at least the following languages:


I have been using these unittests every time I implement a new language: I take example sentences or fragments from the grammar book I use as my source, and make a unit test for each. Also, when I fix bugs in existing languages, I make a test case for the feature that was changed.

In case of bugfixes, I usually also run gftest (https://github.com/GrammaticalFramework/gftest, which Aarne mentioned already) to compare the new and old versions, and that way I see if my changes in the code resulted in unexpected changes elsewhere in the grammar.

Inari

Hans Leiss

Apr 8, 2022, 12:10:27 PM
to gf-...@googlegroups.com, gf-...@googlegroups.com
Hi Inari,

> Hi Hans,

> I had hoped to be able to improve the implementation of the changes and
> reduce the complexity, but I hadn't had time for this since September
> 2019. I wondered that nobody complained about the complexity increase,
> so: IS ANYBODY USING LangGer FOR PARSING at all?

> I've seen two people mention the slowness/impossibility of parsing with German on GF discord (https://discord.gg/EvfUsjzmaz).
> Probably there are more people than that, but if they only started trying after 2019, they don't know that it used to be faster. :-P

During the last weeks I have redesigned the implementation of infinitival complements
in Ger, so that parsing with Ger is now possible again and the slowness
is largely reduced. Compilation of AllGer from source takes 270s (!) on my x86_64
laptop, but loading the compiled .gfo only takes 15s, which is tolerable, I think.
(The problem is VerbGer, in particular SlashV2VNP and ComplSlash.) But parsing is
still a good deal slower with Ger than with Eng. For example, we can parse

"John says that we wanted to let the children help him to paint the house blue"

in 103 msec (with tests/german/TestLangEng to get "help","let" : V2V), whereas the
corresponding

"Johann sagt , dass wir die Kinder ihm helfen lassen wollten , das Haus blau zu malen"

took 3672 msec (with tests/german/TestLangGer and lassen_V2V.isAux=True).

To handle control verbs V2V properly, infinitival complements have to depend on Agr,
so that reflexive pronouns in the infinitive VP can be made dependent on the object-
or subject-np of a clause. This is (partially) implemented in Ger, but not in Eng;
for example, subject-control does not work in Eng:

TestLang> l PredVP (UsePron he_Pron) (ComplSlash (SlashV2V versprechen_dat_V2V (ReflVP (SlashV2a wash_V2))) (UsePron i_Pron))
he promises me to wash myself
er verspricht mir , sich zu waschen
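
The idea of letting the infinitival complement depend on Agr can be pictured outside GF as follows. This is a toy Python sketch, not the RGL implementation: the infinitive is modelled as a function from an agreement value to a string, and the clause builder passes in the agreement of whichever noun phrase controls it. All names (`REFLEXIVE`, `wash_oneself`, `clause`) and the two agreement labels are hypothetical.

```python
# Toy agreement values and English reflexives (illustrative subset only).
REFLEXIVE = {"1sg": "myself", "3sg_masc": "himself"}


def wash_oneself(agr):
    """An infinitival VP whose reflexive is fixed only once agr is known."""
    return f"to wash {REFLEXIVE[agr]}"


def clause(subj, subj_agr, verb, obj, obj_agr, inf_vp, control):
    """Build 'subj verb obj inf_vp', using the controller's agreement."""
    agr = subj_agr if control == "subject" else obj_agr
    return f"{subj} {verb} {obj} {inf_vp(agr)}"


# 'promise' is a subject-control verb: the reflexive follows 'he'.
print(clause("he", "3sg_masc", "promises", "me", "1sg", wash_oneself, "subject"))
# he promises me to wash himself

# An object-control verb such as 'help' uses the object's agreement instead.
print(clause("he", "3sg_masc", "helps", "me", "1sg", wash_oneself, "object"))
# he helps me to wash myself
```

In this picture, the English linearization above corresponds to always passing the object's agreement, which gives "myself" even under a subject-control verb.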

Control verbs don't work (yet) in Ger under SlashVP: NP -> VPSlash -> ClSlash, which
the RGL implements by mkClause, so it instantiates reflexives to the agreement np.a of
the *subject* np:NP argument. To make it work, ClSlash.s : ... => Str has to be turned
into ClSlash.s : ... Agr => Str first, and then a new mkClauseSlash could be defined
to correctly implement SlashVP. Maybe I will try to do this and see if it doesn't
slow down parsing further.

Hans

P.S. I've sent a pull-request, but I don't remember how to test if it works under
'present', and I had a problem with 'unknown identifier InflectionAdA', so I had to
comment out Documentation in abstract/Lang.gf. Sorry for this, and sorry for the
complexity problems introduced in 2019!

Aarne Ranta

Apr 8, 2022, 12:27:44 PM
to gf-...@googlegroups.com
Hi Hans,

Thanks for this important work! Some more complexity than English is inevitable in German, I think, and what you describe is not worse than many other languages in the RGL. 

Working under "present" is automatically tested by "make install" in gf-rgl/. But if you have problems making it work, I think we can fix it easily at our end.

Regards

  Aarne




Hans Leiss

Apr 9, 2022, 10:40:39 AM
to gf-...@googlegroups.com, gf-...@googlegroups.com
Hi Aarne,

I just didn't think of "installing" the changes, I'll use "make
install" and try to fix any problems that may arise.

Hans