cg-proc -w makes single-letter elements all caps

marc.rier...@gmail.com

Dec 21, 2018, 5:50:53 AM

to Constraint Grammar

Hello,

While contributing to two Apertium language pairs (English-Catalan and Romanian-Catalan), which both use CG in all directions, I have noticed what seems to be a bug in how cg-proc -w normalises the case of single-letter elements.

Take these two examples:

^I/PRPERS<prn><subj><p1><mf><sg>$ (English)

^A/AVEA<vbavea><pri><p3><sg>$ (Romanian)

In both cases the lemmas come out in all caps, which in turn makes the Catalan translations all caps as well, when they should be "Prpers" and "Avea", respectively. My guess is that CG does not distinguish between the examples above and multi-letter all-caps forms such as "HOUSE" or "TREE".
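
The suspected cause can be illustrated with a minimal, hypothetical case-copying heuristic. This is not cg-proc's actual code; `naive_copy_case` and `guarded_copy_case` are invented names for illustration only. A check like "every cased letter is uppercase" is true for one-letter forms such as "I" and "A", so they get the all-caps treatment; requiring at least two letters before treating a form as all caps avoids that.

```python
def naive_copy_case(surface: str, lemma: str) -> str:
    """Hypothetical heuristic: copy the surface form's case pattern onto the lemma."""
    if surface.isupper():
        # Every cased letter is uppercase: true for "HOUSE", but also for "I" and "A".
        return lemma.upper()
    if surface[:1].isupper():
        # Initial capital only, e.g. "House".
        return lemma[:1].upper() + lemma[1:]
    return lemma


def guarded_copy_case(surface: str, lemma: str) -> str:
    """Same hypothetical heuristic, but a single letter never counts as all caps."""
    letters = [c for c in surface if c.isalpha()]
    if len(letters) > 1 and all(c.isupper() for c in letters):
        return lemma.upper()
    if letters and letters[0].isupper():
        return lemma[:1].upper() + lemma[1:]
    return lemma


for surface, lemma in [("I", "prpers"), ("A", "avea"), ("HOUSE", "house")]:
    print(surface, naive_copy_case(surface, lemma), guarded_copy_case(surface, lemma))
# I PRPERS Prpers
# A AVEA Avea
# HOUSE HOUSE HOUSE
```

The guard only changes how a lone uppercase letter is read (initial capital rather than all caps); it does not by itself decide whether the lemma should keep its dictionary case.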

By contrast, when CG is disabled in these pairs and lt-proc does the normalisation itself, it outputs the correct "Prpers" and "Avea" analyses.

Thanks!

Marc Riera

Tino Didriksen

Dec 21, 2018, 5:55:48 AM

Noted. Created issue https://github.com/TinoDidriksen/cg3/issues/24

-- Tino Didriksen

marc.rier...@gmail.com

Dec 21, 2018, 6:43:30 AM

Sorry, in my previous message I wrote "Prpers" and "Avea" when I meant the all-lowercase "prpers" and "avea".

After some further tests with lt-proc (without -w), this turns out to be tricky. lt-proc decides the case based on dictionary information, which it has access to because it is the tool that analyses the surface forms. By that logic, the examples above should be all lowercase, since that is how they appear in the dictionary. However, if we add to the dictionary a proper noun (<np>) "E" with the surface form "E" and analyse it with lt-proc without the -w flag, lt-proc correctly outputs "^E/E<np>$ ", because the lemma is uppercase in the dictionary.

If cg-proc has to make such decisions, it may be impossible to do so reliably without checking the dictionary, which it currently cannot access.
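
The point about needing the dictionary can be made concrete with a small, hypothetical sketch. `DICTIONARY`, `surface_signature` and `dictionary_case` are invented for illustration and are not lttoolbox's data structures or cg-proc's code. Anything that looks only at the surface form sees "I" and "E" identically (one uppercase letter), yet the desired lemmas differ in case; only the dictionary entry tells them apart.

```python
# Hypothetical mini-dictionary mirroring the thread's examples:
# surface form -> (lemma exactly as written in the dictionary, tags)
DICTIONARY = {
    "I": ("prpers", "<prn><subj><p1><mf><sg>"),
    "A": ("avea", "<vbavea><pri><p3><sg>"),
    "E": ("E", "<np>"),
}


def surface_signature(surface: str) -> tuple[int, bool]:
    """All a surface-only rule can see about case: length and whether it is uppercase."""
    return (len(surface), surface.isupper())


def dictionary_case(surface: str) -> str:
    """Emit an analysis whose lemma keeps the case it has in the dictionary."""
    lemma, tags = DICTIONARY[surface]
    return f"^{surface}/{lemma}{tags}$"


# "I" and "E" are indistinguishable from the surface alone...
print(surface_signature("I"), surface_signature("E"))  # (1, True) (1, True)
# ...but the dictionary keeps "prpers" lowercase and "E" uppercase.
print(dictionary_case("I"))  # ^I/prpers<prn><subj><p1><mf><sg>$
print(dictionary_case("E"))  # ^E/E<np>$
```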

Thanks!

Marc Riera

On Friday, 21 December 2018 11:50:53 UTC+1, marc.rier...@gmail.com wrote: