Meta-symbol LETTRE

Claude Martineau

Mar 14, 2016, 9:58:09 AM

to Unitex-GramLab

Yesterday, on an exploratory basis, I added LETTRE as a synonym of LETTER so that those who wish to can keep using it in French.
Moreover, I think it is more than premature to deprecate the use of MOT, MIN and MAJ.
Indeed, a future version could quite possibly handle these metas the same way software localization is handled.
It is always better to expand freedoms than to restrict them.

Claude

Denis Maurel

Mar 14, 2016, 10:12:17 AM

to Claude Martineau, Unitex-GramLab

Hello Claude,

But that is exactly what we did, of course. The manual simply suggests using the new ones instead. Nothing is mandatory.

Best regards,

Denis Maurel

____________________________________
Professor Denis Maurel
Université François Rabelais Tours

--
You received this message because you are subscribed to the Google Groups "Unitex-GramLab" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unitex-gramla...@googlegroups.com.
To post to this group, send email to unitex-...@googlegroups.com.
Visit this group at https://groups.google.com/group/unitex-gramlab.
To view this discussion on the web visit https://groups.google.com/d/msgid/unitex-gramlab/c3b98fea-f413-484c-896e-3d770860d09a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

eric.laporte

Mar 14, 2016, 11:50:32 AM

to Unitex-GramLab

Hi,

I love freedom, but I am not in favour of different lexical masks with the same function coexisting, for example <LETTER> and <LETTRE>. In general, this tends to reduce readability because it needlessly increases the number of codes: to read graphs written by others, users have to know more lexical masks than necessary. English-based lexical masks like <WORD> and <LETTER> are easier for more users to remember. The French-based lexical masks like <MOT>, remnants of the 1990s, remain functional for backward compatibility. However, I am in favour of not creating new ones (like <LETTRE>, when <LETTER> already exists), and of progressively replacing them with their English-based equivalents. To this end, the manual should continue to deprecate the use of <MOT> etc. and recommend that users adopt <WORD> etc. in their new graphs.

Best,

Eric

Gilles Vollant

Mar 15, 2016, 6:04:11 AM

to Unitex-GramLab

"Deprecation" can only mean recommending a preferred keyword, because compatibility with old graphs is important.

And if localization is eventually added, we will need a way to preserve graph interoperability.

Beyond that, the change raises two issues:

- It changes the graph file format by adding a keyword, which we will have to stay compatible with in the future. This is not my area (I am more of a computer scientist than a linguist). That said, it does make things consistent: for now, every keyword has both a French and an English version (if I understood correctly).

- It modifies the "Core" (the part of the C++ code used by almost every Unitex user) while we are in the release-candidate phase. This is the same problem as my fix committed in revision 4300 and reverted in 4302: it is dangerous to touch the code without allowing a few more days.

In addition, something new in the last few days: Inist has started a massive testing process of Cassys on tens, or even hundreds, of thousands of documents.

So we have a choice between:

- Reverting change 4299 and going back to the March 8 code, so that the code has been unchanged for several days with no major regression observed. We could then release 3.1 at the end of the week, but with a Cassys that sometimes crashes.

- Or, on the contrary, keeping change 4299, reapplying my change 4300, waiting for the results of Inist's massive tests (and any further fixes), and releasing a 3.1 with a more reliable Cassys around the end of March.

I am torn. I am looking forward to the 3.1 release, but I am now very tempted to take advantage of Inist's massive tests to make 3.1 even more reliable. Testing Cassys also means testing Normalize, Tokenize, Locate and Concord, which are already well proven.


Eric Laporte

Mar 15, 2016, 8:54:06 AM

to unitex-...@googlegroups.com

Hi,

On 15/03/2016 11:04, Gilles Vollant wrote:
<<

"Deprecation" can only mean recommending a preferred keyword, because compatibility with old graphs is important.

>>

In any case, deprecation is only a recommendation, nothing more.

Best,

Eric Laporte

Cvetana Krstev

Mar 16, 2016, 6:43:00 AM

to Unitex-GramLab

Hi,

I have no objections as long as my old graphs continue to work.

Best,

Cvetana

Alexis Neme

Mar 17, 2016, 9:27:40 AM

to Unitex-GramLab

Hello

- <LETTER> should be the standard and we should keep <LETTRE> for compatibility with previous versions of Unitex. Same for <WORD> and <MOT>.

- I suggest eventually removing masks such as <MOT> and <LETTRE>, and all French terminology, in version 4.x (for example), to reduce computation time with FSTs.

CLAUDE >>> Indeed, a future version could quite possibly handle these metas the same way software localization is handled.

No, I do not agree with this idea.

All attempts to create a tagset in Arabic script, such as <إسم> instead of <N>, have failed. For Arabic language resources, the standard tagset in the USA for lexicons and treebanks uses the Latin alphabet: <N>, <V>, ... So English terminology (<LETTER>, <WORD>, ...) is preferable, and it is a must in scientific publications.

Finally, in every language, Arabic included, chemistry textbooks are written in the local language but still write water as H2O in Latin script, not ح2و.

Maybe in the Middle Ages in Europe it was the other way around!

Cheers,

Alexis

eric.laporte

Mar 19, 2016, 12:30:18 PM

to Unitex-GramLab

Hi,

I agree with Alexis' idea that resources are more readable by our international community if metalanguage like <WORD>, <UPPER>, <N>, <PREP>... is international.

However, I am not in favour of removing French masks like <MOT> and <MAJ>. Many graphs in present and past projects use them, and I am not sure their removal would make a big difference in computation time.

Compatibility with previous versions is not a good reason to have <LETTRE> in Unitex: <LETTRE> was introduced last week. What backward compatibility requires is what existed instead of <LETTER> before September 2015: it was <MOT>, which is still operational.

Best,

Eric Laporte

Gilles Vollant

Mar 19, 2016, 1:59:19 PM

to eric.laporte, Unitex-GramLab

Like Eric, I think all previously existing masks (except <LETTRE>, which has existed for just one week) must remain supported, and we should recommend the English versions in future graphs.

English is the better solution for sharing documents around the world.

If we want to let end users write masks in their own language, a future version could provide a graph editor that shows the localized masks on screen but saves them in English in the .grf file.

This is what Microsoft Excel does. A French user enters a French formula (e.g. =SOMME(A1:B1)), and an English user who opens the same file reads =SUM(A1:B1).
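The editor idea above (and Claude's earlier localization suggestion) can be sketched as two translations applied to box labels. English masks are always what gets written to disk, and localized masks are only what the user sees. This is a minimal, hypothetical Python sketch, not Unitex code. The alias table, the function names to_storage and to_display, and the treatment of a box label as plain text containing bare masks such as <MOT> are all assumptions. MOT/WORD, MAJ/UPPER and LETTRE/LETTER come from this thread; MIN/LOWER is an assumed pairing.

```python
import re

# Hypothetical alias table: localized (French) mask -> canonical English mask.
# MIN/LOWER is an assumed pairing; the other pairs are discussed in the thread.
FR_TO_EN = {"MOT": "WORD", "MAJ": "UPPER", "MIN": "LOWER", "LETTRE": "LETTER"}
EN_TO_FR = {en: fr for fr, en in FR_TO_EN.items()}

# Matches a bare mask such as <MOT> or <WORD>. Masks with extra content
# (e.g. <V:K>) contain other characters and are deliberately left untouched.
MASK = re.compile(r"<([A-Z]+)>")


def _rename(label: str, table: dict) -> str:
    """Rewrite every bare mask in a box label using the given alias table."""
    return MASK.sub(lambda m: f"<{table.get(m.group(1), m.group(1))}>", label)


def to_storage(label: str) -> str:
    """Label as a localized editor would save it: English masks only."""
    return _rename(label, FR_TO_EN)


def to_display(label: str, locale: str) -> str:
    """Label as the editor would show it on screen for the given locale."""
    return _rename(label, EN_TO_FR) if locale == "fr" else label


if __name__ == "__main__":
    print(to_storage("<MOT> de <MAJ>"))           # <WORD> de <UPPER>
    print(to_display("<WORD> de <UPPER>", "fr"))  # <MOT> de <MAJ>
```

Under this design, shared .grf files only ever contain one set of codes. That would also address the concern that readers of other people's graphs have to learn both the French and the English masks.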


Cristian Martinez

Mar 20, 2016, 2:21:08 PM

to unitex-...@googlegroups.com

Dear all,

To keep backward compatibility with the vast majority of legacy graphs, I cannot agree with removing the old masks.

Like previous posters in this thread, I agree with including the English-version codes <WORD>, <UPPER>, <LETTER>, ... and with supporting English as the neutral language for new meta-symbols.

That said, I disagree with introducing these English masks in a patch revision (3.1.4072-beta) rather than in the next minor release (3.2.0-alpha). Starting with the next release, we will follow the Semantic Versioning guidelines as closely as possible, as many other open source projects do.
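The versioning argument can be made concrete with the Semantic Versioning 2.0.0 rules. Incompatible changes bump MAJOR, new backward-compatible functionality (such as the English masks) bumps MINOR, and PATCH is reserved for backward-compatible bug fixes. The following is a minimal Python sketch. The function name and the change labels are hypothetical. Pre-release suffixes such as -beta are simply dropped. The third number of a Unitex version like 3.1.4072 is treated as PATCH purely for illustration.

```python
def next_version(current: str, change: str) -> str:
    """Return the next release number under Semantic Versioning 2.0.0.

    current: "MAJOR.MINOR.PATCH", optionally followed by a pre-release
             suffix such as "-beta" (dropped in this simplified sketch).
    change:  "breaking", "feature" or "fix" (hypothetical labels).
    """
    core = current.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    if change == "breaking":   # incompatible change
        return f"{major + 1}.0.0"
    if change == "feature":    # new backward-compatible functionality
        return f"{major}.{minor + 1}.0"
    if change == "fix":        # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")


if __name__ == "__main__":
    # Adding new masks is a feature, so it belongs in the next minor release.
    print(next_version("3.1.4072-beta", "feature"))  # 3.2.0
```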

One of the main problems with bundling the new masks in a revision release is that it forces a not-fully-tested feature into our next stable release.

For example:

- The new masks did not work with LocateTfst until a fix (r4276) made only 3 weeks ago. We will release the next stable version at the end of this week, which leaves only a short window to test it.

- Moreover, both <LETTER> and <LETTRE>, which were introduced to avoid using <MOT> in morphological mode, behave inconsistently in normal graphs. More specifically, as shown in the attached figure (Letter.grf), in a normal graph <LETTER> recognizes a WORD, not a LETTER. IMHO, this is not the expected behavior, and not a feature to include in a stable release.

Finally, although I fully agree with including the new masks and will help make it happen, I am reluctant to bundle them into the upcoming 3.1 stable release. I therefore strongly propose postponing their introduction until the next minor release, i.e. 3.2.0-alpha.

This will help our community run more tests and find as many bugs as possible. It will also let us build a consensus together on improving the graph meta-symbols.

Cheers,

CM

Letter.grf

Anubhav Gupta

Mar 21, 2016, 10:55:18 AM

to Cristian Martinez, Unitex-GramLab

It is caused by r4076.

Lines 868 and 869 in Text_parsing.cpp need to be deleted.


eric.laporte

Mar 21, 2016, 2:11:39 PM

to Unitex-GramLab

Hi,

I am not really worried about the English-mask bugs. Thanks to Cristian for discovering the unexpected behaviour of <LETTER> in this Letter.grf graph. Admittedly, this is not the expected behaviour: it was agreed that <LETTER> would be valid only in morphological mode (cf. Denis's postings dated September 29 and October 7). But precisely because <LETTER> masks are supposed to occur in graphs only in morphological mode, the bug is not crucial. And I guess it is not very difficult to fix, now that we know about it. In addition, suppose we postpone the introduction of the English masks until after the release of stable version 3.1. Then these masks will be unavailable for several years to users who install only stable versions. This would be a pity for new users.
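Since <LETTER> (and <LETTRE>) are meant to be valid only in morphological mode, a graph checker could warn when they appear outside it. That would be better than letting them silently match a whole word as in Letter.grf. The following is a hypothetical Python sketch. It assumes, as in Unitex, that morphological mode is opened by a $< box and closed by a $> box. It also simplifies a graph to a single path given as a list of box labels; real graphs branch, so a real check would have to walk every path. The function name is invented for illustration.

```python
from typing import List

# Masks agreed to be meaningful only inside morphological mode.
MORPHO_ONLY = {"<LETTER>", "<LETTRE>"}


def misplaced_letter_masks(path: List[str]) -> List[int]:
    """Return the positions of morphological-only masks used outside $< ... $>.

    Simplification: the graph is reduced to one path of box labels.
    """
    in_morpho = False
    misplaced = []
    for i, label in enumerate(path):
        if label == "$<":            # entering morphological mode
            in_morpho = True
        elif label == "$>":          # leaving morphological mode
            in_morpho = False
        elif label in MORPHO_ONLY and not in_morpho:
            misplaced.append(i)
    return misplaced


if __name__ == "__main__":
    print(misplaced_letter_masks(["$<", "<LETTER>", "$>", "<LETTER>"]))  # [3]
```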