Dear All
We had problems with texts containing curly brackets, such as mathematical texts. We therefore propose to parse the content of the curly brackets precisely, to determine whether it is a dictionary entry or a mathematical formula.
Following the emails from Eric and Cvetana, we propose to restrict the characters allowed in dictionary entries. On page 43 of the Unitex manual, we read:
An entry of a DELAF is a line of text terminated by a newline that conforms to the following syntax:
apples,apple.N+conc:p/this is an example
We propose to restrict the inflected form and the canonical form to:
* written as is: any character except comma, dot, plus, colon, slash, escape character, curly bracket
* preceded by the escape character: comma, dot, plus, colon, slash, escape character, curly bracket
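As an illustration of this rule, here is a minimal Python sketch that escapes a raw string so it can be used as an inflected or canonical form, and checks whether a form is well escaped. It assumes the escape character is the backslash, and it reads the proposal strictly (an escape character followed by a non-special character is rejected). The names escape_form and is_valid_form are hypothetical and are not part of Unitex.

```python
# Assumption: the escape character is the backslash.
ESCAPE = "\\"
# Characters that must be escaped inside an inflected or canonical form.
SPECIAL = set(",.+:/{}") | {ESCAPE}

def escape_form(text):
    """Return text with every special character preceded by the escape character."""
    return "".join(ESCAPE + ch if ch in SPECIAL else ch for ch in text)

def is_valid_form(form):
    """Check that special characters appear only right after an escape character."""
    i = 0
    while i < len(form):
        ch = form[i]
        if ch == ESCAPE:
            # The escape character must be followed by a special character
            # (strict reading of the proposal, an assumption of this sketch).
            if i + 1 >= len(form) or form[i + 1] not in SPECIAL:
                return False
            i += 2
        elif ch in SPECIAL:
            return False  # unescaped special character
        else:
            i += 1
    return True
```

With this sketch, is_valid_form("3.14") is false because the dot is not escaped, while is_valid_form(escape_form("3.14")) is true.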
We propose to restrict the sequence of grammatical and semantic information to:
* unaccented Latin letters
* digits, underscore, hyphen, tilde, equals sign
* plus sign to introduce a feature
* preceded by the escape character: comma, dot, plus, colon, slash, escape character, curly bracket
* colon to introduce morphological features
* slash to introduce a comment
We propose to restrict the comment to:
* any character except curly bracket and escape character
* preceded by the escape character: escape character, curly bracket
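To make the whole proposal concrete, here is a rough Python sketch of a checker for one entry, built from a regular expression. It is only one possible reading of the rules above, not the Unitex implementation. It assumes that the escape character is the backslash, that the canonical form may be empty, and that morphological features use the same characters as grammatical and semantic codes. The names ENTRY and parse_entry are hypothetical.

```python
import re

# Assumption: the escape character is the backslash.
# An escaped special character: comma, dot, plus, colon, slash, backslash, curly bracket.
ESCAPED = r"\\[,.+:/\\{}]"
# One character of an inflected or canonical form.
FORM_CHAR = r"(?:[^,.+:/\\{}\n]|" + ESCAPED + ")"
# One grammatical, semantic or morphological code.
CODE = r"(?:[A-Za-z0-9_~=-]|" + ESCAPED + ")+"
# One character of a comment: only backslash and curly brackets need escaping.
COMMENT_CHAR = r"(?:[^\\{}\n]|\\[\\{}])"

ENTRY = re.compile(
    "(?P<inflected>" + FORM_CHAR + "+)"
    + ","
    + "(?P<canonical>" + FORM_CHAR + "*)"   # assumption: may be empty
    + r"\."
    + "(?P<codes>" + CODE + r"(?:\+" + CODE + ")*)"
    + "(?P<morph>(?::" + CODE + ")*)"
    + "(?:/(?P<comment>" + COMMENT_CHAR + "*))?"
)

def parse_entry(line):
    """Return the fields of the entry as a dict, or None if the line does not conform."""
    match = ENTRY.fullmatch(line)
    return match.groupdict() if match else None

print(parse_entry("apples,apple.N+conc:p/this is an example"))
print(parse_entry("x+y"))  # does not conform, so it is not read as an entry
```

Such a check could be applied to the content found between curly brackets: if parse_entry returns None, the content would be treated as ordinary text (for instance a formula) rather than as a dictionary entry.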
Do you agree with this proposal for Unitex version 3.2? Would you suggest any other specification?