Charlotte Faure

Feb 5, 2015, 10:42:01 AM
to unitex-...@googlegroups.com
Hi!

I'm working on French texts and resources for an academic assignment.
I have to recognize discursive frames of different types. I've created my own dictionaries containing "markers". The first one is about organisational markers like "D'abord", "Puis", "Enfin" (~ "First", "Then", "Last", etc.), and my graph searching for such words at the beginning of a sentence works fine.
But then the complications begin. I created two other dictionaries. When I apply these new resources to my text, one doesn't work at all (meaning that in the Word List, the words are present and annotated with the usual POS indications but not with mine). It contains words that give quantitative information, like "plusieurs", "divers" or "certains" (meaning "several", "diverse" or "some"). None of those are recognized, even though they all appear in my text.
Weirder still, one of my dictionaries works only partly: of the words in it, some are recognized and some aren't. This one contains opinion markers like "selon", "d'après" and "suivant" (which all mean "according to"). Of those three (all present in my test text), only "d'après" is correctly annotated with my resources.

Does anyone have an idea why this could be happening?

Thanks!

PS: don't mind the meaning of the text, it's French semi-gibberish created just to test my resources...
unitex.png

Denis Maurel

Feb 5, 2015, 10:51:14 AM
to Charlotte Faure, unitex-...@googlegroups.com


Dear Charlotte,

Did you check which dictionaries are selected? You can clear the list and select only your own dictionaries.


Best regards,

Denis Maurel


____________________________________
Professor Denis Maurel
Université François Rabelais Tours
LI (Computer Science Research Laboratory)
EPU-DI
64 avenue Jean-Portalis
37200 Tours
France
Phone: 33-2.47.36.14.35
Fax: 33-2.47.36.14.22
mailto:denis....@univ-tours.fr

http://www.univ-tours.fr/maurel

http://www.li.univ-tours.fr
http://tln.li.univ-tours.fr/



--
You received this message because you are subscribed to the Google Groups "Unitex-GramLab" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unitex-gramla...@googlegroups.com.
To post to this group, send email to unitex-...@googlegroups.com.
Visit this group at http://groups.google.com/group/unitex-gramlab.
To view this discussion on the web visit https://groups.google.com/d/msgid/unitex-gramlab/b08e519a-b834-4e72-8531-0d1c4a467867%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Charlotte Faure

Feb 5, 2015, 10:59:09 AM
to unitex-...@googlegroups.com
Hello,

First of all, thank you for the quick answer and for your interest in my problem.
Yes, I checked, and I'm quite sure my dictionaries are selected and applied, since you can see in the picture that "d'après" is correctly followed by the annotation .MEDIATIF.
What puzzles me most is why some words are annotated and others aren't when they are all in the same dictionary. I've been wondering whether there is some priority rule that could suppress the annotation: perhaps my dictionary is applied first and another one then replaces the annotation?
I'm quite a newbie with Unitex. I've only been using it for homework, but I have never encountered this problem before...
I've tried a graph that uses my "mediatif" dictionary and adds the literal string "selon". When I do this, my graph matches "d'après" as it always has, and "selon" because I've added it literally in the graph, but there is still no trace of "suivant"...

I'm very confused =)

Denis Maurel

Feb 6, 2015, 2:49:26 AM
to Charlotte Faure, unitex-...@googlegroups.com


Dear Charlotte,

There is a priority between dictionaries if and only if a dictionary filename ends with "-" or "+". See the Unitex manual. That is not the problem. Could you try "Dela/Check format" to verify your personal dictionaries, and then rebuild them?
Have you tried using only your dictionaries, without the Unitex distribution dictionaries, to see the result in the word list?
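
To make the filename convention concrete, here is a minimal Python sketch. It assumes the rule as the Unitex manual describes it: a base name (without extension) ending in "-" means high priority, one ending in "+" means low priority, and anything else is normal priority. The function name `dictionary_priority` is hypothetical and is not part of Unitex.

```python
import os


def dictionary_priority(filename: str) -> str:
    """Classify a dictionary file by the suffix of its base name.

    Assumed convention: "-" -> high, "+" -> low, otherwise normal.
    """
    # Drop any directory part and the extension (.bin, .dic, ...).
    stem, _extension = os.path.splitext(os.path.basename(filename))
    if stem.endswith("-"):
        return "high"
    if stem.endswith("+"):
        return "low"
    return "normal"


# The three distribution dictionaries named in this thread.
for name in ("ajouts80jours.bin", "dela-fr-public.bin", "motsGramf-.bin"):
    print(name, dictionary_priority(name))
```

Running this over the list of selected dictionaries is a quick way to spot which ones can take precedence over a personal dictionary.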

Best regards,

Denis Maurel



Charlotte Faure

Feb 6, 2015, 11:05:03 AM
to unitex-...@googlegroups.com
Hi!
Oh yes, thanks for the reminder about the "+" and "-" signs. I remember now why I had ruled out this possibility.

I have done what you said. My dictionaries seem to have the right format, and I've tried using only my "mediatif" dictionary. As you seemed to predict, when I do this, all the words are correctly annotated.
So my next step was to pair my dictionary with each of the three preselected ones, one by one: ajouts80jours.bin (which is useless for my text anyway, so I could easily uncheck it), dela-fr-public.bin and motsGramf-.bin.
When I pair my dictionary with the first or the second, everything works fine and my words are recognized.
But when I pair it with the last one, motsGramf-.bin, "selon" and "suivant" aren't recognized anymore.

So here is my new question: is this dictionary really important? I see that dela-fr-public.bin already gives me access to grammatical information. Do you know if I can safely stop using motsGramf-.bin? For now my graph seems to work just like before, only adding the newly recognized words, but I wouldn't want to do something stupid if this dictionary contains important information.

Thank you very much for your help!

Denis Maurel

Feb 6, 2015, 11:41:19 AM
to Charlotte Faure, unitex-...@googlegroups.com


Dear Charlotte,

The motsGramf-.bin dictionary has priority ("-" at the end of the filename).
So a word that appears both in this dictionary and in yours is not tagged by your dictionary.

It is not an important dictionary: its aim is to avoid tagging infrequent words when they are spelled the same as a very frequent one. For instance: "la" (music) versus "la" (pronoun or determiner); "par" (golf) versus "par" (preposition).

You can build a new set of default dictionaries from your own dictionaries, dela-fr-public.bin and ajouts80jours.bin.
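
The effect of the priority rule can be illustrated with a small Python simulation. This is only a sketch of the behaviour described in this thread, not Unitex's actual implementation: the function `tag_words`, the priority constants, and above all the dictionary contents (including the PREP tag) are hypothetical and chosen only to reproduce the symptom.

```python
# Priority levels, from first consulted to last consulted.
HIGH, NORMAL, LOW = 0, 1, 2


def tag_words(words, dictionaries):
    """Return {word: [tags]} under the priority rule described in this thread.

    `dictionaries` is a list of (priority, entries) pairs, where `entries`
    maps a word to the tag that dictionary would assign.
    """
    tags = {word: [] for word in words}
    resolved = set()  # words already found at a higher priority level
    for level in (HIGH, NORMAL, LOW):
        found_at_level = set()
        for priority, entries in dictionaries:
            if priority != level:
                continue
            for word in tags:
                if word not in resolved and word in entries:
                    tags[word].append(entries[word])
                    found_at_level.add(word)
        # Words found at this level are hidden from lower-priority levels.
        resolved |= found_at_level
    return tags


# Hypothetical contents, chosen only to reproduce the symptom in this thread.
mots_gram = {"selon": "PREP", "suivant": "PREP"}
mediatif = {"selon": "MEDIATIF", "d'après": "MEDIATIF", "suivant": "MEDIATIF"}
text_words = ["selon", "d'après", "suivant"]

with_priority = tag_words(text_words, [(HIGH, mots_gram), (NORMAL, mediatif)])
without_priority = tag_words(text_words, [(NORMAL, mediatif)])

print(with_priority)     # only "d'après" receives MEDIATIF
print(without_priority)  # all three words receive MEDIATIF
```

With the high-priority dictionary present, "selon" and "suivant" are resolved before the personal dictionary is consulted, so only "d'après" gets the MEDIATIF tag. Dropping that dictionary from the list restores all three annotations.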


Best regards,

Denis Maurel

