
Donghee

Jun 3, 2015, 4:31:43 AM
to unitex-...@googlegroups.com
Hello, everyone!
I have a question about using Unitex with another language.
I want to work with Kazakh, which uses the Russian Cyrillic alphabet plus 9 additional Kazakh letters.
As a stopgap, I added those 9 Kazakh letters to the 'alphabet.txt' file in the Russian directory.
But after preprocessing a Kazakh corpus, the token list does not contain any words that include the Kazakh-specific letters.
It only contains words made up of letters shared by Kazakh and Russian.
Is it possible to work with Kazakh in Unitex at all?

I would really appreciate an answer to my question.
Looking forward to hearing from you.

Gilles Vollant

Jun 3, 2015, 5:48:50 AM
to Donghee, unitex-...@googlegroups.com

Hello,
If I understand correctly, the problem probably occurs when you run Tokenize?

Can you enable logging and preprocess a small text that shows the problem?

See http://igm.univ-mlv.fr/~unitex/UnitexManual3.1.pdf, page 254, section 13.1.

Regards,
Gilles Vollant

-----Original Message-----
From: unitex-...@googlegroups.com [mailto:unitex-...@googlegroups.com] On Behalf Of Donghee
Sent: Wednesday, June 3, 2015 10:32
To: unitex-...@googlegroups.com
Subject: [Unitex-GramLab] about other language

Donghee

Jun 5, 2015, 8:13:41 AM
to unitex-...@googlegroups.com
Hello!
I solved the problem thanks to you!
Thank you so much.
Have a nice day :)

Donghee

Jun 10, 2015, 8:37:24 AM
to unitex-...@googlegroups.com
Hello,
I solved the problem with working in another language.
I added the 9 additional Kazakh letters to the 'alphabet.txt' in the folder where the workspace is created, not the one in the folder where Unitex/GramLab is installed.
It turned out to be a simple problem ;)
I hope this helps anyone who runs into a similar problem.
Thank you all for the help.
Have a nice day!
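
For anyone who wants to script the fix above, here is a minimal Python sketch that adds the 9 Kazakh-specific letters to an alphabet file. It rests on assumptions I have not checked against this setup: that the alphabet file lists one uppercase/lowercase pair per line, and that the file is saved as UTF-16 (change the `encoding` argument if your Unitex preferences use UTF-8). The function names and any path you pass in are my own illustration, not part of Unitex.

```python
from pathlib import Path

# The nine letters that Kazakh Cyrillic adds to the Russian alphabet,
# written as uppercase followed by lowercase (assumed one pair per line).
KAZAKH_EXTRA = ["Әә", "Ғғ", "Ққ", "Ңң", "Өө", "Ұұ", "Үү", "Һһ", "Іі"]


def missing_entries(existing_lines, wanted=KAZAKH_EXTRA):
    """Return the wanted pairs that are not already listed in the file."""
    present = {line.strip() for line in existing_lines}
    return [pair for pair in wanted if pair not in present]


def add_letters(alphabet_path, encoding="utf-16"):
    """Append any missing pairs to the alphabet file and return them.

    The encoding default is an assumption; match it to your Unitex settings.
    """
    path = Path(alphabet_path)
    lines = path.read_text(encoding=encoding).splitlines()
    added = missing_entries(lines)
    if added:
        # Rewrite the whole file so only one byte-order mark is emitted.
        path.write_text("\n".join(lines + added) + "\n", encoding=encoding)
    return added
```

Point `add_letters` at the 'alphabet.txt' inside your workspace's language folder (not the copy in the installation folder), then run preprocessing again. Running it twice is harmless, since pairs already present are skipped.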

