Serbian resource update


Cvetana Krstev

Aug 20, 2015, 2:50:33 PM
to Unitex-GramLab
Hi,
The Unitex Serbian resources have been updated to version 2.0. The authors of the dictionaries, Dusko Vitas and I, have extended the dictionary of simple words to 88,753 lemmas and the dictionary of multiword expressions to 10,288 lemmas. We have also provided a dictionary-graph for the recognition and normalization of multiword numerals, as well as two preprocessing graphs: one for sentence boundary detection and one for normalization at tokenization time. The new version is now available online. More about the dictionaries can be found in:
  1. Duško Vitas and Cvetana Krstev. "Processing of Corpora of Serbian Using Electronic Dictionaries". Prace Filologiczne, vol. LXIII, pp. 279-292, Warszawa, 2012. ISSN 0138-0567.
  2. Cvetana Krstev. Processing of Serbian – Automata, Texts and Electronic Dictionaries. Faculty of Philology, University of Belgrade, Belgrade, 2008. (A copy is available on request.)
More bibliographic references, many with full texts, are available at http://poincare.matf.bg.ac.rs/~cvetana/CV_Bibl_nova.html

Cvetana Krstev
Faculty of Philology
University of Belgrade