Hi,
The Unitex Serbian resources have been updated to version 2.0. The authors of the dictionaries, Duško Vitas and I, have extended the dictionary of simple words to 88,753 lemmas and the dictionary of multiword expressions to 10,288 lemmas. We have also added a dictionary graph for the recognition and normalization of multiword numerals, and two preprocessing graphs (one for sentence boundaries and one for normalization at tokenization time). The new version is online.
More about the dictionaries can be found in:
- Duško Vitas and Cvetana Krstev. “Processing of Corpora of Serbian Using Electronic Dictionaries”. Prace Filologiczne, vol. LXIII, pp. 279-292, Warszawa, 2012. ISSN 0138-0567.
- Cvetana Krstev. Processing of Serbian – Automata, Texts and Electronic Dictionaries. Faculty of Philology, University of Belgrade, Belgrade, 2008. (A copy is available on request.)
Cvetana Krstev
Faculty of Philology
University of Belgrade