Available in Unitex: the Arabic broken plural test set.

Alexis Neme

Feb 14, 2014, 10:19:42 PM
to unitex-...@googlegroups.com
Dear Members,

I am pleased to announce that the evaluation test set for modelling the broken plural (cf. Neme, Laporte, 2013) is now available and has been distributed with Unitex since February 5, 2014.

We selected three documents totaling 3 550 tokens (about 10 pages) of popular-science writing on three topics: pollution and fishing in Egypt, earthquakes in the world, and quality of water. We did not use any part of this corpus while constructing our lexicon of broken plurals (BPs): our sources of information were handbooks, reference dictionaries and native speaker competence. Thus, the evaluation tool is independent of the evaluated resource.

- To access the annotated corpus, open Fishing-Earthquakes-Water.snt in the Unitex/Arabic/Corpus ..... 
- It has been preprocessed by Unitex with the PRIM dictionary of broken plurals (Neme, Laporte, 2013) and Neme's (2011) dictionary of verbs. 
- The 267 broken plurals, with or without agglutinations, can be retrieved and listed in a concordance.
- For more details, see readme_Fishing-Earthquakes-Water.txt in the Arabic\corpus sub-directory.


Cheers, 

Alexis
