Hi Cristian, hi Eric,
I am also interested in annotating Arabic text with the text-automaton tool in Unitex.
As I see it, jeesun's main criticisms of the tool relate to visualisation and ergonomics:
horizontally: only about 8 words/segments fit in the automaton window, which is not enough context for the annotators; in Arabic, with agglutinated segments, this drops to about 4 words;
vertically: only 6 ambiguities fit in the automaton window, which is not enough to select the right one.
Here are two suggestions for the IDE_JAVA team, expressed as SCRUM stories (descriptions of new features):
SCRUM story for the automaton window:
As a corpus annotator, I should be able to increase the horizontal and vertical size of the automaton window,
so that I can work more comfortably in it.
(For example, F11 could display only the automaton window in full screen and hide the sentence window at the top.)
SCRUM story for ergonomics:
As a corpus annotator, I should be able to use mouse and keyboard shortcuts to scroll and to reach every annotation function, so that I can handle the automaton view and its annotation menus more comfortably.
Even when the focus is not on the sentence counter:
arrow-Up/Down should move to the previous/next sentence;
CTRL+arrow-Up/Down should jump 10 sentences back/forward;
arrow-Left/Right should slide the window by 4 words within the same sentence;
CTRL+arrow-Left/Right should slide the window by 8 words within the same sentence; etc.
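To make the keyboard story concrete, here is a minimal Java sketch of the navigation logic the shortcuts imply (Java only because the IDE is written in it). The class AutomatonNavigator, its methods, the per-sentence word counts and the clamping at corpus and sentence boundaries are all hypothetical illustrations, not existing Unitex code; only the step sizes (1/10 sentences, 4/8 words) come from the list above.

```java
import java.awt.event.KeyEvent;

// Hypothetical sketch: AutomatonNavigator is NOT part of Unitex; it only
// models the keyboard navigation requested in the SCRUM story.
class AutomatonNavigator {
    private final int[] wordsPerSentence; // assumed: number of words in each sentence
    private int sentence = 0;             // index of the displayed sentence
    private int windowStart = 0;          // index of the first word shown in the window

    AutomatonNavigator(int[] wordsPerSentence) {
        if (wordsPerSentence.length == 0) {
            throw new IllegalArgumentException("corpus must contain at least one sentence");
        }
        this.wordsPerSentence = wordsPerSentence.clone();
    }

    // Dispatch an arrow key; ctrl selects the larger step size.
    void handleKey(int keyCode, boolean ctrl) {
        switch (keyCode) {
            case KeyEvent.VK_UP:
                moveSentence(ctrl ? -10 : -1);
                break;
            case KeyEvent.VK_DOWN:
                moveSentence(ctrl ? 10 : 1);
                break;
            case KeyEvent.VK_LEFT:
                slideWindow(ctrl ? -8 : -4);
                break;
            case KeyEvent.VK_RIGHT:
                slideWindow(ctrl ? 8 : 4);
                break;
            default:
                // other keys are left to the existing handlers
                break;
        }
    }

    // Move by delta sentences, clamped to the corpus; the window restarts at word 0.
    private void moveSentence(int delta) {
        sentence = clamp(sentence + delta, 0, wordsPerSentence.length - 1);
        windowStart = 0;
    }

    // Slide the visible window by delta words, staying inside the current sentence.
    private void slideWindow(int delta) {
        int lastWord = Math.max(0, wordsPerSentence[sentence] - 1);
        windowStart = clamp(windowStart + delta, 0, lastWord);
    }

    private static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }

    int getSentence() { return sentence; }

    int getWindowStart() { return windowStart; }
}
```

Keeping the logic separate from the GUI makes it easy to test; in Swing, one plausible way to make the keys work "even when the focus is not on the sentence counter" would be to register them as key bindings with JComponent.WHEN_IN_FOCUSED_WINDOW, though the IDE developers will know the best place to hook this in.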
The IDE-Java developers (Maxim and Marvin) are better placed to propose the right choices for visualisation and ergonomics, since they have more experience with this topic than we linguists and annotators do.
(Thanks, Maxim and Marvin, for the find/replace box; it is very useful.)
It is true that Unitex has many good tools for lexicons (most recently Leximir) and for grammar formalization (excellent graph tools),
but little effort has gone into creating a good annotation tool in Unitex (see the Treebank from LDC).
To meet the expectations of linguists and annotators, there is a lot to do in Unitex.
Finally, an annotated, disambiguated corpus is an essential and critical resource for statistical approaches compatible with the tagset used in Unitex annotations.
Kind regards,
Alexis