Languages without delimiters


purva mhasakar

Dec 24, 2016, 6:51:50 AM
to Unitex-GramLab
Hello,

Some languages, such as Japanese and Chinese, are written without delimiters between words. How does Unitex segment text in such languages into elementary units, so that the words are separated for further analysis?

Regards,
B.Tech first-year student,
India.

eric.laporte

Jan 6, 2017, 11:21:47 AM
to Unitex-GramLab
Dear Purva Mhasakar,
For languages without spaces, Unitex/GramLab recognizes elementary units in two steps. First, it tokenizes the text character by character (Manual, Section 2.5.4). Then, when you apply dictionaries, words are recognized; if the segmentation into words is ambiguous, all solutions are represented in parallel.
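
To make the two steps concrete, here is a toy sketch in Python. It is not Unitex code and does not use any Unitex API: the function names, the four-entry dictionary, and the sample string are all made up for illustration, and a real Unitex dictionary also carries lemmas and grammatical codes that are left out here.

```python
def tokenize_chars(text):
    """Step 1 (illustrative): one token per character."""
    return list(text)


def dictionary_matches(tokens, dictionary):
    """Step 2 (illustrative): every (start, end, word) span of
    consecutive tokens that spells an entry of the dictionary."""
    max_len = max(map(len, dictionary), default=0)
    matches = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            word = "".join(tokens[start:end])
            if word in dictionary:
                matches.append((start, end, word))
    return matches


def all_segmentations(tokens, dictionary):
    """List every way to cover the whole token sequence with matches.
    Ambiguous inputs yield several segmentations, kept side by side."""
    by_start = {}
    for start, end, word in dictionary_matches(tokens, dictionary):
        by_start.setdefault(start, []).append((end, word))

    def paths(pos):
        if pos == len(tokens):
            return [[]]
        found = []
        for end, word in by_start.get(pos, []):
            for rest in paths(end):
                found.append([word] + rest)
        return found

    return paths(0)


# Hypothetical mini-dictionary and an ambiguous Chinese string.
toy_dictionary = {"研究", "研究生", "生命", "命"}
tokens = tokenize_chars("研究生命")
for segmentation in all_segmentations(tokens, toy_dictionary):
    print(" / ".join(segmentation))
```

With this toy dictionary the string has two readings (研究 / 生命 and 研究生 / 命), and the sketch keeps both rather than choosing one, which is the idea behind representing all solutions in parallel.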
Best,
Eric Laporte

