Languages without delimiters

purva mhasakar

Dec 24, 2016, 6:51:50 AM

to Unitex-GramLab

Hello,

Some languages, such as Japanese and Chinese, are written without delimiters between words. How does Unitex segment such text into elementary units, so that the words are separated for further analysis?

Regards,

B.Tech first-year student,

India.

eric.laporte

Jan 6, 2017, 11:21:47 AM

to Unitex-GramLab

Dear Purva Mhasakar,

For languages without spaces, Unitex/GramLab recognizes elementary units in two steps. First, it tokenizes the text on a character-by-character basis (Manual, Section 2.5.4). Then, when you apply dictionaries, words are recognized as sequences of these tokens; where the word segmentation is ambiguous, all solutions are represented in parallel.
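
To make the two steps concrete, here is a small Python sketch of the idea. It is only an illustration and not how Unitex is implemented: the function names, the toy dictionary and the sample string are all assumptions made up for this example, and the plain list of solutions merely stands in for the parallel representation mentioned above.

```python
from typing import List, Set


def tokenize_chars(text: str) -> List[str]:
    """Step 1 (illustrative): every non-whitespace character is one token."""
    return [ch for ch in text if not ch.isspace()]


def segmentations(tokens: List[str], dictionary: Set[str]) -> List[List[str]]:
    """Step 2 (illustrative): list every way to cover the tokens with
    dictionary words, so ambiguous segmentations are all kept."""
    if not tokens:
        return [[]]  # one way to segment nothing: the empty segmentation
    results = []
    for end in range(1, len(tokens) + 1):
        word = "".join(tokens[:end])
        if word in dictionary:
            for rest in segmentations(tokens[end:], dictionary):
                results.append([word] + rest)
    return results


# Hypothetical toy dictionary, not a real Unitex dictionary.
toy_dictionary = {"北京", "大学", "北京大学", "生", "大学生"}

tokens = tokenize_chars("北京大学生")
all_solutions = segmentations(tokens, toy_dictionary)

for solution in all_solutions:
    print(" | ".join(solution))
```

With this toy dictionary the sample string has more than one valid segmentation, and the sketch keeps all of them instead of choosing one.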

Best,

Eric Laporte