Japanese delimiter


Haidee Thomson

Feb 23, 2015, 9:01:11 PM
to ant...@googlegroups.com
Hello again,

I have a question about using AntConc to analyse Japanese texts. I have read online that, because Japanese has no spaces between words, we need to add them so that the program can recognise where each word begins and ends. Do we need to insert the spaces manually, or is there an easier way?

Many thanks,
Haidee

Bill Marcellino

Feb 23, 2015, 9:55:39 PM
to ant...@googlegroups.com
Hi Haidee, 

  I think the issue is not just adding spaces, but knowing where to put them.  In English, words are surrounded by spaces or by punctuation plus a space, so it is easy for a program to turn the words in a text file into tokens.  In character-based languages, though, it is not always one character per word.  Characters combine into words depending on context, which a human reader handles well but a computer handles less well.  I think you will need to tokenize your text first and then use AntConc.  That might be a challenge.
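
  To make the problem concrete, here is a minimal Python sketch of what naive whitespace tokenization does to an English sentence versus a Japanese one. The two example sentences and the helper name `whitespace_tokens` are made up for illustration, and this is not a description of how AntConc tokenizes internally.

```python
# Illustrative only: the sentences and the helper name are assumptions.
PUNCTUATION = ".,!?。、"

def whitespace_tokens(text):
    """Split on whitespace, then strip punctuation from each token's edges."""
    tokens = (piece.strip(PUNCTUATION) for piece in text.split())
    return [token for token in tokens if token]

english = "The cat sat on the mat."
japanese = "猫がマットの上に座った。"  # roughly: "The cat sat on the mat."

english_tokens = whitespace_tokens(english)
japanese_tokens = whitespace_tokens(japanese)

print(english_tokens)   # ['The', 'cat', 'sat', 'on', 'the', 'mat']
print(japanese_tokens)  # ['猫がマットの上に座った']
```

  The English sentence yields six tokens, while the whole Japanese sentence comes back as a single "word", which is why the text has to be segmented before a space-based tool can count or concordance it.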

  -Bill

Laurence Anthony

Feb 24, 2015, 5:38:13 AM
to ant...@googlegroups.com
Hi,

I have built a fairly simple, but certainly usable, segmenter (that's the term for a tool that splits character-based languages into 'words'), and it is available on my website. It's called SegmentAnt.
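
For anyone curious about what a segmenter does in general, below is a toy Python sketch of one classic approach, greedy longest-match against a word list. It is not how SegmentAnt works (this thread does not describe its internals), and the tiny `LEXICON`, the example sentence, and the function name `segment` are all hypothetical. Real Japanese segmenters rely on large dictionaries and statistical models rather than a handful of entries.

```python
# Toy longest-match segmenter. LEXICON and the sentence are made up
# for illustration; a real tool needs a far larger dictionary.
LEXICON = {"猫", "が", "マット", "の", "上", "に", "座った"}

def segment(text, lexicon):
    """Greedily take the longest lexicon entry at each position.

    Characters that match no entry are emitted as single-character tokens.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens

sentence = "猫がマットの上に座った。"
spaced = " ".join(segment(sentence, LEXICON))
print(spaced)  # 猫 が マット の 上 に 座った 。
```

The space-delimited output is the kind of text a whitespace-based tool can then treat as separate words. Greedy matching picks the wrong boundary whenever a longer dictionary entry happens to span two real words, which is one reason dedicated segmenters are preferable to a hand-rolled script.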

I hope that helps.

Laurence.

