MeCab or ...?


Darren Cook

Nov 8, 2013, 3:06:07 AM
to nlp-ja...@googlegroups.com
Ping! Is this group still alive? (Is there now a better place to discuss Japanese NLP questions, in English?)

Question of the day: is MeCab still the NLP tool of choice? (The MeCab site appears to be alive and has a comparison with ChaSen, JUMAN and KAKASI, but the link to ChaSen goes to a page last updated in 2007, the JUMAN link is a 404, and the linked KAKASI page was last updated in 2004, so I think that comparison table hasn't been touched in 6 years...)

Any other Japanese NLP tools that are being developed that are worth a look?

Thanks,
Darren

Michael Wayne Goodman

Nov 8, 2013, 12:47:20 PM
to nlp-ja...@googlegroups.com
I don't have a good answer to your question, but I see that MeCab's
GoogleCode page (is this the site you mentioned?) has seen some
relatively recent activity (a commit in March 2013). I used MeCab
because it was packaged nicely for Ubuntu.
https://code.google.com/p/mecab

Also, googling "japanese morphological analyzer" turned up Kuromoji,
which seems to use the same output format as Chasen/MeCab, but is
developed separately. I don't know how the performance compares, but
it was apparently made to integrate with Lucene/Solr.
http://atilika.com/en/products/kuromoji.html

Hope that helps
> --
> You received this message because you are subscribed to the Google Groups
> "nlp-Japanese" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to nlp-japanese...@googlegroups.com.
> To post to this group, send email to nlp-ja...@googlegroups.com.
> Visit this group at http://groups.google.com/group/nlp-japanese.
> For more options, visit https://groups.google.com/groups/opt_out.



--
-Michael Wayne Goodman

Jim Breen

Nov 8, 2013, 5:46:44 PM
to nlp-ja...@googlegroups.com
On 8 November 2013 19:06, Darren Cook <dar...@dcook.org> wrote:
> Ping! Is this group still alive? (Is there now a better place to discuss
> Japanese NLP questions, in English?)

It never really fired as a group/list. The little discussion in English seems to
happen on the Corpora list.

> Question of the day: is MeCab still the NLP tool of choice? (The MeCab
> site appears to be alive, and has a comparison with ChaSen, JUMAN and
> KAKASI, but the link to Chasen goes to a page last updated in 2007, the
> JUMAN link is a 404, and the KAKASI linked-to page was last updated in 2004,
> so I think that comparison table hasn't been touched in 6 years...)

I would like to think so, since I use MeCab daily. It's been updated recently
(several times in the last 12 months.) It's also keeping in step with the latest
versions of UniDic, which I think is the lexicon of choice for morphological
analysis.

There seems to have been a recent new version of Juman, and I have heard
a comment that it's good, but I haven't explored it much.

> Any other Japanese NLP tools that are being developed that are worth a look?

Have a look at http://cl.naist.jp/~eric-n/ubuntu-nlp/ if you haven't already.
Things like yamcha and cabocha may be useful. There are similar things in the
Juman camp.

There's also kuromoji, which has been getting interest because it's in Java and
can run on Android phones and the like. I think it uses a compressed version
of IPADIC, which puts me off it.

Cheers

Jim
--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University

Francis Bond

Nov 10, 2013, 3:30:28 AM
to nlp-ja...@googlegroups.com
The best list I know of is here:
http://www.jaist.ac.jp/project/NLP_Portal/doc/LR/lr-cat-e.html#LR_type_Software:morphological_analyzer

I don't know of a good up-to-date comparison.



--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University

Darren Cook

Nov 11, 2013, 2:22:00 AM
to nlp-ja...@googlegroups.com
Thanks Francis. That is a very useful list.

There were some bad links (e.g. JUMAN), and some stuff is missing (e.g.
kuromoji that Jim mentioned). Do you know if anyone is interested in
maintaining the English list? (I also noticed the upcoming conferences
section was blank, but the Japanese side has loads.)

>> It never really fired as a group/list. The little discussion in English seems to
>> happen on the Corpora list.

Which list is that?

>> [Active alternatives to Mecab]

>> There's also kuromoji, which has been getting interest because it's in Java and
>> can run under Android, etc. on phones, etc.

I noticed KyTea ( http://www.phontron.com/kytea/ ) has had updates this
year, and papers from 2010 and 2011.

Darren

Michael Wayne Goodman

Nov 11, 2013, 3:08:31 AM
to nlp-ja...@googlegroups.com
On Sun, Nov 10, 2013 at 11:22 PM, Darren Cook <dar...@dcook.org> wrote:
>> The best list I know of is here:
>> http://www.jaist.ac.jp/project/NLP_Portal/doc/LR/lr-cat-e.html#LR_type_Software:morphological_analyzer
>
> Thanks Francis. That is a very useful list.
>
> There were some bad links (e.g. JUMAN), and some stuff is missing (e.g.
> kuromoji that Jim mentioned). Do you know if anyone is interested in
> maintaining the English list? (I also noticed the upcoming conferences
> was blank, but the Japanese side has loads.)
>
>>> It never really fired as a group/list. The little discussion in English seems to
>>> happen on the Corpora list.
>
> Which list is that?

See here: http://gandalf.aksis.uib.no/corpora/sub.html

>>> [Active alternatives to Mecab]
>
>>> There's also kuromoji, which has been getting interest because it's in Java and
>>> can run under Android, etc. on phones, etc.
>
> I noticed KyTea ( http://www.phontron.com/kytea/ ) has had updates this
> year, and papers from 2010 and 2011.
>
> Darren



--
-Michael Wayne Goodman

Francis Bond

Nov 12, 2013, 8:37:52 AM
to nlp-ja...@googlegroups.com
G'day,

On Mon, Nov 11, 2013 at 3:22 PM, Darren Cook <dar...@dcook.org> wrote:
>> The best list I know of is here:
>> http://www.jaist.ac.jp/project/NLP_Portal/doc/LR/lr-cat-e.html#LR_type_Software:morphological_analyzer
>
> Thanks Francis. That is a very useful list.
>
> There were some bad links (e.g. JUMAN), and some stuff is missing (e.g.
> kuromoji that Jim mentioned). Do you know if anyone is interested in
> maintaining the English list? (I also noticed the upcoming conferences
> was blank, but the Japanese side has loads.)

I suspect they would jump at a volunteer and appreciate feedback.
The main list was put up and translated with some funding a couple of
years ago (when it was hosted at Kyodai). I think it has only
sporadically been updated.

>>> It never really fired as a group/list. The little discussion in English seems to
>>> happen on the Corpora list.
>
