MeCab or ...?

Darren Cook

Nov 8, 2013, 3:06:07 AM
Ping! Is this group still alive? (Is there now a better place to discuss Japanese NLP questions, in English?)

Question of the day: is MeCab still the NLP tool of choice? (The MeCab site appears to be alive, and has a comparison with ChaSen, JUMAN and KAKASI, but the link to ChaSen goes to a page last updated in 2007, the JUMAN link is a 404, and the KAKASI linked-to page was last updated in 2004, so I think that comparison table hasn't been touched in 6 years...)

Any other Japanese NLP tools that are being developed that are worth a look?


Michael Wayne Goodman

Nov 8, 2013, 12:47:20 PM
I don't have a good answer to your question, but I see that MeCab's
GoogleCode page (is this the site you mentioned?) has seen some
relatively recent activity (a commit in March 2013). I used MeCab
because it was packaged nicely for Ubuntu.

Also, googling "japanese morphological analyzer" turned up Kuromoji,
which seems to use the same output format as Chasen/MeCab, but is
developed separately. I don't know how the performance compares, but
it was apparently made to integrate with Lucene/Solr.

Hope that helps
> --
> You received this message because you are subscribed to the Google Groups
> "nlp-Japanese" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
> To post to this group, send email to
> Visit this group at
> For more options, visit

-Michael Wayne Goodman

Jim Breen

Nov 8, 2013, 5:46:44 PM
On 8 November 2013 19:06, Darren Cook wrote:
> Ping! Is this group still alive? (Is there now a better place to discuss
> Japanese NLP questions, in English?)

It never really fired as a group/list. The little discussion in English seems to
happen on the Corpora list.

> Question of the day: is MeCab still the NLP tool of choice? (The MeCab
> site appears to be alive, and has a comparison with ChaSen, JUMAN and
> KAKASI, but the link to Chasen goes to a page last updated in 2007, the
> JUMAN link is a 404, and the KAKASI linked-to page was last updated in 2004,
> so I think that comparison table hasn't been touched in 6 years...)

I would like to think so, since I use MeCab daily. It's been updated recently
(several times in the last 12 months). It's also keeping in step with the latest
versions of Unidic, which I think is the lexicon of choice for morphological
analysis.

There seems to have been a recent new version of Juman, and I have heard
a comment that it's good, but I haven't explored it much.

> Any other Japanese NLP tools that are being developed that are worth a look?

Have a look at if you haven't already. Things like yamcha and cabocha
may be useful. There are similar things in the Juman camp.

There's also kuromoji, which has been getting interest because it's in Java and
can run under Android, etc. on phones, etc. I think it uses a compressed version
of IPADIC, which puts me off it.


Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University

Francis Bond

Nov 10, 2013, 3:30:28 AM
The best list I know of is here:

I don't know of a good up-to-date comparison.

Francis Bond
Division of Linguistics and Multilingual Studies
Nanyang Technological University

Darren Cook

Nov 11, 2013, 2:22:00 AM
Thanks Francis. That is a very useful list.

There were some bad links (e.g. JUMAN), and some stuff is missing (e.g.
kuromoji that Jim mentioned). Do you know if anyone is interested in
maintaining the English list? (I also noticed the upcoming conferences
section was blank, but the Japanese side has loads.)

>> It never really fired as a group/list. The little discussion in English seems to
>> happen on the Corpora list.

Which list is that?

>> [Active alternatives to Mecab]

>> There's also kuromoji, which has been getting interest because it's in Java and
>> can run under Android, etc. on phones, etc.

I noticed KyTea has had updates this
year, and papers from 2010 and 2011.


Michael Wayne Goodman

Nov 11, 2013, 3:08:31 AM
On Sun, Nov 10, 2013 at 11:22 PM, Darren Cook wrote:
>> The best list I know of is here:
> Thanks Francis. That is a very useful list.
> There were some bad links (e.g. JUMAN), and some stuff is missing (e.g.
> kuromoji that Jim mentioned). Do you know if anyone is interested in
> maintaining the English list? (I also noticed the upcoming conferences
> was blank, but the Japanese side has loads.)
>>> It never really fired as a group/list. The little discussion in English seems to
>>> happen on the Corpora list.
> Which list is that?

See here:

>>> [Active alternatives to Mecab]
>>> There's also kuromoji, which has been getting interest because it's in Java and
>>> can run under Android, etc. on phones, etc.
> I noticed KyTea has had updates this
> year, and papers from 2010 and 2011.
> Darren

-Michael Wayne Goodman

Francis Bond

Nov 12, 2013, 8:37:52 AM

On Mon, Nov 11, 2013 at 3:22 PM, Darren Cook wrote:
>> The best list I know of is here:
> Thanks Francis. That is a very useful list.
> There were some bad links (e.g. JUMAN), and some stuff is missing (e.g.
> kuromoji that Jim mentioned). Do you know if anyone is interested in
> maintaining the English list? (I also noticed the upcoming conferences
> was blank, but the Japanese side has loads.)

I suspect they would jump at a volunteer and appreciate feedback.
The main list was put up and translated with some funding a couple of
years ago (when it was hosted at Kyodai). I think it has only
sporadically been updated.

>>> It never really fired as a group/list. The little discussion in English seems to
>>> happen on the Corpora list.
