Compared against Sen / GoSen?


Franz See

Jan 4, 2010, 7:03:16 PM
to cmecab-java-users
Good day,

I have just stumbled upon cmecab-java (from http://hide-t.vox.com/library/post/gosen.html)
and have read a bit about MeCab, but I'm not sure I understood it.

How does it differ from Sen or GoSen?

Thanks,
Franz

PS: If replies could be in English, that would be greatly
appreciated :-) If not, that's OK; I can still use Google Translate.
hehe :-)

Kohei TAKETA

Jan 5, 2010, 9:02:08 AM
to cmecab-j...@googlegroups.com
Hello and Happy New Year!

Franz See <fran...@gmail.com>:


> How does it differ from Sen or GoSen?

MeCab, Sen, and GoSen are all morphological analyzers for the Japanese language.
Sen is a Java port of MeCab (which is written in C++), and GoSen is a rewrite of Sen.
Unlike Sen and GoSen, cmecab-java is not a port of MeCab; it is a JNI wrapper for MeCab.

Because Sen, GoSen, and cmecab-java are all derivatives of MeCab, there is basically
no functional difference between them. However, cmecab-java has some auxiliary features
you may be interested in:

* Speed. cmecab-java is a bit faster than Sen and GoSen.
* Close integration with Apache Lucene and Solr. cmecab-java comes with analyzers/tokenizers
  for Lucene and Solr.
* Ease of dictionary maintenance. You can use the standard dictionary maintenance tools
  that come with MeCab. Sen and GoSen use a dictionary file format that is incompatible
  with MeCab's, so you can't use the standard tools with them.

Regards,
Kohei Taketa

Franz Allan Valencia See

Jan 5, 2010, 10:53:00 AM
to cmecab-j...@googlegroups.com
Good day to you Kohei TAKETA, and a Happy New Year as well :-)

Very good explanation! It's much clearer to me now :-)

Of the auxiliary features you listed, I find the ease of dictionary maintenance very attractive. I have a couple of questions, though:
* Where can I get more information about cmecab-java's dictionary maintenance utilities?
* Where can I get information on how to migrate my existing GoSen dictionary to cmecab-java?

Thanks,

--
Franz Allan Valencia See | Java Software Engineer
fran...@gmail.com
LinkedIn: http://www.linkedin.com/in/franzsee
Twitter: http://www.twitter.com/franz_see

2010/1/5 Kohei TAKETA <taks...@gmail.com>

--

This email was sent to subscribers of the Google Group "cmecab-java-users".
To post to this group, send an email to cmecab-j...@googlegroups.com.
To unsubscribe from this group, send an email to cmecab-java-us...@googlegroups.com.
For more details, visit this group at http://groups.google.com/group/cmecab-java-users?hl=ja.



Kohei TAKETA

Jan 6, 2010, 6:55:42 AM
to cmecab-j...@googlegroups.com
Hello Franz,

Franz Allan Valencia See <fran...@gmail.com>:


> * Where can I get more information regarding cmecab-java's dictionary
> maintenance utilities?

You can use MeCab's dictionary maintenance utilities because
cmecab-java is a JNI binding of MeCab.
Below is the web page that explains how to add words to MeCab's global
dictionary or user dictionary:
http://mecab.sourceforge.net/dic.html
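As a rough illustration of what one entry in such a dictionary source file looks like, here is a minimal Java sketch that assembles a single IPAdic-style CSV row in the 13-column order of the mecab-ipadic Noun.org.csv line quoted later in this thread (surface form, left context ID, right context ID, cost, six part-of-speech/conjugation fields, base form, reading, pronunciation). The class and method names are hypothetical and are not part of cmecab-java or MeCab; the context IDs and cost are simply passed in, since choosing sensible values for them is what the MeCab documentation linked above discusses.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of cmecab-java or MeCab): formats one
 * IPAdic-style row for a MeCab dictionary source CSV.
 * Assumption: the 13-column order matches the Noun.org.csv row quoted in this thread.
 */
public class MecabCsvEntry {

    /** Joins the columns with commas; does no quoting, so fields must not contain commas. */
    public static String format(String surface, String leftId, String rightId, int cost,
                                List<String> posFields, String baseForm,
                                String reading, String pronunciation) {
        // The six fields are: POS, POS subcategories 1-3, conjugation type, conjugation form.
        if (posFields.size() != 6) {
            throw new IllegalArgumentException(
                "expected 6 POS/conjugation fields, got " + posFields.size());
        }
        StringBuilder sb = new StringBuilder();
        sb.append(surface).append(',')
          .append(leftId).append(',')
          .append(rightId).append(',')
          .append(cost);
        for (String f : posFields) {
            sb.append(',').append(f);
        }
        sb.append(',').append(baseForm)
          .append(',').append(reading)
          .append(',').append(pronunciation);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values copied from the mecab-ipadic Noun.org.csv row quoted in this thread.
        String row = format("阪急交通社", "1292", "1292", 6849,
                List.of("名詞", "固有名詞", "組織", "*", "*", "*"),
                "阪急交通社", "ハンキュウコウツウシャ", "ハンキューコーツーシャ");
        System.out.println(row);
    }
}
```

The resulting file would still have to be compiled with MeCab's own dictionary tools as described on that page, and saved in whatever character encoding you tell the dictionary compiler to expect.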

> * Where can I get information on how to migrate my existing GoSen dictionary
> to cmecab-java?

I'm sorry, but I don't know how to migrate a compiled GoSen dictionary to MeCab.
But if you don't currently use a user dictionary, you don't need to
migrate your GoSen dictionary to MeCab at all.
MeCab comes with a dictionary that is a newer version of GoSen's.

Regards,
Kohei Taketa

Franz Allan Valencia See

Jan 6, 2010, 7:23:09 AM
to cmecab-j...@googlegroups.com
Re adding words to MeCab's global dictionary:
Thanks! I'll take a look at it.

Re Migrating GoSen's dictionary to MeCab's dictionary:
I see.

But what do you mean by 'user dictionary'? Is it the dictionary.csv file that GoSen used to produce its binary dictionary? If so, then yes, I still have that dictionary.csv.

As for using my GoSen dictionary.csv directly with MeCab: OK, I'll test it out. Hopefully it works even though my GoSen dictionary.csv lacks a few columns.

From my GoSen dictionary.csv
阪急交通社 3198 名詞 固有名詞 組織 * * * 阪急交通社 ハンキュウコウツウシャ ハンキューコーツーシャ

From MeCab-ipadic Noun.org.csv
阪急交通社 1292 1292 6849 名詞 固有名詞 組織 * * * 阪急交通社 ハンキュウコウツウシャ ハンキューコーツーシャ
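The two rows above differ only in the numeric columns after the surface form: the GoSen row has one number (3198) where the MeCab-ipadic row has three (1292 1292 6849). A minimal Java sketch of reshaping a GoSen row into the MeCab layout follows. It assumes, without confirmation from this thread, that every GoSen row has the same 11-column shape as the one quoted, that the single GoSen number is a cost, and that the caller supplies left/right context IDs that are valid for the target MeCab dictionary. The class name `GosenRowConverter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not part of GoSen, MeCab, or cmecab-java): reshapes a
 * GoSen dictionary.csv row into the MeCab-ipadic column layout.
 */
public class GosenRowConverter {

    /** Splits on commas if present, otherwise on whitespace (the rows quoted here are space-separated). */
    static List<String> fields(String row) {
        String sep = row.contains(",") ? "," : "\\s+";
        return Arrays.asList(row.trim().split(sep, -1));
    }

    /**
     * GoSen layout (assumed): surface, cost, 6 POS fields, base form, reading, pronunciation = 11 columns.
     * MeCab layout:           surface, leftId, rightId, then the same 10 columns       = 13 columns.
     */
    public static String toMecabRow(String gosenRow, String leftId, String rightId) {
        List<String> in = fields(gosenRow);
        if (in.size() != 11) {
            throw new IllegalArgumentException("expected 11 GoSen columns, got " + in.size());
        }
        List<String> out = new ArrayList<>();
        out.add(in.get(0));                    // surface form
        out.add(leftId);                       // context IDs must come from the target dictionary
        out.add(rightId);
        out.addAll(in.subList(1, in.size()));  // cost (carried over unchanged, an assumption), then the rest
        return String.join(",", out);
    }

    public static void main(String[] args) {
        String gosen = "阪急交通社 3198 名詞 固有名詞 組織 * * * 阪急交通社 ハンキュウコウツウシャ ハンキューコーツーシャ";
        // IDs taken from the mecab-ipadic row above; a real conversion would look them up per POS.
        System.out.println(toMecabRow(gosen, "1292", "1292"));
    }
}
```

Because the two example rows carry different costs (3198 vs 6849), copying the GoSen number over unchanged is only a starting point; costs would likely need re-tuning for MeCab.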

Thanks,

2010/1/6 Kohei TAKETA <taks...@gmail.com>

Kohei TAKETA

Jan 7, 2010, 8:19:28 AM
to cmecab-j...@googlegroups.com
Hello,

Franz Allan Valencia See <fran...@gmail.com>:
> But what do you mean by 'user dictionary'? Is it the dictionary.csv file that GoSen used to produce its binary dictionary? If so, then yes, I still have that dictionary.csv.

dictionary.csv is the source file for GoSen's global dictionary. GoSen can also have a custom (user) dictionary separate from the global one.
If you don't have any CSV file other than dictionary.csv, then don't worry: you don't have a user dictionary.

> As for using my GoSen dictionary.csv directly with MeCab: OK, I'll test it out. Hopefully it works even though my GoSen dictionary.csv lacks a few columns.

Your dictionary.csv lacks those columns because GoSen uses an older version of the ipadic dictionary, and the format changed between versions.