I have just stumbled upon cmecab-java (from http://hide-t.vox.com/library/post/gosen.html)
and I've read a bit about MeCab. I'm not sure I understood it,
though.
How does it differ from Sen or GoSen?
Thanks,
Franz
PS: If replies can be in English, that would be greatly
appreciated :-) If not, that's ok. I can still use Google Translate.
hehe :-)
Franz See <fran...@gmail.com>:
> How does it differ from Sen or GoSen?
MeCab, Sen, and GoSen are all morphological analyzers for the Japanese language.
Sen is a Java port of MeCab (which is written in C++), and GoSen is a
rewrite of Sen.
Unlike Sen and GoSen, cmecab-java is not a port of MeCab. It is a JNI
wrapper around MeCab.
Because Sen, GoSen and cmecab-java are all derivatives of MeCab, there
is basically no functional difference between them. However, cmecab-java
has some auxiliary features you may be interested in:
* Speed. cmecab-java is a bit faster than Sen and GoSen.
* High affinity with Apache Lucene and Solr. cmecab-java comes with
  analyzers/tokenizers for Lucene and Solr.
* Ease of dictionary maintenance. You can use the standard dictionary
  maintenance tools which come with MeCab. Sen and GoSen use a dictionary
  file format that is incompatible with MeCab's, so you can't use the
  standard tools with them.
Regards,
Kohei Taketa
Franz Allan Valencia See <fran...@gmail.com>:
> * Where can I get more information regarding cmecab-java's dictionary
> maintenance utilities?
You can use MeCab's dictionary maintenance utilities because
cmecab-java is a JNI binding of MeCab.
Below is the web page that explains how to add words to MeCab's global
dictionary or user dictionary:
http://mecab.sourceforge.net/dic.html
> * Where can I get information on how to migrate my existing GoSen dictionary
> to cmecab-java?
I'm sorry, but I don't know how to migrate a compiled GoSen dictionary to MeCab.
But if you don't currently use a user dictionary, you don't need to
migrate your GoSen dictionary to MeCab.
MeCab comes with a dictionary that is a newer version of GoSen's.
Regards,
Kohei Taketa
| 阪急交通社 | 3198 | 名詞 | 固有名詞 | 組織 | * | * | * | 阪急交通社 | ハンキュウコウツウシャ | ハンキューコーツーシャ |
| 阪急交通社 | 1292 | 1292 | 6849 | 名詞 | 固有名詞 | 組織 | * | * | * | 阪急交通社 | ハンキュウコウツウシャ | ハンキューコーツーシャ |
But what do you mean by 'user dictionary'? Is that the dictionary.csv file that my GoSen used to produce its binary dictionary? If so, then yes, I still have that dictionary.csv.
As to using my GoSen dictionary.csv directly with MeCab: ok, I'll test it out. Hopefully it works even though my GoSen dictionary.csv lacks a few columns.
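For what it's worth, here is a rough Python sketch of how the "missing columns" could be filled in, going only by the two rows quoted above. It assumes the 11-field row is the GoSen form (surface, cost, features) and the 13-field row is the MeCab form (surface, left context ID, right context ID, cost, features). The function names are made up for illustration, and the IDs and cost passed in are simply copied from the second row; real values would have to come from the MeCab dictionary you are targeting. Whether a GoSen cost can be reused as-is in MeCab is not something this thread settles.

```python
# Sketch only: convert a GoSen-style dictionary.csv entry into a MeCab-style one.
# Assumption: GoSen row = surface, cost, features...
#             MeCab row = surface, left ID, right ID, cost, features...

def parse_row(row):
    """Split a pipe-delimited row, as quoted in this thread, into fields."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def gosen_to_mecab_fields(fields, left_id, right_id, cost=None):
    """Insert the two context-ID columns that the GoSen-style row lacks.

    left_id / right_id are hypothetical inputs; they must match the
    context IDs used by the target MeCab dictionary.
    If cost is omitted, the GoSen cost is carried over unchanged.
    """
    surface, old_cost, *features = fields
    new_cost = old_cost if cost is None else cost
    return [surface, str(left_id), str(right_id), str(new_cost), *features]


gosen_row = ("| 阪急交通社 | 3198 | 名詞 | 固有名詞 | 組織 | * | * | * "
             "| 阪急交通社 | ハンキュウコウツウシャ | ハンキューコーツーシャ |")
mecab_row = ("| 阪急交通社 | 1292 | 1292 | 6849 | 名詞 | 固有名詞 | 組織 | * | * | * "
             "| 阪急交通社 | ハンキュウコウツウシャ | ハンキューコーツーシャ |")

# IDs and cost below are copied from the second quoted row, not computed.
converted = gosen_to_mecab_fields(parse_row(gosen_row), 1292, 1292, cost=6849)
print(",".join(converted))  # one comma-separated line per entry
```

The resulting CSV would still need to be compiled with MeCab's own dictionary tools, as described on the page linked earlier in the thread.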