Script to convert a Google Pinyin IME dictionary into a VimIM dictionary

Wenbo Yang

May 21, 2009, 11:46:59 AM
to vi...@googlegroups.com
Hi All,

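I wrote a script that converts a Google Pinyin IME dictionary into a VimIM dictionary. I am posting it here in the hope that it is useful to everyone.

Usage:
1. On the "属性设置 -> 词典" (Properties -> Dictionary) tab of Google Pinyin IME, export the dictionary to a .dic file, for example google.dic.
2. Copy google.dic to a Linux machine, or use Cygwin, and change into the directory that contains google.dic.
3. Download the google2vimim attachment from this message and make it executable: chmod u+x google2vimim.
4. Run ./google2vimim google.dic > vimim.pinyin.txt. The resulting vimim.pinyin.txt is a dictionary that conforms to the VimIM format.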
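
As a rough sanity check after step 4, the sketch below flags lines in the output that lack either a key or a word. It assumes each line has the form "key word [word ...]", which is inferred from the awk pipeline quoted later in this thread rather than from any VimIM specification. The function name check_vimim_lines is made up for illustration.

```shell
# check_vimim_lines FILE (hypothetical helper, not part of google2vimim)
# Assumption: every line is "key word [word ...]", separated by whitespace.
check_vimim_lines() {
  awk 'NF < 2 { bad++; printf "line %d: expected a key and at least one word\n", NR }
       END { exit (bad > 0) }' "$1"
}

# Example use after step 4:
#   check_vimim_lines vimim.pinyin.txt && echo "format looks plausible"
```

The function returns a non-zero status when at least one line fails the check, so it can be chained with && or used in an if statement.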

If you find any problem with the script, please contact me and I will fix it.

Wenbo
--
Wenbo YANG

The State Key Laboratory Of Information Security
Graduate University of Chinese Academy of Sciences
19A Yuquan Road, Beijing, China --- Homepage: http://solrex.cn
google2vimim

vimim

May 21, 2009, 4:13:35 PM
to vimim
That is cool.

If it can be uploaded somewhere, like I did for vimim.pinyin.txt, it
would be great! Again, I am not sure whether Google would like that.

I don't have google.dic. Looking at the code, I wonder whether it
is valid to change ^M to \r to make it portable. (I don't like ^M,
but it might be required somehow.)

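# Convert GBK to UTF-8, drop spaces and trailing CRs, then group the words ($1) under their key ($3).
iconv -f gbk -t utf-8 "$@" | sed -e 's/ //g;s/\r$//g' |
awk 'NR==1 { a=$3; printf "%s %s",$3,$1; next }
     { if ($3==a) printf " %s",$1; else printf "\n%s %s",$3,$1; a=$3 }' |
sort -d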
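
To see what the grouping step does, here is a minimal sketch that runs the same sed, awk and sort stages on three made-up entries. The input layout (word, frequency, then space-separated pinyin, separated by tabs) is an assumption, since no real google.dic is shown in this thread. The sample is already UTF-8 with Unix line endings, so the iconv and \r steps are left out. The function name group_by_pinyin is hypothetical, and the END rule is an addition that terminates the last line.

```shell
# group_by_pinyin: hypothetical wrapper around the grouping stages of the pipeline.
# Assumed input layout per line: word <TAB> frequency <TAB> "pin yin"
group_by_pinyin() {
  sed -e 's/ //g' |                       # join the pinyin syllables: "zhong guo" -> "zhongguo"
  awk 'NR==1 { a=$3; printf "%s %s",$3,$1; next }
       { if ($3==a) printf " %s",$1; else printf "\n%s %s",$3,$1; a=$3 }
       END { if (NR) printf "\n" }' |     # added here: end the final line with a newline
  sort -d
}

# Made-up sample entries, not taken from a real dictionary export.
printf '中国\t10\tzhong guo\n种过\t3\tzhong guo\n你好\t7\tni hao\n' | group_by_pinyin
# prints:
#   nihao 你好
#   zhongguo 中国 种过
```

The awk stage only merges adjacent lines that share a key, so full merging relies on the export already being ordered by pinyin. That is an observation about the code, not something this thread confirms about the .dic format.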

Thanks


vimim

May 21, 2009, 4:25:05 PM
to vimim
Wait ...

Would it be possible to "port" the code to Vim script?

If so, I could include it in vimim.vim as one more function.
Then I would not need to upload or maintain the pinyin datafile anymore!

The datafile is not supposed to be part of vimim.vim, in my opinion.

Thanks

Wenbo Yang

May 21, 2009, 8:50:39 PM
to vi...@googlegroups.com
There are two problems: 1) the Google Pinyin datafile is a "user phrase dictionary", so it does not include all words (especially rarely used ones) the way the original vimim.pinyin.txt does; 2) not everyone uses Google Pinyin IM.

For the many users who stick with a traditional IM such as Zhineng ABC or Microsoft Pinyin, a datafile is still unavoidable.

Regards,
Wenbo

Wenbo Yang

May 21, 2009, 8:58:24 PM
to vi...@googlegroups.com

2009/5/22 vimim <maxian...@gmail.com>

> That is cool.
>
> If it can be uploaded somewhere, like I did for vimim.pinyin.txt, it
> would be great! Again, I am not sure whether Google would like that.

I uploaded it to my site at http://share.solrex.cn/scripts/google2vimim . We are only converting a user phrase dictionary, so I do not think this concerns Google.

> I don't have google.dic. Looking at the code, I wonder whether it
> is valid to change ^M to \r to make it portable. (I don't like ^M,
> but it might be required somehow.)

Yes, I missed that.

> iconv -f gbk -t utf-8 "$@" | sed -e 's/ //g;s/\r$//g' |
> awk 'NR==1 { a=$3; printf "%s %s",$3,$1; next }
>      { if ($3==a) printf " %s",$1; else printf "\n%s %s",$3,$1; a=$3 }' |
> sort -d


Regards,
Wenbo

vimim

May 21, 2009, 11:09:36 PM
to vimim
Would it help to use "\r\n" instead of a single "\n" in case a DOS-format file
is used?
I am not sure whether it would work. Only a test can tell.
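
One way to read this question is whether the pipeline copes with DOS (CRLF) line endings. A minimal sketch, assuming the goal is simply to make both line-ending styles produce the same text: strip a trailing carriage return in awk before any further processing. The helper name strip_cr is hypothetical, and this has not been tried on a real google.dic.

```shell
# strip_cr: hypothetical helper that removes one trailing CR from each line,
# so DOS (CRLF) and Unix (LF) input come out identical.
strip_cr() {
  awk '{ sub(/\r$/, ""); print }'
}

# Both calls print the same line, "nihao", without a carriage return.
printf 'nihao\r\n' | strip_cr
printf 'nihao\n'   | strip_cr
```

Whether the output should also be written with "\r\n" depends on where the resulting file is used, which this thread does not settle.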


wang feng

May 21, 2009, 11:29:06 PM
to vi...@googlegroups.com
Could you please send me the vimim.pinyin.txt file as an attachment? I
do not have M$ Windows installed on my laptop.
Thanks in advance.

vimim

May 22, 2009, 12:38:29 AM
to vimim
VimIM sample code tables
Phonetic code    Sample code table
Pinyin           http://maxiangjiang.googlepages.com/vimim.pinyin.txt

Please tell me if you cannot download from there.

wang feng

May 22, 2009, 2:30:32 AM
to vi...@googlegroups.com
Haha, I meant the file he got after converting his Google IME dictionary.
I can download this one, thanks :)

Wenbo Yang

May 22, 2009, 6:45:31 AM
to vi...@googlegroups.com
Sharing my phrase dictionary would risk leaking private information. I am sorry.


Regards,
Wenbo

wang feng

May 22, 2009, 7:11:11 AM
to vi...@googlegroups.com
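My misunderstanding. I thought you could export the official Google Pinyin dictionary; I did not realize that what gets exported is your own dictionary. Sorry about that.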

