Any chance of porting to the latest Sphinx 0.9.10 teaser version?


colorpyen

Jul 19, 2009, 8:23:48 PM
to sphinx-for-chinese
The latest version has string attribute support, which is quite useful.
I am just curious when sphinx-for-chinese plans to move forward.
Also, what is the difference between this project and Coreseek?

Thanks a lot

Zhuguo Shi

Jul 19, 2009, 9:18:36 PM
to sphinx-fo...@googlegroups.com
Hi colorpyen,

There is a plan to update sphinx-for-chinese to the latest 0.9.10 version, but I am afraid it won't be soon for personal reasons (I really have no time at the moment). If you have some programming experience, though, it should not be difficult to make the update yourself based on the current version of sphinx-for-chinese. You can diff its source code against the original sphinxsearch, and you'll see that the modifications are not extensive.

Coreseek is one modified version of sphinxsearch, and sphinx-for-chinese is another. Both aim to provide a more efficient way to index and search Chinese text, and both use the same Chinese segmentation algorithm (MMSEG). However, sphinx-for-chinese focuses on the built-in segmentation implementation and, unlike Coreseek, does not provide a Python data source. So in my opinion sphinx-for-chinese is much faster than Coreseek, and its segmentation is much more customizable (you can add any characters to the dictionary).
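
To give a rough idea of how dictionary-based segmentation behaves, here is a minimal Python sketch of forward maximum matching, the simplest variant in the MMSEG family. It is only an illustration and not the code used in sphinx-for-chinese or Coreseek: the function name `segment`, the `max_word_len` parameter, and the tiny dictionary are all made up for this example, and the full MMSEG algorithm adds further disambiguation rules on top of this.

```python
def segment(text, dictionary, max_word_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if nothing matches.

    Illustrative only; names and the max_word_len default are assumptions.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionary, not the one shipped with either project.
dictionary = {"中文", "搜索", "引擎", "搜索引擎"}
print(segment("中文搜索引擎", dictionary))
```

Adding or removing entries in the dictionary changes how the text is split, which is the kind of customization meant above.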

I will update sphinx-for-chinese as soon as possible. Don't hesitate to mail me if you have any questions.

Thanks~

Peter Yen

Jul 19, 2009, 10:32:55 PM
to sphinx-fo...@googlegroups.com
Hi Zhuguo,

Thanks for your prompt response. I can definitely take a look at the diff and try to incorporate your patch.

Also, I am curious about support for Traditional Chinese: is it just a matter of replacing the segmentation dictionary, or is there more involved?

Thanks a lot.

-peter

Zhuguo Shi

Jul 19, 2009, 11:01:08 PM
to sphinx-fo...@googlegroups.com
Hi, Peter,

I have not tested Traditional Chinese, but since the segmentation algorithm in sphinx-for-chinese works on UTF-8 encoded strings (only UTF-8 encoding is supported so far), there should not be any problems when dealing with Traditional Chinese. Just replace the dictionary and give it a try.
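
As a small illustration of why the script should not matter, the Python sketch below splits UTF-8 bytes into per-character byte sequences for a Simplified and a Traditional phrase. It is not taken from sphinx-for-chinese; the helper name `utf8_chars` and the sample phrases are assumptions chosen for the example. Both phrases come out as the same kind of units (three bytes per character here), so a segmenter working at this level would only need a dictionary containing the matching words.

```python
def utf8_chars(data: bytes):
    """Split UTF-8 encoded bytes into one byte sequence per character.

    Hypothetical helper for illustration only.
    """
    return [ch.encode("utf-8") for ch in data.decode("utf-8")]


simplified = "搜索引擎".encode("utf-8")   # Simplified Chinese sample
traditional = "搜尋引擎".encode("utf-8")  # Traditional Chinese sample

for label, data in (("simplified", simplified), ("traditional", traditional)):
    chars = utf8_chars(data)
    # Report the character count and the byte length of each character.
    print(label, len(chars), [len(c) for c in chars])
```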

Thanks~