r199 - CJKSplitter/trunk

czug-checkins

Oct 12, 2006, 9:47:11 PM
to czug-c...@googlegroups.com
Author: panjy
Date: Fri Oct 13 09:47:10 2006
New Revision: 199

Modified:
CJKSplitter/trunk/README.txt

Log:
change to rst


Modified: CJKSplitter/trunk/README.txt
==============================================================================
--- CJKSplitter/trunk/README.txt (original)
+++ CJKSplitter/trunk/README.txt Fri Oct 13 09:47:10 2006
@@ -1,41 +1,40 @@
CJKSplitter - Chinese, Japanese, Korean word splitter for ZCTextIndex
+=============================================================================
+CJKSplitter is a ZCTextIndex splitter for CJK (Chinese-Japanese-Korean) text
+stored as Unicode. It uses a simple, but workable, "hack" instead of trying
+to do real word splitting from dictionaries. Compared to a dictionary based
+word splitter, this results in a bigger index and more matches than necessary,
+but it is a cheap price to pay for the reduced complexity.

- CJKSplitter is a ZCTextIndex splitter for CJK (Chinese-Japanese-Korean) text
- stored as Unicode. It uses a simple, but workable, "hack" instead of trying
- to do real word splitting from dictionaries. Compared to a dictionary based
- word splitter, this results in a bigger index and more matches than necessary,
- but it is a cheap price to pay for the reduced complexity.
+Features
+================
+- use regular expression to be compatible with default English white space
+splitter

-Feature
+- much simpler code, easy to install, easy to use

- - use regular expression to be compatible with default English white space
- splitter
+- support multiple encodings: unicode/utf-8/gb18030/gbk/gb2312/mbcs/big5.
+provide 3 splitters(more to come):

- - much simpler code, easy to install, easy to use
+ * 'CJK splitter' : support unicode/utf-8 encoding. this encoding is
+compatible with version 0.1

- - support multiple encodings: unicode/utf-8/gb18030/gbk/gb2312/mbcs/big5.
- provide 3 splitters(more to come):
+ * 'CJK GB splitter' : support unicode/gb18030/gbk/gb2312/mbcs encodings.

- * 'CJK splitter' : support unicode/utf-8 encoding. this encoding is
- compatible with version 0.1
+ * 'CJK BIG5 splitter' : support unicode/big5/mbcs encodings

- * 'CJK GB splitter' : support unicode/gb18030/gbk/gb2312/mbcs encodings.
+- smaller index storage for CJK: index stored as unicode(2 bytes) but not
+utf-8(3 bytes)

- * 'CJK BIG5 splitter' : support unicode/big5/mbcs encodings
+- support English globbing

- - smaller index storage for CJK: index stored as unicode(2 bytes) but not
- utf-8(3 bytes)
+- support single Chinese character search

- - support English globbing
+About ZOpen
+=================
+ZOpen is a professional Zope/Plone consulting company located in Shanghai,
+China. We are also the supporter for CZUG.org (China Zope User Group).
+We are trying to make Zope/CMF/Plone work for the Chinese people.

- - support single Chinese character search
-
-About ZopeChina
-
- ZopeChina.com is a leading ZSP(Zope Service Provider) in China. We are also
-the supporter for CZUG.org (China Zope User Group). We are trying to make
-Zope/CMF/Plone work for the Chinese people. We wish all the Chinese Zope guys
-can be together and make zope work better for Chinese:)
-
- Contact us with : pan_j...@yahoo.com.cn
+Contact us with : pa...@zopen.cn
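
The README above does not say what the non-dictionary "hack" is. One common choice for CJK text is to index overlapping two-character pairs (bigrams) and to leave whitespace-delimited words intact. The sketch below illustrates that idea only; it is not taken from the CJKSplitter source. The names `split_terms` and `encoded_sizes` and the character ranges are assumptions made for illustration. It also checks the storage claim from the feature list for one common ideograph.

```python
import re

# Assumption: only Hiragana/Katakana, CJK Unified Ideographs and Hangul
# syllables are treated as CJK here; a real splitter may cover more ranges.
CJK = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3"

# A token is either a run of CJK characters or a run of other word characters.
TOKEN = re.compile(rf"(?P<cjk>[{CJK}]+)|(?P<word>[^\W{CJK}]+)")


def split_terms(text):
    """Return index terms: lowercased words plus overlapping CJK bigrams."""
    terms = []
    for match in TOKEN.finditer(text):
        run = match.group("cjk")
        if run is None:
            # Non-CJK word: keep it whole, as a whitespace splitter would.
            terms.append(match.group("word").lower())
        elif len(run) == 1:
            # A lone CJK character is kept so single-character search works.
            terms.append(run)
        else:
            # Overlapping bigrams: no dictionary needed, but more terms.
            terms.extend(run[i:i + 2] for i in range(len(run) - 1))
    return terms


def encoded_sizes(char):
    """Return (UTF-16 size, UTF-8 size) in bytes for one character."""
    return len(char.encode("utf-16-le")), len(char.encode("utf-8"))


if __name__ == "__main__":
    print(split_terms("Zope 中文分词 index"))
    print(encoded_sizes("中"))
```

Bigrams trade index size and precision for simplicity, which matches the trade-off the README describes: every adjacent character pair becomes a term, so some matches are not real words.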
