r180 - CJKSplitter/trunk


czug-checkins

Feb 15, 2006, 1:33:18 AM
to czug-c...@googlegroups.com
Author: panjy
Date: Wed Feb 15 14:30:37 2006
New Revision: 180

Added:
CJKSplitter/trunk/tests.py (contents, props changed)
Modified:
CJKSplitter/trunk/CJKSplitter.py
CJKSplitter/trunk/HISTORY.txt
Log:
Make glob (fuzzy) queries and non-glob queries produce the same split result, so that they match better.


Modified: CJKSplitter/trunk/CJKSplitter.py
==============================================================================
--- CJKSplitter/trunk/CJKSplitter.py (original)
+++ CJKSplitter/trunk/CJKSplitter.py Wed Feb 15 14:30:37 2006
@@ -98,9 +98,7 @@
# result.append(w[i-1:i+1])
i += 1

- # add the last word to the catalog
- if not isGlob:
- result.append(w[-1])
+ result.append(w[-1])
else:
result.append(w)
# return [word.encode('utf8') for word in result]
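
The hunk above removes the `isGlob` guard, so the trailing single character `w[-1]` is now appended in glob mode too. A minimal sketch of the effect follows. It assumes a bigram scheme (every adjacent two-character pair, as the commented-out `w[i-1:i+1]` line hints) followed by the last character. The function name `split_cjk_run` and the `old_behavior` flag are hypothetical illustrations, not part of CJKSplitter, and the sketch is written for Python 3 rather than the Python 2 used in the repository.

```python
def split_cjk_run(w, is_glob=False, old_behavior=False):
    """Hypothetical sketch of splitting one run of CJK characters.

    Assumes overlapping bigrams plus the final single character.
    old_behavior=True mimics the pre-r180 logic, which skipped the
    final character for glob queries.
    """
    result = []
    if len(w) > 1:
        # Overlapping bigrams: w[0:2], w[1:3], ..., w[-2:]
        for i in range(1, len(w)):
            result.append(w[i - 1:i + 1])
        # r180: always add the last character; before r180 it was
        # only added when the query was not a glob query
        if not (old_behavior and is_glob):
            result.append(w[-1])
    else:
        # A single character is kept as-is
        result.append(w)
    return result


print(split_cjk_run('知识库'))                  # ['知识', '识库', '库']
print(split_cjk_run('知识库', is_glob=True))    # ['知识', '识库', '库']
print(split_cjk_run('知识库', is_glob=True, old_behavior=True))  # ['知识', '识库']
```

Under these assumptions, glob and non-glob splitting now yield the same term list, which is what the log message asks for.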

Modified: CJKSplitter/trunk/HISTORY.txt
==============================================================================
--- CJKSplitter/trunk/HISTORY.txt (original)
+++ CJKSplitter/trunk/HISTORY.txt Wed Feb 15 14:30:37 2006
@@ -1,3 +1,6 @@
+
+ - make the glob split result the same as non-glob split
+
v0.7.3 2005-6-15

- fixed a bug which may cause multi-Chinese Character search fail.

Added: CJKSplitter/trunk/tests.py
==============================================================================
--- (empty file)
+++ CJKSplitter/trunk/tests.py Wed Feb 15 14:30:37 2006
@@ -0,0 +1,15 @@
+from CJKSplitter import CJKSplitter
+
+words = ['知识库-招聘资料']
+for word in words:
+    print '=====now test:', word
+    u = unicode(word, 'utf8').encode('utf8')
+    s = CJKSplitter()
+    print 'no glob result:'
+    for i in s.process([u]):
+        print i.encode('utf8')
+
+    print 'glob result:'
+    for i in s.process([u], 1):
+        print i.encode('utf8')
+
