Zebra needs ICU to search Chinese text

Thomas

May 10, 2012, 10:19:38 PM
to Koha 臺灣
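ICU is not used by default. As a result, English searches work normally, but a Chinese search returns every record that contains Chinese text. The steps below fix this.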

Solution:

apt-get install yaz-icu libicu-dev libicu48

PS: libicu-dev and libicu48 are probably already installed.

cp /usr/share/idzebra-2.0/tab/{icu.idx,phrases-icu.xml,string.chr,words-icu.xml} /etc/koha/zebradb/etc/

cp /etc/koha/zebradb/etc/words-icu.xml /etc/koha/zebradb/etc/icu.xml
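
Before moving on, it can help to confirm that the copied files are where Zebra will look for them. A minimal sketch, assuming a POSIX shell; `list_missing` is a hypothetical helper written for this post, not part of Koha or Zebra:

```shell
# list_missing DIR NAME...
# Print each NAME that does not exist as a regular file inside DIR.
# Prints nothing when every file is present.
list_missing() {
    dir=$1
    shift
    for name in "$@"; do
        [ -f "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Example call with the paths used in this post (uncomment to run):
# list_missing /etc/koha/zebradb/etc icu.idx phrases-icu.xml string.chr words-icu.xml icu.xml
```

If the call prints any file name, repeat the corresponding cp step.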


Edit /etc/koha/zebradb/etc/icu.xml and set the locale on the opening tag:
<icu_chain locale="zh_TW.UTF-8">
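
The locale change can also be scripted instead of made in an editor. A minimal sketch, assuming the file already contains an `<icu_chain ...>` opening tag with a `locale` attribute; `set_icu_locale` is a hypothetical helper, not a Koha or Zebra tool:

```shell
# set_icu_locale FILE LOCALE
# Rewrite the locale="..." attribute on the <icu_chain ...> opening tag.
# Other attributes on the tag and all other lines are left unchanged.
set_icu_locale() {
    file=$1
    locale=$2
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy it back over the
    # original so the file keeps its owner and permissions.
    if sed "s|<icu_chain\([^>]*\)locale=\"[^\"]*\"|<icu_chain\1locale=\"$locale\"|" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Example call with the path used in this post (uncomment to run):
# set_icu_locale /etc/koha/zebradb/etc/icu.xml 'zh_TW.UTF-8'
```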

sudo chown -R koha:koha /etc/koha/zebradb/etc   # the owner may differ on your system; the point is that the permissions must be correct

Edit /etc/koha/zebradb/zebra-biblios.cfg
and add this line: index: icu.idx
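
If you script this step, it is worth making it safe to run more than once so the line is not added twice. A minimal sketch, assuming a POSIX shell with grep; `ensure_line` is a hypothetical helper, not part of Koha:

```shell
# ensure_line FILE LINE
# Append LINE to FILE unless an identical whole line is already present.
ensure_line() {
    file=$1
    line=$2
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example call with the path used in this post (uncomment to run,
# as a user who is allowed to write the file):
# ensure_line /etc/koha/zebradb/zebra-biblios.cfg 'index: icu.idx'
```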

Restart Zebra:
sudo /etc/init.d/koha-zebra-daemon restart

Rebuild the Zebra index:
sudo KOHA_CONF=/etc/koha/koha-conf.xml PERL5LIB=/usr/share/koha/lib \
    /usr/share/koha/bin/migration_tools/rebuild_zebra.pl -b -r -v

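Then search again; it should work now. (Koha itself must of course be configured to use Zebra.)

Searches by full title and by partial title (that is, substring searches) both return correct results.

References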
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8065
http://lists.katipo.co.nz/public/koha/2011-January/027171.html
http://lists.katipo.co.nz/public/koha/2012-January/031714.html

BR, Thomas.

Thomas

Apr 2, 2014, 2:19:37 AM
to kohat...@googlegroups.com
For Koha 3.14, see also:
http://wiki.koha-community.org/wiki/Correcting_Search_of_Arabic_records

1. Install yaz-icu.

2. Go to More > Administration > Global system preferences > Searching and set:

2.1 UseICU to use

2.2 QueryFuzzy to don't try

2.3 QueryStemming to don't try

Edit /etc/koha/zebradb/etc/default.idx to match the example below. The key change is adding this line to each index:

 icuchain words-icu.xml

Example:

 # Traditional word index
 # Used if completeness is 'incomplete field' (@attr 6=1) and
 # structure is word/phrase/word-list/free-form-text/document-text
 index w
 completeness 0
 position 1
 alwaysmatches 1
 firstinfield 1
 icuchain words-icu.xml
 
 
 # Phrase index
 # Used if completeness is 'complete {sub}field' (@attr 6=2, @attr 6=1)
 # and structure is word/phrase/word-list/free-form-text/document-text
 index p
 completeness 1
 firstinfield 1
 icuchain words-icu.xml 
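
After editing, a quick way to see which index sections actually reference an ICU chain is to summarise the file. A minimal sketch, assuming awk is available and that the file uses the `index NAME` / `icuchain FILE` layout shown in the example; `show_index_chains` is a hypothetical helper, not a Zebra tool:

```shell
# show_index_chains FILE
# For each "index NAME" section, print NAME followed by the file named on
# its icuchain line, or "-" if the section has no icuchain line.
show_index_chains() {
    awk '
        $1 == "index"    { if (name != "") print name, chain; name = $2; chain = "-" }
        $1 == "icuchain" { chain = $2 }
        END              { if (name != "") print name, chain }
    ' "$1"
}

# Example call with the path used in this post (uncomment to run):
# show_index_chains /etc/koha/zebradb/etc/default.idx
```

Any section reported with "-" still uses its original settings rather than words-icu.xml.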

Then edit /etc/koha/zebradb/etc/words-icu.xml.

words-icu.xml can be simplified to the following:

<icu_chain locale="zh_TW.UTF-8">
  <transliterate rule="\'>\ "/>
  <transliterate rule="[:Number:] { '-' > '' "/>
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="l"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <transform rule="NFD"/>
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <display/>
  <casemap rule="l"/>
</icu_chain>

Restart Zebra,
then rebuild the index, and you should see the result.
