My database shows garbled text


Belinda

Mar 19, 2012, 7:39:44 AM
to DataparkSearch
Hi!
My charset is Big5, so I have set LocalCharset in both the
indexer.conf and search.htm files.

The indexer.conf is as follows:
LocalCharset Big5
RemoteCharset Big5
LangMapFile langmap/zh.big5.lm
LoadChineseList Big5 TraditionalChinese.freq

The search.htm is as follows:
LocalCharset Big5
BrowserCharset Big5
LoadChineseList Big5 TraditionalChinese.freq

With the charset set to Big5, some of the contents of the urlinfo
table are sometimes garbled.

Is there anything wrong?

Thanks a lot!

Maxim Zakharov

Mar 19, 2012, 5:45:39 PM
to datapar...@googlegroups.com
Hi,

Which SQL server do you use?
Which character set is the default for your database?

For PostgreSQL, use the following command to see the default charset of each database:
psql -l

For MySQL execute the following commands:

USE your_database;
show variables like "character_set_database";
show variables like "collation_database";

Maxim

--
http://www.dataparksearch.org/ - an open source search engine.

wang belinda

Mar 20, 2012, 8:58:59 AM
to datapar...@googlegroups.com
Hi, Max
Thank you for your response!

My SQL server is MySQL.

I ran the following commands in MySQL:

show variables like "character_set_database";
show variables like "collation_database";

And the result is:
Variable_name Value
collation_database big5_chinese_ci

Here is part of the urlinfo table in my database:
------------------------------------------------------------------------------------------------------------------------------------------
5 body
5 Charset Big5
5 Content-Language zh
5 Content-Type text/html
5 title ??戊瘣餃??勗?蝟餌絞
6 body 嘉女行政單位聯絡網 嘉女電話總機號碼:(05)2254605、(05)2254603、(05)225...
6 Charset Big5
6 Content-Language zh
6 Content-Type text/html
6 title 國立嘉義女子高級中學(嘉義女中)行政單位聯絡網
7 body 嚜 甇斤雯??蝙?冽???雿???函???汗?其蒂銝????
7 Charset Big5
7 Content-Language zh
7 Content-Type text/html
7 title ??戊--撣恍?????
--------------------------------------------------------------------------------------------------------------------------------------------------

The body and title of the fifth URL (5) are garbled, but the body and title of the sixth URL (6) are correct.
The Chinese words in the dict table are also correct.

From what I have observed, when a web page's charset is UTF-8, its data in the urlinfo table is garbled,
whereas when the page's charset is Big5, its data in the urlinfo table has no problem.
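
This pattern would be consistent with the UTF-8 bytes of a page being interpreted as Big5 somewhere along the way, though that is only an assumption and not something confirmed in this thread. A minimal Python sketch of the effect (the function name and sample string are made up for illustration):

```python
def misread_as_big5(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as Big5."""
    raw = text.encode("utf-8")
    # errors="replace" substitutes U+FFFD for byte sequences that are not valid Big5
    return raw.decode("big5", errors="replace")


sample = "嘉義女中"  # hypothetical page title
garbled = misread_as_big5(sample)
print(garbled)  # unreadable characters, not the original title

# Decoding the same bytes with the correct charset gives back the original.
assert sample.encode("utf-8").decode("utf-8") == sample
```

If the stored values resemble this kind of output, the page charset was probably misdetected before the text was converted to LocalCharset.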

I don't know what is wrong.

Thank you very much!



Maxim Zakharov

Mar 20, 2012, 9:38:49 AM
to datapar...@googlegroups.com
Hi,

Which language maps are included in your langmap.conf file? In particular, does it include UTF-8 maps for the Chinese language?
Is the langmap.conf file included in your indexer.conf file by an include command?

NB: if you change the language maps in your langmap.conf file, you need to reindex all affected URLs to get the urlinfo table updated.
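
To illustrate why the charset has to be detected per page, here is a rough Python sketch. It is not how DataparkSearch works (it uses language maps); it swaps in a much simpler heuristic, strict UTF-8 validation with a Big5 fallback, and the function name is hypothetical:

```python
def guess_charset(raw: bytes) -> str:
    """Guess between UTF-8 and Big5 for a page body (illustrative heuristic only)."""
    try:
        raw.decode("utf-8")  # strict mode: raises on bytes that are not valid UTF-8
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption for this sketch: anything that is not valid UTF-8 is Big5.
        return "big5"


utf8_page = "中文".encode("utf-8")
big5_page = "中文".encode("big5")
print(guess_charset(utf8_page), guess_charset(big5_page))  # utf-8 big5
```

If every page were treated as Big5 regardless of its real charset, the UTF-8 pages would be the ones that end up garbled.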

Maxim
