Is it possible to query Sphinx with any kind of characters?
I currently have the following indexed:
English: Pick a category
Russian: Выберите категорию
Chinese: 选择分类
ThinkingSphinx.search("Выберите категорию") finds and returns the
Russian entry.
But ThinkingSphinx.search("选择分类") doesn’t find the chinese entry.
Is this a Sphinx/Thinking Sphinx problem or am I doing something
wrong?
Thanks,
Édouard
The problem was that when the string “选择分类” was indexed, searching for
“选” didn’t yield any results.
I fixed this by adding the following to sphinx.conf:
ngram_len: 1
ngram_chars: "U+00C6->U+00E6, U+01E2->U+00E6, U+01E3->U+00E6 ...
`ngram_chars` is basically the same as `charset_table`, without the
Latin characters. We only want CJK characters in there.
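For anyone else hitting this, here is a toy Ruby model of why `ngram_len: 1` makes “选” match “选择分类”. It is NOT Sphinx's real tokenizer. It assumes the CJK Unified Ideographs block (U+4E00..U+9FFF) stands in for `ngram_chars`, splits everything else on whitespace only, and skips the case folding that `charset_table` does. The method names are made up for illustration. Chinese has no spaces, so without n-grams the whole phrase becomes one token. With `ngram_len: 1`, every CJK character is indexed as its own token.

```ruby
# Toy model only: NOT how Sphinx tokenizes internally.
# Assumption: U+4E00..U+9FFF approximates the ngram_chars set.
CJK_RANGE = (0x4E00..0x9FFF)

def cjk?(char)
  CJK_RANGE.cover?(char.ord)
end

# Without n-grams: each whitespace-separated run is one token,
# so "选择分类" is indexed as a single word.
def tokens_whole_words(text)
  text.split
end

# With ngram_len = 1: every CJK character becomes its own token,
# while other scripts (Latin, Cyrillic) still form whole words.
def tokens_unigram(text)
  tokens = []
  word = +""
  text.each_char do |ch|
    if cjk?(ch)
      tokens << word unless word.empty?
      word = +""
      tokens << ch
    elsif ch.match?(/\s/)
      tokens << word unless word.empty?
      word = +""
    else
      word << ch
    end
  end
  tokens << word unless word.empty?
  tokens
end

# A document matches when every query token was indexed for it.
def matches?(doc_tokens, query_tokens)
  query_tokens.all? { |t| doc_tokens.include?(t) }
end

matches?(tokens_whole_words("选择分类"), tokens_whole_words("选")) # => false
matches?(tokens_unigram("选择分类"), tokens_unigram("选"))         # => true
tokens_unigram("Выберите категорию") # => ["Выберите", "категорию"]
```

Russian text is unaffected either way because its words are already separated by spaces. That is why the Russian search worked before the change.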
Hope this will help,
Édouard
On Nov 19 2009, 1:07 am, Pat Allan <p...@freelancing-gods.com> wrote:
> Ah, great to know you figured it out - I've not had to deal with Chinese characters before.
>
> --
> Pat
>
> On 19/11/2009, at 5:20 AM, Édouard Brière wrote:
>
> > I figured it out!
>
> > My charset_table was not configured in sphinx.yml. Once configured, it looks like this:
>
> > development:
> >   charset_table: "U+00C0->a, U+00C1->a, U+00C2->a, ... U+01DE->a, U+01DF->a, \\\n \
> >     U+01E0->a, U+01E1->a, U+01FA->a, U+01FB->a, U+0200->a ... \\\n \
> >     ..."
> > production:
> >   charset_table: "..."
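Typing these entries by hand is error-prone. Below is a small Ruby sketch for generating them instead. The helpers `cp`, `map_entry` and `range_entry` are hypothetical and not part of Thinking Sphinx. The sketch only assumes the `U+XXXX->target` mapping form shown above and the `U+XXXX..U+YYYY` range form that `charset_table` (and therefore `ngram_chars`) also accepts.

```ruby
# Hypothetical helpers (not a Thinking Sphinx API) that format
# charset_table / ngram_chars entries in Sphinx's U+XXXX notation.

# Format a code point as "U+XXXX" (at least four hex digits).
def cp(code)
  format("U+%04X", code)
end

# One mapping entry. The target may be a code point (Integer)
# or a literal character such as "a".
def map_entry(from, to)
  target = to.is_a?(Integer) ? cp(to) : to
  "#{cp(from)}->#{target}"
end

# A range of code points kept as-is, e.g. a CJK block for ngram_chars.
def range_entry(range)
  "#{cp(range.first)}..#{cp(range.last)}"
end

# A few accented "a" variants from the config above, folded to "a".
accented_a = [0x00C0, 0x00C1, 0x00C2, 0x01DE, 0x01DF]
puts accented_a.map { |c| map_entry(c, "a") }.join(", ")
# prints: U+00C0->a, U+00C1->a, U+00C2->a, U+01DE->a, U+01DF->a

puts map_entry(0x00C6, 0x00E6)      # prints: U+00C6->U+00E6
puts range_entry(0x4E00..0x9FFF)    # prints: U+4E00..U+9FFF
```

The generated string can then be pasted into the `charset_table` (or `ngram_chars`) value in sphinx.yml.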
>
> > There is a full list of charsets available here: http://pastie.org/204316.txt
> > I tested it for my app and it works fine for Chinese, Russian and