Searches with accents


Iván Belmonte

Apr 7, 2009, 8:42:14 AM
to Thinking Sphinx
Hi folks

Is it possible to make thinking_sphinx ignore accents on searches? I
mean... make "perdón" and "perdon" match the same?

I've thought about indexing directly with no accents, and then stripping
the accents from each search string before the match. The
question is: how can I make TS index without accents?

Thanks in advance!

Iván
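
The query-side half of that idea (removing accents from the search string before it reaches Sphinx) could look roughly like the sketch below. It assumes a Ruby version with String#unicode_normalize (2.2 or later); strip_accents is a hypothetical helper name, not part of Thinking Sphinx.

```ruby
# Hypothetical helper, not part of Thinking Sphinx.
# NFD decomposition splits "ó" into "o" plus a combining accent (U+0301);
# \p{Mn} then matches and removes the combining marks.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("perdón") # prints "perdon"
```

Note that this also folds "ñ" to "n", which may or may not be what you want for Spanish text. It only covers the query side; the index still has to be built without accents for the two to match.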

James Healy

Apr 7, 2009, 8:51:45 AM
to thinkin...@googlegroups.com
Hi Iván,

The trick is to use a Sphinx feature called charset_table. I published a
blog article on how to use it last year:

http://yob.id.au/blog/2008/05/08/thinking_sphinx_and_unicode/

-- James Healy <jimmy-at-deefa-dot-com> Tue, 07 Apr 2009 22:49:41 +1000
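
For readers who cannot reach the article: a charset_table is a comma-separated list of characters and "source->target" mappings, so an accented code point can be folded onto its plain letter at both index and query time. The sketch below builds such a value in Ruby. The fold list is a small illustrative subset and not the table from the blog post, and charset_table / FOLDS are hypothetical names. The result is emitted on a single line because a later message in this thread reports that line breaks in the value prevented indexing.

```ruby
# Hypothetical helper for building a charset_table value.
# FOLDS is an illustrative subset, NOT the table from the blog post.
FOLDS = {
  "a" => %w[á à â ä],
  "e" => %w[é è ê ë],
  "i" => %w[í ì î ï],
  "o" => %w[ó ò ô ö],
  "u" => %w[ú ù û ü]
}.freeze

def charset_table(folds = FOLDS)
  # Digits, underscore, and ASCII letters (uppercase folded to lowercase).
  base = ["0..9", "A..Z->a..z", "_", "a..z"]
  mappings = folds.flat_map do |plain, accented|
    accented.flat_map do |ch|
      # Map both the lowercase and the uppercase accented code point.
      [ch, ch.upcase].map { |c| format("U+%04X->%s", c.ord, plain) }
    end
  end
  # One line, no line breaks.
  (base + mappings).join(", ")
end

puts charset_table
```

The printed string would go in sphinx.yml as the value of charset_table, quoted and on one line; that placement is an assumption based on how the other options in this thread are set.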

IvanHQ

Apr 7, 2009, 11:29:09 AM
to thinkin...@googlegroups.com
Thanx James!

Well, the problem is that I need to "enable_star" on the model or it gives "ERROR [...] enable_star=0" and indexes nothing.
If I enable star, then it gives no error, but does not index any string containing accents :-\

Maybe I'm doing something wrong... I followed your tuto letter by letter :-\

Iván
--

  ____________________________________________________|
 /                                                              
|  Iván
|  mol...@gmail.com
 \________________________________

Iván Belmonte

Apr 7, 2009, 12:09:55 PM
to Thinking Sphinx
Oh my god, forget it: it didn't index because of the line breaks...
silly of me grmmm
sorry :-\

Iván Belmonte

Apr 7, 2009, 7:20:18 PM
to Thinking Sphinx
Hmmmmm I'm getting errors.

1) If I put "allow_star: true" in sphinx.yml then I get this error:

ERROR: index 'property_core': infixes and morphology are enabled,
enable_star=0

2) If I put "enable_star: true" and "allow_star: true", same error

3) If I put only "enable_star: true" then no errors, everything fine,
except it doesn't index any record containing an accent.


Any idea? Thanks for all =)

Iván

James Healy

Apr 7, 2009, 8:36:41 PM
to thinkin...@googlegroups.com
Iván Belmonte wrote:
> 3) If I put only "enable_star: true" then no errors, everything fine,
> except it doesn't index any record containing an accent.

enable_star is the correct option to use; IIRC allow_star is deprecated.

I use these settings, and they seem to work well:

enable_star: 1
min_prefix_len: 1
min_infix_len: nil

-- James Healy <jimmy-at-deefa-dot-com> Wed, 08 Apr 2009 10:34:55 +1000

IvanHQ

Apr 7, 2009, 10:09:22 PM
to thinkin...@googlegroups.com
Hello again James

Yes, your options work, now my sphinx indexes documents and gives no error. But it doesn't index any document containing accents.
MySQL's default charset is latin1, so maybe that is the root of the problem. Is there any way to solve it while using latin1, or should I convert the whole database to utf-8?

Thanks again

Iván

Iván Belmonte

Apr 7, 2009, 10:50:59 PM
to Thinking Sphinx
Okay, I have included this option:

charset_type: "sbcs"

And now sphinx indexes all documents. But now I'm back at the
beginning, as if the charset_table didn't take effect.
Still investigating...

James Healy

Apr 7, 2009, 10:57:43 PM
to thinkin...@googlegroups.com
IvanHQ wrote:
> Yes, your options work, now my sphinx indexes documents and gives no error.
> But it doesn't index any document containing accents.
> MySQL default charset is latin1, so maybe it is the root of the problem. Is
> there any way to solve it while using latin1? or should I convert all the
> database to utf-8?

That could be your issue - I know my database is encoded as utf-8.

Converting the encoding of a database is generally non-trivial - I'd
suggest testing the theory first, before you spend a long time
converting your own DB.

You might also try explicitly setting the encoding in your sphinx config:

charset_type: utf-8

I think TS defaults to this, but it can't hurt to double check.

-- James Healy <jimmy-at-deefa-dot-com> Wed, 08 Apr 2009 12:46:12 +1000

Iván Belmonte

Apr 7, 2009, 11:42:11 PM
to Thinking Sphinx
I'm definitely converting my database to utf-8

I'll let you know if it explodes heheh


Iván

Iván Belmonte

Apr 8, 2009, 2:05:54 AM
to Thinking Sphinx
No way man, my database is now UTF-8, but sphinx does not index any
record containing an accent.
My charset_table is the one from your post... don't know what more to
do or read...

Iván Belmonte

Apr 8, 2009, 5:16:30 AM
to Thinking Sphinx
Thanx for everything James, the problem was my database. I moved it to utf8,
but it still had some strange Unicode characters... now my database is clean
and working on utf8. Your charset_table works like a charm, and you
guys rule a lot =D

Thanx!!

Iván
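
The thread does not say what the strange characters were. One common case after a latin1 to utf8 migration is leftover bytes that are not valid UTF-8, and the sketch below shows a way to spot and replace them in Ruby. It assumes the column values are available as raw byte strings and a Ruby version with String#scrub (2.1 or later); invalid_utf8? and scrub_utf8 are hypothetical helper names.

```ruby
# Hypothetical helpers; assume text holds raw bytes read from the database.

# True when the bytes do not form valid UTF-8.
def invalid_utf8?(text)
  !text.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Replace each invalid byte sequence with a placeholder.
def scrub_utf8(text, replacement = "?")
  text.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end

latin1_bytes = "perd\xF3n".b # "perdón" as latin1 bytes, invalid as UTF-8
puts invalid_utf8?(latin1_bytes) # prints "true"
puts scrub_utf8(latin1_bytes)    # prints "perd?n"
```

Replacing bad bytes loses information, so when the original encoding is known it is usually better to re-encode the value (for example from ISO-8859-1) than to scrub it.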