Spelling suggestion and encoding.

Javi

Feb 24, 2011, 10:33:10 AM
to xapian_db
I've just found the following behavior in my application:

search = Page.search "ubicación"     <-- [#<Xapian::Document:0xb487258>]
title = search.map(&:title).first    <-- "Ubicación"
title.encoding                       <-- #<Encoding:UTF-8>

Everything fine there. However:

wrong_search = Page.search "ubiacción"           <-- []
suggestion = wrong_search.spelling_suggestion    <-- "ubicaci\xC3\xB3n"
suggestion.encoding                              <-- #<Encoding:ASCII-8BIT>

On a related note, I've tried to provide a test case, but running the
suite fails with:
xapian_db/lib/xapian_db.rb:9:in `require': no such file to load -- xapian (LoadError)

I guess I'll have to install some kind of xapian bindings. Do you know
which gems or system packages I need?

Thanks!

Gernot

Feb 24, 2011, 11:56:06 AM
to xapian_db
Hi Javi

I guess you'll have to install xapian and the ruby bindings for xapian
by hand. See the readme from version 0.5.1
(https://github.com/garaio/xapian_db/tree/v0.5.1) for instructions.

Cheers
Gernot

Javi

Feb 24, 2011, 12:43:24 PM
to xapi...@googlegroups.com
On Thursday, 24 February 2011 17:56:06 Gernot wrote:
> I guess you'll have to install xapian and the ruby bindings for xapian
> by hand. See the readme from version 0.5.1
> (https://github.com/garaio/xapian_db/tree/v0.5.1) for instructions.

Thanks! It was quite easy because they were already compiled after installing the xapian_db gem.

Here's the patch with the failing test case.

Regards.

encoding_test.diff

Gernot

Feb 25, 2011, 6:04:15 AM
to xapian_db
Cool! I was able to reproduce it and have made a fix. The suggestion is
now bound to UTF-8 encoding, but I think that's OK, right?

Cheers
Gernot

Javi

Feb 25, 2011, 8:44:12 AM
to xapian_db
On Feb 25, 12:04 pm, Gernot <gernot.kog...@gmail.com> wrote:
> It's now bound to utf-8 encoding but I think that's ok, right?

For me it's perfectly OK (I was doing just that in my application),
but to be honest I have no idea what will happen for non-UTF-8
applications.

I don't know anything about the way xapian databases handle character
encoding either.

Thanks for the fix. Upgrading to 0.5.5 right now.

Gernot

Feb 25, 2011, 9:18:02 AM
to xapian_db
I did a little research and found that Xapian expects the indexed data
to be UTF-8 encoded anyway. So my fix should be fine :-)
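
For anyone hitting the same problem, here is a minimal sketch of this kind of fix in plain Ruby. It is not the actual xapian_db patch: the helper name `utf8_suggestion` is hypothetical, and the sketch assumes the bindings hand back valid UTF-8 bytes that are merely tagged as ASCII-8BIT, as in the output from the first message.

```ruby
# Simulate what the bindings returned in the first message:
# valid UTF-8 bytes, but tagged as ASCII-8BIT (binary).
raw = "ubicaci\xC3\xB3n".dup.force_encoding(Encoding::BINARY)

# Hypothetical helper (not part of xapian_db): relabel the bytes as UTF-8.
# force_encoding only changes the encoding tag; it does not transcode bytes,
# so this is only safe if the bytes really are UTF-8 (assumption).
def utf8_suggestion(raw)
  return nil if raw.nil?
  raw.dup.force_encoding(Encoding::UTF_8)
end

suggestion = utf8_suggestion(raw)
puts suggestion.encoding          # the encoding tag after relabelling
puts suggestion.valid_encoding?   # whether the bytes are well-formed UTF-8
```

Checking `valid_encoding?` afterwards is a cheap way to notice if the assumption about the underlying bytes does not hold, for example in an application that indexed non-UTF-8 data.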