round quote and diacriticals


jcm

May 1, 2012, 8:05:33 PM
to xapian_db
Hi,


I'm using xapian_db for a French-language site. The API is perfect,
thanks for the work.
My problem is with diacritics: when "éléphant" is indexed in a text,
searching for "elephant" does not find it.
Second point: the typographic apostrophe ’ seems to be treated as part
of a word. I'd like words to be split around it, as is the usage in
French. For instance, "l’oncle d’Alphonse" should be found with "oncle"
or "Alphonse".
Could you point me to the settings to achieve this? Or, if it's not
possible now, to what would need to be contributed?

Gernot

May 2, 2012, 5:01:22 AM
to xapian_db
Hi

Nice to hear you're enjoying xapian_db. To your questions: Have you
tried to set the language to fr as described in the README?

jcm

May 2, 2012, 7:49:14 AM
to xapian_db
Hi,

On May 2, 11:01, Gernot <gernot.kog...@gmail.com> wrote:
> Nice to hear you're enjoying xapian_db. To your questions: Have you
> tried to set the language to fr as described in the README?

Sure, I have a xapian_db.yml in config/, with

defaults: &defaults
  adapter: active_record
  language: fr

Gernot

May 2, 2012, 12:14:38 PM
to xapian_db
Hmm, that means that the French stemmer does not work as expected.
Have a look at XapianDb::Indexer#index_text. As you can see, I'm using
the Xapian TermGenerator to generate index terms from a model. I'm
afraid I cannot help here, since the problem lies in the Xapian
binaries, not in xapian_db. See also http://comments.gmane.org/gmane.comp.search.xapian.general/7208

If you find a workaround, I will gladly accept a pull request.
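
As a starting point for such a workaround, here is a minimal sketch in plain Ruby. It is not part of xapian_db or Xapian: the module name SearchText and its normalize method are hypothetical. The idea is to run the same normalization over the text before it is indexed and over the query before it is parsed, so that accents are folded away and both the typographic and the ASCII apostrophe separate words. Where exactly to hook this into the indexing and search paths is left open here.

```ruby
# Hypothetical helper, not part of xapian_db or Xapian.
module SearchText
  # Combining marks (accents) that remain after NFD decomposition.
  COMBINING_MARKS = /\p{Mn}/
  # Typographic apostrophe (U+2019) and ASCII apostrophe.
  APOSTROPHES = /[\u2019']/

  # Returns a copy of text with apostrophes replaced by spaces
  # and diacritics removed ("é" becomes "e").
  def self.normalize(text)
    text
      .gsub(APOSTROPHES, " ")
      .unicode_normalize(:nfd)
      .gsub(COMBINING_MARKS, "")
  end
end

puts SearchText.normalize("l’oncle d’Alphonse") # prints "l oncle d Alphonse"
puts SearchText.normalize("éléphant")           # prints "elephant"
```

Note that splitting on the ASCII apostrophe also splits English forms such as "it's", so this choice would need revisiting for the English pages.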

Gernot

jcm

May 2, 2012, 1:02:35 PM
to xapian_db
Hi,


On May 2, 18:14, Gernot <gernot.kog...@gmail.com> wrote:
> Hmm, that means that the French stemmer does not work as expected.
> Have a look at XapianDb::Indexer#index_text. As you can see, I'm using
> the Xapian TermGenerator to generate index terms from a model. I'm
> afraid I cannot help here, since the problem lies in the Xapian
> binaries, not in xapian_db. See also http://comments.gmane.org/gmane.comp.search.xapian.general/7208

Thanks. Is there a way to test the indexer or the stemmer from the
console or irb? Can I get the query once it has been parsed? I prepend
'lang:fr' to all my queries, and will later use 'lang:en' for the
future English pages. When I search with a stop word, I get all the
pages as results, since the search then runs with only 'lang:fr'. That
could be avoided if I could detect that all the words were removed
from the query as stop words.
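
The check described above could be sketched in plain Ruby as follows, run before the query is handed to the search. This is only an illustration under assumptions: the stop list is a tiny hypothetical sample (a real one would have to match the stop words actually used when the query is parsed), and the method name only_stop_words? is made up for the example.

```ruby
require "set"

# Hypothetical, deliberately tiny stop list for illustration only.
FRENCH_STOP_WORDS = Set.new(%w[le la les un une des de du et à au aux])

# Matches the language filter that is prepended to every query.
LANG_PREFIX = /\Alang:/

# Returns true when, apart from the lang: filter, the query contains
# nothing but stop words (or nothing at all).
def only_stop_words?(query)
  words = query.downcase.split.reject { |word| word.match?(LANG_PREFIX) }
  words.all? { |word| FRENCH_STOP_WORDS.include?(word) }
end

puts only_stop_words?("lang:fr le la")        # prints "true"
puts only_stop_words?("lang:fr oncle de Luc") # prints "false"
```

When the method returns true, the application could skip the search and show an empty result instead of every page tagged 'lang:fr'.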

Gernot

May 4, 2012, 1:09:55 AM
to xapian_db
If you use the Rails console, everything is already set up for you.
There, you can play with the Xapian API. For reference, here are the
Xapian Ruby bindings: http://xapian.org/docs/bindings/ruby/rdocs/

To access your database, use XapianDb.database.