ore.xapian

kapil

May 8, 2008, 3:52:14 AM

to xappy-discuss

hi folks,

i released the ore.xapian package to pypi a few weeks back, and after
a few iterations i've got it in production on a few small applications. it's
a thin layer on top of xappy that provides an indexing framework for
zope3-based applications.

it's pretty xapian-agnostic. it's designed as an async indexing
framework, with abstractions for content indexers, content storage/resolution,
and transactional flush into the indexing queue; it also manages
reopening search connections, etc.
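
the async, transactional indexing pattern described above can be sketched in plain python. this is a minimal illustration under stated assumptions, not ore.xapian's actual api: the class names (`TransactionalIndexQueue`, `Indexer`, `Searcher`), the dict standing in for a xapian index, and the generation counter used to decide when a searcher should "reopen" are all hypothetical. operations are buffered per transaction, handed to a shared queue only on commit, drained by a background indexer thread, and searchers refresh their view only when the indexer reports new changes.

```python
import queue
import threading

# Hypothetical sketch: none of these names come from ore.xapian or xappy.
# A plain dict stands in for the xapian index.


class TransactionalIndexQueue:
    """Buffers index operations for one transaction."""

    def __init__(self, shared_queue):
        self.shared_queue = shared_queue
        self.pending = []

    def add(self, op, content_id):
        # Record the operation ("add", "modify" or "delete"); the index is not touched yet.
        self.pending.append((op, content_id))

    def commit(self):
        # Only committed work is handed to the async indexer.
        for item in self.pending:
            self.shared_queue.put(item)
        self.pending = []

    def abort(self):
        # Aborted work never reaches the indexer.
        self.pending = []


class Indexer(threading.Thread):
    """Background thread that drains the shared queue into the index."""

    def __init__(self, shared_queue, index):
        super().__init__(daemon=True)
        self.shared_queue = shared_queue
        self.index = index
        self.lock = threading.Lock()
        self.generation = 0  # bumped after every applied operation

    def run(self):
        while True:
            op, content_id = self.shared_queue.get()
            with self.lock:
                if op == "delete":
                    self.index.pop(content_id, None)
                else:
                    self.index[content_id] = op
                self.generation += 1
            self.shared_queue.task_done()


class Searcher:
    """Keeps a snapshot of the index and 'reopens' it only when it is stale."""

    def __init__(self, indexer):
        self.indexer = indexer
        self.seen_generation = -1  # force a reopen on the first search
        self.snapshot = {}

    def search(self, content_id):
        with self.indexer.lock:
            if self.indexer.generation != self.seen_generation:
                self.snapshot = dict(self.indexer.index)
                self.seen_generation = self.indexer.generation
        return content_id in self.snapshot


shared = queue.Queue()
index = {}
indexer = Indexer(shared, index)
indexer.start()
searcher = Searcher(indexer)

txn = TransactionalIndexQueue(shared)
txn.add("add", "doc-1")
txn.abort()          # doc-1 is discarded
txn.add("add", "doc-2")
txn.commit()         # doc-2 goes to the indexer
shared.join()        # wait until the background indexer has drained the queue
print(searcher.search("doc-1"), searcher.search("doc-2"))  # False True
```

tying the flush to commit means aborted transactions never touch the index, and searchers only pay for a reopen after the indexer has actually applied something.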

the pypi page goes into a bit more detail (doctest style).
http://pypi.python.org/pypi/ore.xapian

i'm using it successfully to index content from relational databases
and subversion with a zope3 front end. the only real todo is to make the
index queue persistent for remote indexers, but to be useful that
would need corresponding support for remote search connections in
xappy. unfortunately i don't have the bandwidth for the latter atm.

cheers,

kapil

Richard Boulton

May 8, 2008, 5:40:35 AM

to xappy-...@googlegroups.com

kapil wrote:
> hi folks,
>
> i released the ore.xapian package to pypi a few weeks back, and after
> a few iterations i've got it in production on a few small applications. it's
> a thin layer on top of xappy that provides an indexing framework for
> zope3-based applications.
>
> it's pretty xapian-agnostic. it's designed as an async indexing
> framework, with abstractions for content indexers, content storage/resolution,
> and transactional flush into the indexing queue; it also manages
> reopening search connections, etc.

That sounds excellent. The only warning I'd like to make is that the
xappy API isn't yet stable, and in particular the FieldAction stuff is
likely to be redesigned shortly (I'll send a summary of my thoughts on
what I want to do with this, and why, to this list in a bit, to ensure I
get the interface right this time). I'm hoping to have the API stable
in a couple of weeks, though.

From a quick peruse of the pypi page, I see that it exports the
FieldAction style interface, so you might want to add a note that this
is liable to change in the near future. More on this when I get a
chance to write up my plans properly.

For reference, I hope to make a 0.6 release of xappy in the next two
weeks, including a reworking of the field action interface. I'll then
work towards the 0.7 release, and at the 0.7 release I expect to declare
the API stable.

See http://code.google.com/p/xappy/issues/list?sort=milestone for
details of the issues currently planned to be addressed for each release.

> i'm using it successfully to index content from relational databases
> and subversion with a zope3 front end. the only real todo is to make the
> index queue persistent for remote indexers, but to be useful that
> would need corresponding support for remote search connections in
> xappy. unfortunately i don't have the bandwidth for the latter atm.

There is one change to xapian which is required to make it possible to
perform remote searches with xappy, which is to implement metadata
support for remote databases. This is not a very hard task, but I don't
have time to do it, right now. The appropriate xapian ticket is
http://trac.xapian.org/ticket/178, and this is marked for 1.1.0; I'll
endeavour to make sure this doesn't slip from the 1.1.0 release of xapian.

--
Richard

Kapil Thangavelu

May 8, 2008, 7:45:05 PM

to xappy-...@googlegroups.com

On Thu, May 8, 2008 at 5:40 AM, Richard Boulton <ric...@lemurconsulting.com> wrote:

> That sounds excellent. The only warning I'd like to make is that the
> xappy API isn't yet stable, and in particular the FieldAction stuff is
> likely to be redesigned shortly (I'll send a summary of my thoughts on
> what I want to do with this, and why, to this list in a bit, to ensure I
> get the interface right this time). I'm hoping to have the API stable
> in a couple of weeks, though.
>
> From a quick peruse of the pypi page, I see that it exports the
> FieldAction style interface, so you might want to add a note that this
> is liable to change in the near future. More on this when I get a
> chance to write up my plans properly.

noted, i'll go back and update the readme regarding api stability. looking forward to reading the field action proposals.

> For reference, I hope to make a 0.6 release of xappy in the next two
> weeks, including a reworking of the field action interface. I'll then
> work towards the 0.7 release, and at the 0.7 release I expect to declare
> the API stable.
>
> See http://code.google.com/p/xappy/issues/list?sort=milestone for
> details of the issues currently planned to be addressed for each release.

> There is one change to xapian which is required to make it possible to
> perform remote searches with xappy, which is to implement metadata
> support for remote databases. This is not a very hard task, but I don't
> have time to do it, right now. The appropriate xapian ticket is
> http://trac.xapian.org/ticket/178, and this is marked for 1.1.0; I'll
> endeavour to make sure this doesn't slip from the 1.1.0 release of xapian.

i looked into it and started reading the replication docs, and i realize now that remote connections aren't really what i want/need for increased throughput. index replication, with a dedicated indexing server replicating indexes to the app servers/searchers, gives much better cpu utilization, throughput, and redundancy (which collectively is what i'm after), at the understandable cost of network replication overhead and possibly search cluster consistency issues during updates.
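
the trade-off above can be illustrated with a toy model of one dedicated indexer replicating changesets to several search replicas. this is a hypothetical sketch (the `Master` and `Replica` classes and the list-of-changesets format are assumptions), not xapian's replication protocol or tools. it only shows that all writes happen in one place, each replica pulls whatever it is missing, and a replica that hasn't synced yet briefly returns stale results.

```python
# Toy model of single-writer index replication.
# Hypothetical names; this is not xapian's replication protocol.


class Master:
    """The dedicated indexing server: the only place writes happen."""

    def __init__(self):
        self.revision = 0
        self.changesets = []  # changesets[i] takes a replica from revision i to i + 1

    def index(self, docs):
        self.changesets.append(dict(docs))
        self.revision += 1

    def changes_since(self, revision):
        # What a replica has to transfer to catch up (the network replication cost).
        return self.changesets[revision:]


class Replica:
    """A searcher on an app server, holding its own copy of the index."""

    def __init__(self):
        self.revision = 0
        self.docs = {}

    def sync(self, master):
        changes = master.changes_since(self.revision)
        for changeset in changes:
            self.docs.update(changeset)
        self.revision += len(changes)
        return len(changes)  # number of changesets applied

    def search(self, term):
        return sorted(doc_id for doc_id, text in self.docs.items() if term in text.split())


master = Master()
replicas = [Replica(), Replica()]
master.index({"d1": "xapian replication"})
master.index({"d2": "zope3 indexing"})

replicas[0].sync(master)  # first app server has caught up
# replicas[1] hasn't synced yet, so the cluster is briefly inconsistent
print([r.search("zope3") for r in replicas])  # [['d2'], []]
replicas[1].sync(master)
print([r.search("zope3") for r in replicas])  # [['d2'], ['d2']]
```

adding search capacity in this model just means adding another replica that syncs; the price is the changeset transfer in `sync` and the window in which replicas disagree.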