Does ZODB have any full-text search functions?


Etienne Robillard

Dec 16, 2017, 6:58:19 AM
to zodb
Hi,

I would like to implement full-text search for my ZODB-powered blog. Is
there any native support for full-text search in ZODB?

If not, what would be required to implement this in ZODB?

Thank you in advance,

Etienne

--
Etienne Robillard
tka...@yandex.com
https://www.isotopesoftware.ca/

Etienne Robillard

Dec 16, 2017, 7:39:21 AM
to zodb
What do you think about the idea of using Whoosh
(http://whoosh.readthedocs.io/en/latest/intro.html) to query ZODB
objects and create index documents?

Since ZODB is a pure Python database, it would be quite easy to access
a ZODB database using Whoosh.

What do you think?

Etienne

Jim Fulton

Dec 16, 2017, 11:23:47 AM
to Etienne Robillard, zodb
On Sat, Dec 16, 2017 at 4:58 AM, Etienne Robillard <tka...@yandex.com> wrote:
> Hi,
>
> I would like to implement full-text search for my ZODB-powered blog. Is there any native support for full-text search in ZODB?

No and yes. :)  

Traditionally, we've used application-level "catalogs" that provide powerful search capabilities, including full-text search.  For better or worse, these have often been tied closely to application frameworks. Doing so made their management easier, and in particular made their updates more automatic, much the way that RDBMS indexes are transparent to applications.  Catalogs are very widely used in Zope and similar applications.  Due to ZODB's powerful caching, they're sometimes faster than using external indexes.
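
To make the catalog idea concrete, here is a minimal sketch of a full-text index in plain Python. It is not the API of any real catalog package: `ToyCatalog` and its methods are hypothetical names, and it keeps everything in ordinary dicts and sets. A persistent version in ZODB would presumably store the same mappings in BTrees so they load and update incrementally.

```python
import re
from collections import defaultdict


class ToyCatalog:
    """Hypothetical in-memory catalog; not the API of any real ZODB catalog."""

    def __init__(self):
        # word -> set of document ids whose text contains that word
        self._postings = defaultdict(set)
        # document id -> set of its words, so a document can be unindexed
        self._words = {}

    @staticmethod
    def _tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def index_doc(self, docid, text):
        self.unindex_doc(docid)  # reindexing replaces the old entries
        words = self._tokenize(text)
        self._words[docid] = words
        for word in words:
            self._postings[word].add(docid)

    def unindex_doc(self, docid):
        for word in self._words.pop(docid, ()):
            self._postings[word].discard(docid)

    def search(self, query):
        """Return the ids of documents containing every word in the query."""
        words = self._tokenize(query)
        if not words:
            return set()
        return set.intersection(*(self._postings.get(w, set()) for w in words))


catalog = ToyCatalog()
catalog.index_doc(1, "Full-text search for a ZODB blog")
catalog.index_doc(2, "Indexing strategies for Postgres")
print(catalog.search("zodb search"))  # prints {1}
```

The catalog only stays correct if something calls `index_doc` whenever a document changes, which is the update question discussed below.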

I would love for there to be a recommended catalog for ZODB. When I brought this up a while ago, people pointed out hypatia: https://github.com/Pylons/hypatia/blob/master/docs/genealogy.rst, but that project hasn't seen commits in way over a year.

Many applications find ways to leverage external indexes. For example, Newt DB makes it easy to index ZODB data with Postgres.  I would also love to provide ways to make this easier, by leveraging replication patterns.
 
> If not, what would be required to implement this in ZODB?

I think the biggest question is what "this" is.  A central issue is how indexes get updated. 

I think that when people use a database, they expect indexing to be transparent to application code.  As mentioned above, this has traditionally caused catalog implementations to be tightly integrated with application frameworks. 

It would be interesting, IMO, to find a way to make indexing transparent without tying it to an application framework, presumably with some hooks in ZODB.  Newt achieves this by post-processing data during the commit process.  I could imagine a similar approach at the Python level in ZODB itself, but there are other ways.

Another interesting wrinkle in this is handling of hierarchies.  Catalogs let you index functions, which can do pretty much anything, but especially let you walk hierarchies to get values to index.  Postgres is far more restrictive, and makes indexing hierarchically-derived data very difficult.  
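
As a rough illustration of indexing a function of an object rather than one of its stored attributes, the sketch below walks a parent chain to build the value that gets indexed. `Node`, `path_titles`, and `FunctionIndex` are hypothetical names invented for this example, not part of any catalog package.

```python
class Node:
    """Hypothetical content object that knows its parent in the hierarchy."""

    def __init__(self, title, parent=None):
        self.title = title
        self.parent = parent


def path_titles(node):
    """Indexing function: join the titles from the root down to this node."""
    titles = []
    while node is not None:
        titles.append(node.title)
        node = node.parent
    return " / ".join(reversed(titles))


class FunctionIndex:
    """Hypothetical index that stores whatever the extract function returns."""

    def __init__(self, extract):
        self._extract = extract
        self._values = {}  # docid -> extracted value

    def index_doc(self, docid, obj):
        self._values[docid] = self._extract(obj)

    def lookup(self, value):
        return sorted(d for d, v in self._values.items() if v == value)


root = Node("Blog")
year = Node("2017", parent=root)
post = Node("Search", parent=year)

index = FunctionIndex(path_titles)
index.index_doc(1, post)
print(index.lookup("Blog / 2017 / Search"))  # prints [1]
```

Note that the indexed value depends on the ancestors as well as the object itself, so renaming `root` would leave the entry for `post` stale until it is reindexed. That dependency is part of what makes hierarchically derived data awkward to index.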

In catalogs, application event frameworks are used to trigger indexing.  This causes indexing to happen mid-transaction.  This is, more or less, how relational indexes work, except that in an RDBMS, the events are internal to the database. Perhaps that's how a ZODB indexing strategy should work as well, at least for real-time indexing.
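
A small sketch of event-triggered indexing follows. The `EventBus` and `ObjectModified` classes are hypothetical stand-ins for an application event framework, and the "index" is just a dict. The point is only that the index is updated synchronously, as a side effect of the application announcing a change.

```python
class EventBus:
    """Hypothetical stand-in for an application event framework."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def notify(self, event):
        # Handlers run immediately, in the middle of whatever the caller is doing.
        for handler in self._subscribers:
            handler(event)


class ObjectModified:
    """Hypothetical event describing a changed document."""

    def __init__(self, docid, text):
        self.docid = docid
        self.text = text


index = {}  # docid -> list of lowercase words


def reindex(event):
    index[event.docid] = event.text.lower().split()


bus = EventBus()
bus.subscribe(reindex)

# Application code only announces the change; indexing happens as a side effect.
bus.notify(ObjectModified(1, "Hello ZODB"))
print(index[1])  # prints ['hello', 'zodb']
```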

Yet another issue is that indexing in real time can be very expensive. It's much more efficient to update indexes in batches.  Many applications don't want to make user interfaces wait for indexing. For this reason it's a common pattern to index data asynchronously.  For catalogs, this is often done with catalog queues.  It may also be done by batching updates of external indexes.
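
The queueing pattern can be sketched as follows. `IndexQueue` is a hypothetical class, not the interface of any existing catalog queue: requests only record that a document changed, and a separate worker later applies the pending changes in batches.

```python
class IndexQueue:
    """Hypothetical queue that defers index updates and applies them in batches."""

    def __init__(self, index_func, batch_size=100):
        self._index_func = index_func  # callable taking a list of (docid, text)
        self._batch_size = batch_size
        self._pending = {}  # docid -> latest text; repeated edits collapse

    def add(self, docid, text):
        # Cheap: only records the change, does no indexing work.
        self._pending[docid] = text

    def process(self):
        """Apply up to batch_size pending updates; return how many were applied."""
        batch = list(self._pending.items())[: self._batch_size]
        if batch:
            self._index_func(batch)
            for docid, _ in batch:
                del self._pending[docid]
        return len(batch)


indexed = []  # stands in for a real index; records what was written
queue = IndexQueue(indexed.extend, batch_size=2)

queue.add(1, "first draft")
queue.add(1, "final text")  # replaces the earlier pending edit for doc 1
queue.add(2, "another post")
queue.add(3, "a third post")

first = queue.process()   # would normally run in a background worker
second = queue.process()
print(first, second, indexed)
```

The trade-off is that searches can return stale results between a change and the next call to `process`.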

Whether you want real-time indexing or asynchronous indexing depends on how you use indexes in your application. Some applications use a mix. 

Complicated enough yet? :)

Jim

Jim Fulton

Dec 16, 2017, 11:28:27 AM
to Etienne Robillard, zodb
On Sat, Dec 16, 2017 at 5:39 AM, Etienne Robillard <tka...@yandex.com> wrote:
> What do you think about the idea of using Whoosh (http://whoosh.readthedocs.io/en/latest/intro.html) to query ZODB objects and create index documents?

I'm not familiar with it, and don't want to spend the time to give a knowledgeable answer, but ...
 
> Since ZODB is a pure Python database, it would be quite easy to access a ZODB database using Whoosh.
>
> What do you think?

I suspect that catalogs do much the same thing.  But perhaps Whoosh is better in some way. IDK.  You should learn a bit about catalogs before deciding to pursue integrating Whoosh. Catalogs provide pluggable APIs for integrating various kinds of indexes. It might be easiest to integrate Whoosh as a catalog index.

By far the hardest issue in all of this, IMO, is the indexing strategy.

Jim
