A simple Whoosh integration approach (for searching)


Bernardo

Nov 3, 2010, 4:48:17 AM
to web2py-users
Hi all,

I would like to share how I integrated the Whoosh search code into my
web2py application. For those who don't know it, Whoosh is a search
engine written entirely in Python, so it is easy to use in a framework
such as web2py, although there are a few small issues. After placing
the Whoosh source code in the application's modules folder and writing
the proper imports, we are ready to use it.

The problem comes when, reading the Whoosh documentation, we find out
that while a document is being indexed, Whoosh locks the index (it is
only locked for writing; you can still read from it while indexing). So
how do we get around this lack of concurrent writes? (Imagine two users
performing actions that require indexing at the same time... one of
them would get an error.)

What I did was create a queue and a cron script that looks at that
queue and processes it. The queue is a simple table in the database,
with all the fields to be indexed, that stores a record every time a
user performs an action that needs something indexed.

Then, the cron script reads that table, locks the index, indexes the
records in the queue, and frees the index.
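
To make the idea concrete, here is a minimal sketch of the queue and the
cron step. It is not my actual application code: it uses the standard
sqlite3 module as a stand-in for the web2py database, and the table name
(index_queue), the field names (doc_id, title, body) and the index_batch
callable are all made up for illustration. In a real setup index_batch
would be the only place that opens a Whoosh writer.

```python
import sqlite3


def create_queue(conn):
    # Hypothetical queue table: one row per document waiting to be indexed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "doc_id TEXT NOT NULL, title TEXT, body TEXT)"
    )
    conn.commit()


def enqueue(conn, doc_id, title, body):
    # Called from the web request: only writes to the database,
    # never touches the search index, so requests cannot collide on its lock.
    conn.execute(
        "INSERT INTO index_queue (doc_id, title, body) VALUES (?, ?, ?)",
        (doc_id, title, body),
    )
    conn.commit()


def process_queue(conn, index_batch):
    # Called from the cron script: the single place where the index is written.
    rows = conn.execute(
        "SELECT id, doc_id, title, body FROM index_queue ORDER BY id"
    ).fetchall()
    if not rows:
        return 0
    docs = [{"doc_id": r[1], "title": r[2], "body": r[3]} for r in rows]
    # index_batch is an assumed callable. With Whoosh it would roughly be:
    # writer = ix.writer(); writer.add_document(**doc) for each doc;
    # writer.commit(), with field names matching the index schema.
    index_batch(docs)
    # Remove only the rows that were read; rows queued in the meantime have
    # higher ids and stay for the next run. If index_batch raises, nothing
    # is deleted and the same rows are retried next time.
    conn.execute("DELETE FROM index_queue WHERE id <= ?", (rows[-1][0],))
    conn.commit()
    return len(docs)
```

Since only the cron job ever writes to the index, two users saving at the
same time just add two rows to the queue. This assumes the scheduler does
not start a second run while the previous one is still indexing.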

I hope this helps someone... if you have any questions, please do not
hesitate to ask.

regards,
Bernardo

Branko Vukelic

Nov 3, 2010, 5:31:25 AM
to web...@googlegroups.com
On Wed, Nov 3, 2010 at 9:48 AM, Bernardo <este...@gmail.com> wrote:
> The problem comes when, reading the Whoosh documentation, we find out
> that while a document is being indexed, Whoosh locks the index (it is
> only locked for writing; you can still read from it while indexing). So
> how do we get around this lack of concurrent writes? (Imagine two users
> performing actions that require indexing at the same time... one of
> them would get an error.)

Wouldn't it be better if Whoosh were served by a separate application?
If all it does is index things, you could easily have solved the
problem by making an app that acts as a search server with an
external API.

--
Branko Vukelić

bg.b...@gmail.com
stu...@brankovukelic.com

Check out my blog: http://www.brankovukelic.com/
Check out my portfolio: http://www.flickr.com/photos/foxbunny/
Registered Linux user #438078 (http://counter.li.org/)
I hang out on identi.ca: http://identi.ca/foxbunny

Gimp Brushmakers Guild
http://bit.ly/gbg-group

Bernardo

Nov 3, 2010, 6:03:21 AM
to web2py-users
Sure, the problem would have been solved that way. But if you want to
deploy your app on an external server or hosting provider, in my
opinion it is easier to have everything integrated in the same
application than to run the search part as a separate one.

Thanks for your point of view!!
Bernardo
