Hello, I have a mysterious traceback

Skylar Saveland

Oct 18, 2009, 12:25:05 PM
to django-haystack
Hi all,

I have been getting this error emailed to me as an ADMIN, from random
URLs on a site. I cannot make any of the mentioned URLs give me a 500
error, and all seems well when I browse the site. I unpredictably get
this error:

http://dpaste.com/hold/108950/

The file that it's complaining about does not actually exist. Or at
least, I can't catch it existing. Maybe it's getting created and then
deleted again? Any help with this problem would be greatly
appreciated.

Thanks!
Skylar

Daniel Lindsley

Oct 18, 2009, 3:23:25 PM
to django-...@googlegroups.com
Skylar,


That is a Whoosh-specific traceback. What's happening is that the
Whoosh engine in one process/thread is locking the index files while
another spins up and tries to access them. This is a common error with
Whoosh under any kind of load, which is why it's only recommended for
small sites or development.
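To make the locking conflict concrete, here is a generic sketch of an exclusive lock file: while one writer holds it, a second writer is refused. The `WRITELOCK` file name and the helper functions are hypothetical, and this is not how Whoosh actually implements its locks; it only shows why a short-lived lock file can make one request fail while others succeed, and why such a file is hard to catch on disk.

```python
import os
import tempfile


class LockError(Exception):
    """Raised when another writer already holds the (hypothetical) index lock."""


def acquire_lock(index_dir):
    # O_CREAT | O_EXCL makes creation atomic: it fails if the file already exists.
    path = os.path.join(index_dir, "WRITELOCK")  # hypothetical lock-file name
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockError("index is locked by another writer: %s" % path)
    os.close(fd)
    return path


def release_lock(path):
    # Removing the file lets the next writer in, so it only exists briefly.
    os.remove(path)


index_dir = tempfile.mkdtemp()
first = acquire_lock(index_dir)   # first writer (e.g. one request) gets the lock
try:
    acquire_lock(index_dir)       # a concurrent second writer is refused
    second_refused = False
except LockError:
    second_refused = True
release_lock(first)
third = acquire_lock(index_dir)   # after release, a new writer succeeds
release_lock(third)
```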

A way around this is to subclass `SearchIndex` with the following:

=======
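from haystack import indexes


class WhooshSearchIndex(indexes.SearchIndex):
    # Skip hooking up post_save, so saving a model no longer writes to
    # (and locks) the index.
    def _setup_save(self, model):
        pass

    # Likewise skip the post_delete hook.
    def _setup_delete(self, model):
        pass
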
=======

Then subclass your `SearchIndex`es off that. The last step would
be to set up a cron job that runs `./manage.py reindex` nightly (or
however often you need). What this does is prevent Haystack from
updating the index when a model is saved/deleted, which means your
index won't stay real-time but does let you use Whoosh on a higher
traffic site.
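The mechanism behind that override can be sketched without Django: a base index connects its handlers in `_setup_save`/`_setup_delete`, and a subclass whose hooks do nothing never registers them, so saves stop touching the index until the scheduled reindex. `Signal`, `BaseIndex` and `NoSignalIndex` here are hypothetical toy stand-ins that only mirror the hook names from the snippet above; they are not Haystack's or Django's real classes, and sender filtering is omitted.

```python
class Signal:
    """Toy stand-in for a Django model signal (hypothetical, no sender filtering)."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, instance):
        for receiver in self.receivers:
            receiver(instance)


post_save = Signal()
post_delete = Signal()


class BaseIndex:
    def __init__(self, model):
        self.updated = []  # records real-time index writes
        self._setup_save(model)
        self._setup_delete(model)

    def _setup_save(self, model):
        # Real-time behaviour: every save writes to the index immediately.
        post_save.connect(self.update_object)

    def _setup_delete(self, model):
        post_delete.connect(self.remove_object)

    def update_object(self, instance):
        self.updated.append(("update", instance))

    def remove_object(self, instance):
        self.updated.append(("remove", instance))


class NoSignalIndex(BaseIndex):
    # Same idea as WhooshSearchIndex: never connect the handlers.
    def _setup_save(self, model):
        pass

    def _setup_delete(self, model):
        pass


realtime = BaseIndex("Note")
quiet = NoSignalIndex("Note")
post_save.send("note-1")  # only the real-time index reacts
```

With the no-op hooks, the index only changes when the cron job rebuilds it, which is the trade-off described above.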

I'll be adding this to the official documentation shortly, as many
people run into this. If you need closer to real-time search, you'll
want to use either Solr or Xapian.


Daniel

Skylar Saveland

Oct 19, 2009, 10:38:32 PM
to django-haystack
Hey, thanks, Daniel. I installed Xapian and everything seems to be
working fine.

Cheers!
Skylar

Skylar Saveland

Oct 19, 2009, 10:59:02 PM
to django-haystack
Actually, now I just got a similar error with the Xapian backend:

http://dpaste.com/hold/109548/

Daniel Lindsley

Oct 19, 2009, 11:02:12 PM
to django-...@googlegroups.com
Skylar,


Did you check permissions/ownership on those files? Also, have you
tried removing the `search_index` directory completely and reindexing?
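One way to check permissions is a short script, run as the same user as the WSGI daemon process, that lists anything under the index directory that user cannot write to. This is a sketch: the temporary directory below stands in for the real `search_index` path, which is an assumption here, and `os.access` checks against the real (not effective) user ID.

```python
import os
import tempfile


def unwritable_paths(index_dir):
    """List paths under index_dir that the current user cannot write to."""
    problems = []
    for root, _dirs, files in os.walk(index_dir):
        # Check the directory itself, then each file inside it.
        for path in [root] + [os.path.join(root, name) for name in files]:
            if not os.access(path, os.W_OK):
                problems.append(path)
    return problems


# Demo on a throwaway directory standing in for search_index.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "example.db"), "w").close()
demo_problems = unwritable_paths(demo_dir)
```

An empty list means every file and directory is writable by the user running the check.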


Daniel

Skylar Saveland

Oct 19, 2009, 11:37:54 PM
to django-...@googlegroups.com
Ownership and permissions should be fine. The user that runs the Django WSGIDaemonProcess can definitely write to the dir. I can run

  $ ./manage.py reindex

I deleted everything inside search_index and reindexed once, although I did not delete the dir itself. I have gotten one of these tracebacks since then.

James Aylett

Oct 20, 2009, 6:28:03 AM
to django-...@googlegroups.com
On Mon, Oct 19, 2009 at 07:59:02PM -0700, Skylar Saveland wrote:

> Actually, Now I just got a similar error with the xapian backend:
>
> http://dpaste.com/hold/109548/

Quoting the xapian-haystack documentation:

Because Xapian does not support simultaneous WritableDatabase
connections, it is *strongly* recommended that users either set
`WSGIDaemonProcess processes=1` or override the default
SearchIndex class to remove the post-save and post-delete signals
that cause an immediate re-index. Instead, manually re-index your
site content through a cronjob at pre-determined times.

You can also use a queue (many senders, one receiver which is the only
thing needing a write lock on the database) to avoid full re-indexing,
which can become impractical in large systems.
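The "many senders, one receiver" idea can be sketched with Python's standard library: request threads only enqueue messages, and a single worker thread is the only code that would hold the writable database. `write_to_index` and the dict-based `index` are hypothetical placeholders for the real Xapian write. Note that `queue.Queue` only works within one process; across several WSGI processes you would need an external message queue.

```python
import queue
import threading

updates = queue.Queue()
index = {}  # stands in for the search index; only the writer thread touches it


def write_to_index(action, key):
    # Hypothetical placeholder for the single writable-database connection.
    if action == "update":
        index[key] = True
    elif action == "delete":
        index.pop(key, None)


def writer():
    # The only receiver: drains the queue and applies writes one at a time.
    while True:
        item = updates.get()
        if item is None:  # sentinel: shut down cleanly
            updates.task_done()
            break
        write_to_index(*item)
        updates.task_done()


def sender(keys):
    # Many senders (e.g. request threads) just enqueue and return quickly.
    for key in keys:
        updates.put(("update", key))


worker = threading.Thread(target=writer)
worker.start()
senders = [
    threading.Thread(target=sender, args=(["a%d" % i, "b%d" % i],))
    for i in range(4)
]
for t in senders:
    t.start()
for t in senders:
    t.join()
updates.put(("delete", "a0"))
updates.put(None)
worker.join()
```

Because the queue is FIFO, the delete for `a0` is applied after its earlier update, leaving seven entries.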

J

--
James Aylett

talktorex.co.uk - xapian.org - uncertaintydivision.org

Skylar Saveland

Oct 20, 2009, 9:04:19 AM
to django-...@googlegroups.com
My WSGIDaemonProcess directive looks like this:

  WSGIDaemonProcess skyl.org user=skyl group=skyl processes=1 threads=4 python-path=/home/skyl/virtualenvs/env/lib/python2.6/site-packages/

Perhaps having other wsgidaemonprocesses that also try to access the same postgres instance (though different dbs) is causing the problem?

David Sauve

Oct 20, 2009, 9:36:54 AM
to django-...@googlegroups.com
Just to reiterate what was discussed on IRC (in case anyone else is having a similar issue): it appears as though multi-threading the WSGIDaemonProcess may be causing the same issue with multiple WritableDatabase connections.

In this case, there are two possible solutions:

1) Re-write the WSGIDaemonProcess directive to only make use of one process and/or thread; or (a better solution in my opinion)

2) Derive your search indexes from a custom SearchIndex that overrides the post save and delete signals like this: http://gist.github.com/214254

Hope that helps.

David

Skylar Saveland

Oct 20, 2009, 9:18:29 PM
to django-...@googlegroups.com

detailing today's activities. It will be outdated in no time (hopefully), but it is a decent Haystack/Xapian tutorial for today.

@@

Oct 20, 2009, 9:40:20 PM
to django-...@googlegroups.com
Hi

If you have a lot of models that need reindexing, a full reindex may take a long time (in my case, 50,000 models take minutes to reindex).
I prefer the queue approach James Aylett described. It would be better if Haystack provided such a feature.

James Aylett

Oct 21, 2009, 9:56:24 AM
to django-...@googlegroups.com
On Wed, Oct 21, 2009 at 09:40:20AM +0800, @@ wrote:

> I prefer the queue approach as James Aylett said. It would be better if
> haystack provide such function.

I believe that the Haystack attitude here is to let you plug into a
queue, since there are so many options. You should be able to hook up
any queue you want by overriding save/delete in the same way as
mentioned for just making them do nothing.

@@

Oct 21, 2009, 11:21:26 AM
to django-...@googlegroups.com
One thing I like about Haystack is that it's really easy to use :)
A task queue is kind of difficult for me :p, since I've never implemented such a thing (and it has to be reliable: if it gets killed, it has to be able to resume later).
And I don't know whether any such task queue modules already exist (GAE has one, but I don't use GAE as a host).

Maybe I will just add a flag field to the model, so I can find the models that need reindexing.
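A sketch of the flag-field idea, assuming a hypothetical `needs_reindex` column: saves only set the flag, and a periodic job (for example, one run from cron) reindexes just the flagged rows, clearing each flag after its write. Because the flag lives in the database, a job that is killed midway simply picks up the remaining rows on its next run. An in-memory SQLite table and a dict stand in for the Django model and the search index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT, "
    "needs_reindex INTEGER NOT NULL DEFAULT 1)"
)


def save_note(body, note_id=None):
    # Every save marks the row dirty instead of touching the search index.
    if note_id is None:
        cur = conn.execute("INSERT INTO note (body) VALUES (?)", (body,))
        return cur.lastrowid
    conn.execute(
        "UPDATE note SET body = ?, needs_reindex = 1 WHERE id = ?", (body, note_id)
    )
    return note_id


def reindex_flagged(index, limit=None):
    # Batch job: index only flagged rows, clearing each flag after its write.
    rows = conn.execute(
        "SELECT id, body FROM note WHERE needs_reindex = 1 ORDER BY id"
    ).fetchall()
    if limit is not None:
        rows = rows[:limit]  # simulates a job killed part-way through
    for note_id, body in rows:
        index[note_id] = body  # hypothetical stand-in for the real index update
        conn.execute("UPDATE note SET needs_reindex = 0 WHERE id = ?", (note_id,))
        conn.commit()
    return len(rows)


index = {}
ids = [save_note("note %d" % i) for i in range(3)]
first_run = reindex_flagged(index, limit=2)  # "killed" after two rows
second_run = reindex_flagged(index)          # resumes with the remaining row
save_note("edited", note_id=ids[0])
third_run = reindex_flagged(index)           # only the edited row is redone
```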

Skylar Saveland

Oct 21, 2009, 11:36:57 AM
to django-...@googlegroups.com
I think that http://gearman.org/ might be a good way to go?

Daniel Lindsley

Oct 21, 2009, 11:37:13 AM
to django-...@googlegroups.com
You might want to consider looking at the `queues` framework
(http://code.google.com/p/queues/) or at `pika`
(http://github.com/tonyg/pika/). Both are Python wrappers to queuing
libraries that make things simple. `queues` is pluggable and supports
many queues (with the exception of RabbitMQ) while `pika` does just
RabbitMQ but is decently licensed.

With `queues`, the code is literally an import, two lines per method +
a little config.

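    from django.db.models import signals
    from queues import queues
    ...
    def _setup_save(self, model):
        # `model` is the model class, so enqueue from a per-save handler.
        signals.post_save.connect(self.enqueue_save, sender=model)

    def enqueue_save(self, instance, **kwargs):  # illustrative name
        queue = queues.Queue('my-queue-name')
        queue.write('update:%s.%s.%s' % (instance._meta.app_label,
            instance._meta.module_name, instance._get_pk_val()))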
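On the receiving end, a single worker reads these messages and performs the index writes. Here is a stdlib-only sketch of that consumer logic, assuming the `update:app_label.module_name.pk` message format used above plus a hypothetical `delete:` counterpart. It parses a batch and collapses duplicates, so an object saved many times is indexed once. Fetching each object and writing it to the real index are left out.

```python
from collections import OrderedDict


def parse_message(message):
    # "update:blog.entry.42" -> ("update", "blog", "entry", "42")
    action, _, identifier = message.partition(":")
    app_label, module_name, pk = identifier.split(".", 2)
    return action, app_label, module_name, pk


def collapse(messages):
    # Keep only the latest action per object, in first-seen order,
    # so repeated saves of one object become a single index write.
    latest = OrderedDict()
    for message in messages:
        action, app_label, module_name, pk = parse_message(message)
        latest[(app_label, module_name, pk)] = action
    return [(action,) + key for key, action in latest.items()]


batch = [
    "update:blog.entry.42",
    "update:blog.entry.7",
    "update:blog.entry.42",
    "delete:blog.entry.7",  # hypothetical delete message
]
work = collapse(batch)
```

Collapsing before writing keeps the single writer's lock time short, which matters more as the save rate grows.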


Daniel