Indexing not working with multiple workers


Purbasha

Jun 19, 2017, 10:45:40 PM
to django-haystack
Hi,

I need help understanding how rebuild_index works with multiple workers.
I am using Django 1.8.4, Whoosh 2.7.4, and django-haystack 2.6.0 to build search functionality over a database of 5 million records. My environments are Ubuntu and Mac OS X.

When I use the multiple-workers option, the index does not end up with all 5M records. I tested this with a smaller subset of 1,200 records and found that all 1,200 make it into the index only when I use a single worker. I have tried several batch sizes and worker counts, and in every case only a subset of the records gets indexed.

Is this a known problem? I saw some issues on this topic in the GitHub repository, but I am not sure whether they have been resolved. When I run with multiple workers, the logs look fine: there are no errors about locked or inaccessible files, which is what I would expect if multiple workers were trying to write to the same file. I have allocated 150 GB of space to the volume where the index is stored, and my server has 64 GB of memory, so I am sure this is not due to a lack of storage or memory.

I would really like to use the multiple-workers option to cut the indexing time down to a few hours instead of 12-14 hours.

Thank you,
Purbasha




Subhranath Chunder

Aug 23, 2017, 12:12:07 AM
to django-...@googlegroups.com
Try increasing the verbosity level. Setting it to 3 might help you find the exact reason why only a subset is getting indexed.

Also check the `index_queryset` method of your SearchIndex subclasses.
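
To narrow down which records go missing, something like the rough sketch below might help. It lists the batch slices you would expect for a given total and batch size, then diffs the primary keys in the database against those found in the index. The function names and the sample numbers are made up for illustration, and the batching logic is only an assumption about how the work gets split, not Haystack's actual code. In a real project the two key lists would come from your model's queryset and from the search index.

```python
def batch_ranges(total, batch_size):
    """Yield half-open (start, end) slices that together cover range(total)."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


def missing_pks(db_pks, indexed_pks):
    """Return the primary keys present in the database but absent from the index."""
    return sorted(set(db_pks) - set(indexed_pks))


# Hypothetical test case: 1200 records split into batches of 500.
ranges = list(batch_ranges(1200, 500))
print(ranges)

# Simulate an index where the middle batch never arrived.
db_pks = range(1, 1201)
indexed_pks = list(range(1, 501)) + list(range(1001, 1201))
lost = missing_pks(db_pks, indexed_pks)
print(len(lost), lost[0], lost[-1])
```

If the missing keys line up with whole batches, that points at a worker's batch being lost rather than at individual records being filtered out by `index_queryset`.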

--
Thanks,
Subhranath Chunder.