Alternate backend with faster indexing and query time and smaller index size


spitis

Dec 30, 2016, 10:40:51 AM
to Whoosh
Hi all... I made an alternate backend for Whoosh that some of you might find useful. It has limited support (in terms of both Whoosh features and Python versions), but it indexes and queries twice as fast as Whoosh, and also takes advantage of fast C-level integer compression to create smaller indexes.

Thomas Koch

Jan 27, 2017, 11:50:25 AM
to Whoosh
Hi,
had a short look at the PyIndex project and really like the performance improvements. However, from my perspective the limitations currently outweigh the speedup benefits. Would it be possible to apply your ideas to the main branch of Whoosh without losing the full functionality?

Coming from PyLucene, I had a look at Whoosh and was fascinated by its feature richness while being simple to use and pythonic. However, after running some benchmarks, the performance penalty of pure-Python Whoosh seemed really big. For small data sets (such as the reuters benchmark in the whoosh package) it appears as if Whoosh is 'only' slower by a factor of roughly 2 in terms of indexing and searching (compared to PyLucene). With search results in 0.01 seconds (Whoosh) compared to 0.004 seconds (PyLucene), for example, this still seemed tolerable.

However, with bigger data sets (such as the enron benchmark in the whoosh package) and a large number of hits (i.e. a high limit) the drawback became more obvious: indexing took 3 minutes with PyLucene, and about an hour with Whoosh! Searching with Whoosh took up to 3.1 seconds, compared to 0.003 seconds with PyLucene. Am I the first one to notice this performance issue, or are there any plans to improve Whoosh in terms of indexing and searching time?

BTW, I've just created a pull request which includes the PyLucene support (for PyLucene 3.6!) and fixes some issues with the benchmarks:

I can share the benchmark results if anyone is interested, of course.

regards,
Thomas
--